NINJAL-LWP for TWC
TWC is a large-scale Japanese-language corpus of about 1.1 billion words, built from text collected from websites.
The co-occurrence relations of words in the corpus can be searched with the search tool (NINJAL-LWP for TWC).
TWC ver. 1.30 contains 1,138 million words gathered from Japanese-language websites.
NINJAL-LWP for TWC uses the lexical profiling method and can exhaustively show the co-occurrence relation and grammatical behavior of content words such as nouns and verbs.
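As a rough illustration of what lexical profiling involves, the sketch below groups the co-occurrences of a noun by grammatical pattern (noun + particle + verb) and counts them. The input triples, the `build_profile` function, and the sample data are hypothetical stand-ins for dependency-parsed corpus output; this is not NINJAL-LWP's actual implementation.

```python
from collections import Counter, defaultdict


def build_profile(triples):
    """Group (noun, particle, verb) triples into a per-noun profile.

    Returns {noun: {particle: Counter({verb: frequency})}}.
    """
    profile = defaultdict(lambda: defaultdict(Counter))
    for noun, particle, verb in triples:
        profile[noun][particle][verb] += 1
    return profile


# Hypothetical dependency triples, as a parser might yield them.
SAMPLE_TRIPLES = [
    ("本", "を", "読む"),
    ("本", "を", "読む"),
    ("本", "を", "買う"),
    ("本", "が", "ある"),
    ("水", "を", "飲む"),
]

profile = build_profile(SAMPLE_TRIPLES)

# List the verbs that co-occur with the noun, grouped by particle.
for particle, verbs in profile["本"].items():
    print(particle, verbs.most_common())
```

Grouping by particle first keeps each grammatical pattern separate, so the collocates of a noun as an object are not mixed with its collocates as a subject.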
To output information on collocation and grammatical behavior, NLT (NINJAL-LWP for TWC) analyzes and annotates the TWC data using the tools listed below.
- Morphological analysis: MeCab 0.98 + IPA Dictionary 2.7.0
- Dependency parsing: CaboCha 0.60
* The IPA Dictionary used for morphological analysis does not include information on representative transcriptions, so it has been extended independently to support them.
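For readers unfamiliar with the morphological analysis step, the sketch below parses one token line in the default output layout of MeCab with the IPA Dictionary: the surface form, a tab, then comma-separated features, with the base form in the seventh field. The `Token` type, the `parse_mecab_line` function, and the hand-written sample line are illustrative assumptions, and the independent dictionary extension for representative transcriptions is not modeled.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str    # form as it appears in the text
    pos: str        # top-level part of speech
    base_form: str  # dictionary (base) form


def parse_mecab_line(line: str) -> Token:
    """Parse one 'surface<TAB>feature,feature,...' token line."""
    surface, features = line.rstrip("\n").split("\t", 1)
    fields = features.split(",")
    # IPA Dictionary layout: field 0 is the POS, field 6 the base form.
    # Some entries leave the base form as '*' or have fewer fields,
    # so fall back to the surface form in those cases.
    base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
    return Token(surface, fields[0], base)


# Sample line written by hand in the IPA Dictionary feature layout.
token = parse_mecab_line("読ん\t動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン")
print(token)
```

Mapping each inflected surface form back to its base form is what allows co-occurrence counts to be aggregated per lexeme rather than per inflected form.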
How to Use
A manual describing how to use NINJAL-LWP for TWC is available here.