NINJAL-LWP for TWC

Tsukuba Web Corpus (NINJAL-LWP for TWC) Website
https://tsukubawebcorpus.jp/

This is a large-scale Japanese corpus of about 1.1 billion words collected from Japanese-language websites. Co-occurrence relations between words can be searched with the dedicated search tool, NINJAL-LWP for TWC.

Database

TWC ver. 1.30 contains 1,138 million words gathered from Japanese-language websites.

Functions

NINJAL-LWP for TWC (NLT) uses the lexical profiling method to show exhaustively the co-occurrence relations and grammatical behavior of content words such as nouns and verbs. To produce this information on collocation and grammatical behavior, NLT analyzes the TWC data and annotates it with the following tools:

  • Morphological analysis: MeCab 0.98 + IPA Dictionary 2.7.0
  • Dependency parsing: CaboCha 0.60

*The IPA Dictionary used for morphological analysis does not include information on representative transcriptions, so the dictionary has been extended independently to support them.
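As a rough illustration of what lexical profiling involves, the sketch below groups co-occurrence counts by headword and grammatical pattern. It is not the NLT implementation: the input triples, the pattern labels, and the `build_profile` helper are all hypothetical, and the triples are hard-coded here to stand in for what a morphological analyzer and dependency parser might produce.

```python
from collections import Counter, defaultdict

# Hypothetical pre-parsed input: each sentence is a list of
# (noun_lemma, particle, verb_lemma) dependency triples.
# In practice these would come from a parser; here they are hard-coded.
parsed_sentences = [
    [("本", "を", "読む")],
    [("本", "を", "買う"), ("店", "で", "買う")],
    [("新聞", "を", "読む")],
    [("本", "を", "読む"), ("図書館", "で", "読む")],
]


def build_profile(sentences):
    """Count collocates for each (headword, grammatical pattern) pair."""
    profile = defaultdict(Counter)
    for triples in sentences:
        for noun, particle, verb in triples:
            pattern = f"noun+{particle}+verb"
            # Profile of the noun: verbs it attaches to through this particle.
            profile[(noun, pattern)][verb] += 1
            # Profile of the verb: nouns attached to it through this particle.
            profile[(verb, pattern)][noun] += 1
    return profile


profile = build_profile(parsed_sentences)

# Collocates of the noun 本 in the "noun+を+verb" pattern, most frequent first.
for collocate, count in profile[("本", "noun+を+verb")].most_common():
    print(collocate, count)
```

Keying the table by both headword and pattern is one simple way to let a single lookup return all collocates for a given grammatical relation; a real system would typically also attach association scores to the raw frequencies.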

How to Use

A manual describing how to use NINJAL-LWP for TWC is available in the following formats.

Movie

PDF for printing