NINJAL-LWP for TWC

Tsukuba Web Corpus (NINJAL-LWP for TWC) Website
https://tsukubawebcorpus.jp/

This is a large-scale Japanese corpus of about 1.1 billion words, compiled from websites. The search tool (NINJAL-LWP for TWC) lets you look up the co-occurrence relations of words.

Database

TWC ver. 1.30 contains 1,138 million words collected from Japanese-language websites.

Functions

NINJAL-LWP for TWC (NLT) uses the lexical profiling method to show comprehensively the co-occurrence relations and grammatical behavior of content words such as nouns and verbs. To produce this information on collocation and grammatical behavior, NLT analyzes the corpus data and annotates it with the following tools:

Morphological analysis: MeCab 0.98 + IPA Dictionary 2.7.0
Dependency parsing: CaboCha 0.60

*The IPA Dictionary used for morphological analysis does not include information on representative transcriptions, so the dictionary has been extended independently to support them.
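
As a rough illustration of the lexical profiling idea described above, the following Python sketch groups the collocates of a headword by grammatical pattern and counts them. It is not NLT's implementation: the input triples, the `REPRESENTATIVE` mapping, and the function names `normalize` and `build_profile` are all hypothetical, and the (noun, particle, verb) triples are assumed to have been extracted beforehand from morphological analysis and dependency parsing output.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from spelling variants to a representative
# transcription. The entries are illustrative only.
REPRESENTATIVE = {"たべる": "食べる", "御飯": "ご飯"}


def normalize(lemma):
    """Return the representative transcription of a lemma, if one is known."""
    return REPRESENTATIVE.get(lemma, lemma)


def build_profile(triples, headword):
    """Group the collocates of `headword` by grammatical pattern and count them.

    `triples` is an iterable of (noun, case particle, verb) tuples, assumed
    to come from an earlier parsing step.
    """
    profile = defaultdict(Counter)
    target = normalize(headword)
    for noun, particle, verb in triples:
        noun, verb = normalize(noun), normalize(verb)
        if verb == target:
            # Headword is the verb: collect the nouns that attach to it.
            profile[f"noun + {particle} + {target}"][noun] += 1
        if noun == target:
            # Headword is the noun: collect the verbs it attaches to.
            profile[f"{target} + {particle} + verb"][verb] += 1
    return profile


# Toy input data, invented for this example.
TRIPLES = [
    ("ご飯", "を", "食べる"),
    ("御飯", "を", "たべる"),
    ("パン", "を", "食べる"),
    ("友達", "と", "食べる"),
    ("ご飯", "を", "炊く"),
]

if __name__ == "__main__":
    for pattern, counts in build_profile(TRIPLES, "食べる").items():
        print(pattern, counts.most_common())
```

Normalizing both sides to a representative transcription before counting means that spelling variants such as 御飯 and ご飯 are tallied as one collocate, which is the kind of role the dictionary extension noted above would play.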

How to Use

A manual describing how to use NINJAL-LWP for TWC is available here: Manual

Movie

PDF for printing

Click here to download a PDF file for printing.