Overview

"The Corpus of Spontaneous Japanese" (CSJ) is a database containing a large collection of Japanese spoken language data, together with information for use in linguistic research. Jointly developed by NINJAL, NICT and the Tokyo Institute of Technology, the CSJ is world-class in both the quantity and the quality of its data.

The corpus has been used in a wide variety of research areas, including spoken language processing, natural language processing, phonetics, psychology, sociology, Japanese education, and dictionary compilation.

"The Corpus of Spontaneous Japanese" is available to the public in two forms: online and as a USB flash drive set. Requests to use the corpus for commercial purposes are considered on an individual basis; if you intend commercial use, please contact us at the address below.

Inquiries: kotonoha [at] ninjal.ac.jp (please replace "[at]" with "@")

Please note that the paid edition contains only the corpus data itself and does not include any reference aids (such as dictionary tools).

How to Apply

Released Data (9th edition)

Outline of the CSJ-RDB

Sample Files

Documents

Miscellaneous information