National Institute for Japanese Language and Linguistics (NINJAL)

Lexical Statistics: Version 2020.03

This page presents Word Count Charts and Lexical Item Charts for the "Corpus of Historical Japanese" Version 2020.03.

  • Word Count Charts tabulate word counts, both including and excluding punctuation marks, by Sample ID, core/non-core data, main text type, and style.
  • Lexical Item Charts tabulate lemma frequencies by historical period and by literary work, showing how many times each lemma is attested in each work of each period. Excel-style filters let you sort and narrow the data by part of speech, lexical type, and other attributes.

Version 2020.03 of the "Corpus of Historical Japanese" contains 18,910,000 Short Unit Words and 2,570,000 Long Unit Words. The breakdown by sub-corpus is given below; Long Unit Word counts are given only for the sub-corpora from the Nara through the Muromachi period.

  • Word counts by sub-corpus

| Period | Sub-corpus | Short Unit Words | Long Unit Words |
|---|---|---|---|
| Nara period | Nara Period Series I: Man'yōshū | 99,000 | 94,000 |
| Nara period | Nara Period Series II: Senmyō | 21,000 | 17,000 |
| Heian period | Heian Period Series | 1,030,000 | 912,000 |
| Heian period / Kamakura period | Waka-shū Series | 269,000 | 252,000 |
| Kamakura period | Kamakura Period Series I: Folktales and Essays | 844,000 | 792,000 |
| Kamakura period | Kamakura Period Series II: Diaries and Travel Literature | 128,000 | 118,000 |
| Muromachi period | Muromachi Period Series I: Kyōgen | 277,000 | 256,000 |
| Muromachi period | Muromachi Period Series II: Christian Materials | 138,000 | 128,000 |
| Edo period | Edo Period Series I: Share-bon | 218,000 | n/a |
| Edo period | Edo Period Series II: Ninjo-bon | 406,000 | n/a |
| Edo period | Edo Period Series III: Chikamatsu-Joruri | 255,000 | n/a |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series I: Magazines | 14,180,000 | n/a |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series II: Textbooks | 856,000 | n/a |
| Meiji Era / Taishō Era / Shōwa Era | Meiji Era / Taishō Era Series III: Early Meiji Spoken Language Materials | 193,000 | n/a |
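
The headline totals can be cross-checked against the per-sub-corpus figures. The sketch below copies the figures from the table and sums them, skipping sub-corpora that have no Long Unit Word count. The individual figures are themselves rounded to the nearest thousand, so the sums match the stated totals only after rounding to the nearest ten thousand.

```python
# (Short Unit Words, Long Unit Words) per sub-corpus, copied from the table.
# None marks sub-corpora for which no Long Unit Word count is given.
SUB_CORPORA = {
    "Nara Period Series I: Man'yōshū": (99_000, 94_000),
    "Nara Period Series II: Senmyō": (21_000, 17_000),
    "Heian Period Series": (1_030_000, 912_000),
    "Waka-shū Series": (269_000, 252_000),
    "Kamakura Period Series I: Folktales and Essays": (844_000, 792_000),
    "Kamakura Period Series II: Diaries and Travel Literature": (128_000, 118_000),
    "Muromachi Period Series I: Kyōgen": (277_000, 256_000),
    "Muromachi Period Series II: Christian Materials": (138_000, 128_000),
    "Edo Period Series I: Share-bon": (218_000, None),
    "Edo Period Series II: Ninjo-bon": (406_000, None),
    "Edo Period Series III: Chikamatsu-Joruri": (255_000, None),
    "Meiji Era / Taishō Era Series I: Magazines": (14_180_000, None),
    "Meiji Era / Taishō Era Series II: Textbooks": (856_000, None),
    "Meiji Era / Taishō Era Series III: Early Meiji Spoken Language Materials": (193_000, None),
}

def totals(table):
    """Sum both columns, skipping missing Long Unit Word counts."""
    suw = sum(s for s, _ in table.values())
    luw = sum(l for _, l in table.values() if l is not None)
    return suw, luw

suw_total, luw_total = totals(SUB_CORPORA)
print(f"{suw_total:,} {luw_total:,}")                      # 18,914,000 2,569,000
print(f"{round(suw_total, -4):,} {round(luw_total, -4):,}")  # 18,910,000 2,570,000
```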

Word Count Chart for Short Unit Words

The word counts for the data collected in the "Corpus of Historical Japanese" are presented in the following files. Word counts (both including and excluding punctuation marks) are arranged by Sample ID, Core/non-core, Main text type (including quotations), and Style.

The Word Count Chart for Short Unit Words can be downloaded from the following links.

Download Word Count Chart for Short Unit Words tsv Data (Version 2020.03)

Download Word Count Chart for Short Unit Words Excel Data (Version 2020.03)
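
As an illustration of how the TSV data might be summarized, the sketch below totals both word counts (with and without punctuation marks) for each value of one grouping column, such as Style or Core/non-core. It is a minimal sketch under stated assumptions: the file is taken to be a UTF-8, tab-separated table with a single header row, and the column names used here (`sample_id`, `core`, `style`, `with_punct`, `without_punct`) are hypothetical placeholders, because the actual headers are not described on this page. The rows are toy values, not corpus figures.

```python
import csv
import io
from collections import defaultdict

def counts_by(tsv_file, group_col, with_punct_col, without_punct_col):
    """Sum both word-count columns per value of group_col.

    Returns {group value: (count with punctuation, count without punctuation)}.
    All column names are hypothetical and must match the real header row.
    """
    sums = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        sums[row[group_col]][0] += int(row[with_punct_col])
        sums[row[group_col]][1] += int(row[without_punct_col])
    return {key: tuple(pair) for key, pair in sums.items()}

# Toy rows for illustration only; not real corpus figures.
sample = io.StringIO(
    "sample_id\tcore\tstyle\twith_punct\twithout_punct\n"
    "S1\tcore\tA\t120\t100\n"
    "S2\tnon-core\tA\t60\t50\n"
    "S3\tcore\tB\t30\t25\n"
)
print(counts_by(sample, "style", "with_punct", "without_punct"))
# {'A': (180, 150), 'B': (30, 25)}
```

For the downloaded file, one would open it with `open(path, encoding="utf-8", newline="")` (the encoding is an assumption) and pass the file object instead of `sample`. Grouping by core/non-core data only requires changing `group_col`.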

Word Count Chart for Long Unit Words

The word counts for the data collected in the "Corpus of Historical Japanese" are presented in the following files. Word counts (both including punctuation marks and not including punctuation marks) have been arranged according to Sample ID, Core/non-core, Main text type (including quotations), and Style.

The Word Count Chart for Long Unit Words can be downloaded from the following links.

Download Word Count Chart for Long Unit Words tsv Data (Version 2020.03)

Download Word Count Chart for Long Unit Words Excel Data (Version 2020.03)
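
Because both charts are arranged by Sample ID, they can be joined on it, for example to compare Short Unit Word and Long Unit Word counts sample by sample. The sketch below shows one way to do this. The column names (`sample_id`, `without_punct`) are hypothetical, the rows are toy values, and a Sample ID that spans several rows (for instance, if it is split by main text type) has its counts summed before the ratio is taken.

```python
import csv
import io
from collections import defaultdict

def load_counts(tsv_file, id_col, count_col):
    """Total one word-count column per Sample ID, summing over repeated rows."""
    totals = defaultdict(int)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        totals[row[id_col]] += int(row[count_col])
    return dict(totals)

def suw_per_luw(suw, luw):
    """Short/Long Unit Word ratio for Sample IDs present in both charts."""
    return {sid: suw[sid] / luw[sid]
            for sid in sorted(suw.keys() & luw.keys())
            if luw[sid] > 0}

# Toy rows for illustration only; not real corpus figures.
suw = load_counts(io.StringIO("sample_id\twithout_punct\nS1\t100\nS2\t50\n"),
                  "sample_id", "without_punct")
luw = load_counts(io.StringIO("sample_id\twithout_punct\nS1\t80\nS3\t40\n"),
                  "sample_id", "without_punct")
print(suw_per_luw(suw, luw))  # {'S1': 1.25}
```

Sample IDs that appear in only one chart are dropped from the result rather than raising an error.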

Lexical Item Charts for the "Corpus of Historical Japanese"

Frequencies of individual lexemes, together with word counts by lexical type and by part of speech, have been compiled for the data in the "Corpus of Historical Japanese" and arranged by historical period and by literary work.

They can be downloaded through the following links.

Short Unit Word Lexical Item Charts (Version 2020.03)

Long Unit Word Lexical Item Charts (Version 2020.03)
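
The Excel-style filtering and sorting described for the Lexical Item Charts can also be done in a short script. The sketch below keeps rows whose part-of-speech label starts with a given prefix and sorts them by one frequency column, such as the column for a single work. It assumes a tab-separated export with a header row. The column names (`lemma`, `pos`, `freq`), the prefix-matching approach for hierarchical part-of-speech labels, and the toy rows are all hypothetical. Blank frequency cells are treated as zero.

```python
import csv
import io

def top_lemmas(tsv_file, pos_col, freq_col, pos_prefix, n=10):
    """Return up to n rows whose part of speech starts with pos_prefix,
    sorted by the count in freq_col, highest first.

    Column names are hypothetical; blank count cells are treated as 0.
    """
    rows = [row for row in csv.DictReader(tsv_file, delimiter="\t")
            if row[pos_col].startswith(pos_prefix)]
    rows.sort(key=lambda row: int(row[freq_col] or 0), reverse=True)
    return rows[:n]

# Toy rows for illustration only; not real corpus data.
sample = io.StringIO(
    "lemma\tpos\tfreq\n"
    "w1\tnoun-common\t5\n"
    "w2\tverb\t9\n"
    "w3\tnoun-proper\t7\n"
)
print([row["lemma"] for row in top_lemmas(sample, "pos", "freq", "noun")])
# ['w3', 'w1']
```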


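Corpora are written and spoken language materials, systematically collected and annotated with information for research, that serve as basic resources for analyzing Japanese and other languages.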