
chinese language data and dictionary


sph-mn/hanyu



dictionary

a dictionary that sorts results by word frequency and character count. it is a single file, html/hanyu-dictionary.html, which must be served over http for its javascript to run in the browser. it is also hosted here.
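a minimal python sketch of such an ordering, for illustration only: the dictionary itself is written in coffeescript, and the precedence used here (fewer characters first, then more frequent first) is an assumption, as is the hypothetical `Entry` type whose `frequency_rank` field uses 0 for the most frequent word.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    frequency_rank: int  # hypothetical field: 0 = most frequent word


def sort_results(entries):
    """order by character count first, then by frequency (lower rank = more frequent)."""
    return sorted(entries, key=lambda e: (len(e.word), e.frequency_rank))
```

a tuple key keeps the two criteria explicit; swapping the tuple elements would make frequency the primary criterion instead.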

data files

the files are in data/:

  • frequency-pinyin-translation.csv: words with pinyin and translation sorted by frequency
  • cedict.csv: filtered csv version of cedict with one translation per line
  • character-strokes-composition.csv: characters with stroke count and composition
  • table-of-general-standard-chinese-characters.csv: the official character list including pronunciations
  • characters-by-pinyin.csv
  • words-by-type/: separated by verb, noun, adjective, and so on
  • hsk.csv and hsk-pinyin-translations.csv
  • character-learning.csv: characters sorted by frequency, with their readings and the number of words using each reading, false pronunciations for guessing, the commonness of each syllable among all characters, compositions, character meanings, and example words
  • character-overlap.csv: characters paired with the characters that share the most components with them
  • characters-repeated-components.csv: characters that consist of a repetition of another character
  • ... and more
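to illustrate how a list like characters-repeated-components.csv can be derived from composition data, here is a python sketch. the `COMPOSITION` table is a hypothetical stand-in for data/character-strokes-composition.csv (whose column layout is not assumed here), and the rule used, two or more copies of one and the same component, is an assumption about how the file is built.

```python
# hypothetical composition table: character -> its direct components
COMPOSITION = {
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "从": ["人", "人"],
    "好": ["女", "子"],
    "明": ["日", "月"],
}


def repeated_components(composition):
    """return {character: component} for characters made of 2+ copies of one component."""
    result = {}
    for char, parts in composition.items():
        # a single distinct component that occurs at least twice
        if len(parts) >= 2 and len(set(parts)) == 1:
            result[char] = parts[0]
    return result
```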

data sources

license

creative commons share-alike

development

  • run ./exe/update-dictionary to build html/hanyu-dictionary.html from html/hanyu-dictionary-template.html
  • the main code file is js/main.coffee

hanzi-convert

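a command-line utility that reads text from standard input and converts it between tone marks and tone numbers, traditional and simplified characters, and hanzi and pinyin.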

convert tone marks to tone numbers:

echo fāshāo shì yīn | ./exe/hanzi-convert --numbers
fa1shao1 shi4 yin1
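the tone-mark conversion can be sketched in python as follows. this is not hanzi-convert's actual code; it assumes standard pinyin spelling (a syllable starting with a vowel is preceded by an apostrophe) and places each tone number after the vowels and at most one final n, ng or r that follow the marked vowel. `marks_to_numbers` is a hypothetical name.

```python
import unicodedata

# combining diacritics (after NFD decomposition) that mark the four tones
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}
VOWELS = set("aeiouv")
DIAERESIS = "\u0308"  # the dots of ü stay attached to the vowel


def _is_vowel(s: str, i: int) -> bool:
    return i < len(s) and (s[i].lower() in VOWELS or s[i] == DIAERESIS)


def marks_to_numbers(text: str) -> str:
    """convert tone-marked pinyin like "fāshāo" to tone numbers like "fa1shao1"."""
    s = unicodedata.normalize("NFD", text)  # split "ā" into "a" + combining macron
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        i += 1
        if c not in TONE_MARKS:
            out.append(c)
            continue
        # copy the rest of the syllable: any following vowels ...
        while _is_vowel(s, i):
            out.append(s[i])
            i += 1
        # ... then at most one final, unless it begins the next syllable
        for final in ("ng", "n", "r"):
            end = i + len(final)
            if s[i:end].lower() == final and not _is_vowel(s, end):
                out.append(s[i:end])
                i = end
                break
        out.append(TONE_MARKS[c])
    return unicodedata.normalize("NFC", "".join(out))  # recompose "u" + dots into "ü"
```

with this sketch, `marks_to_numbers("fāshāo shì yīn")` returns "fa1shao1 shi4 yin1", matching the example above. spellings without the required apostrophe (e.g. "fāngàn" for fāng'àn) are ambiguous and may be split differently.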

convert traditional to simplified:

echo 發燒試音 | ./exe/hanzi-convert --simplify
发烧试音
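at its simplest, traditional-to-simplified conversion is a per-character table lookup. the python sketch below assumes such a table; `TRAD_TO_SIMP` is a hypothetical four-entry excerpt, not the project's data. the mapping is many-to-one (both 發 and 髮 simplify to 发), which is why the reverse direction cannot be a plain lookup.

```python
# hypothetical excerpt of a traditional-to-simplified table; a real table
# would be loaded from a data file and cover thousands of characters
TRAD_TO_SIMP = {"發": "发", "髮": "发", "燒": "烧", "試": "试"}

# str.maketrans accepts a dict whose keys are single characters
_TABLE = str.maketrans(TRAD_TO_SIMP)


def simplify(text: str) -> str:
    """replace each traditional character that has an entry; pass everything else through."""
    return text.translate(_TABLE)
```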

convert from hanzi to pinyin:

echo 发烧试音 | ./exe/hanzi-convert --pinyin
fa1shao1 shi4 yin1
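one common way to produce word-level pinyin like the output above is greedy longest-match segmentation against a word list, falling back to single characters. whether hanzi-convert works this way is an assumption; the python sketch uses a hypothetical `READINGS` table that deliberately omits 试音, so it reproduces the example output.

```python
# hypothetical readings; a real table would come from a word list such as cedict
READINGS = {"发烧": "fa1shao1", "发": "fa1", "烧": "shao1", "试": "shi4", "音": "yin1"}
MAX_WORD_LENGTH = max(len(word) for word in READINGS)


def to_pinyin(text: str) -> str:
    """segment text greedily by longest known word and join the readings with spaces."""
    words = []
    i = 0
    while i < len(text):
        # try the longest dictionary entry starting at position i first
        for length in range(min(MAX_WORD_LENGTH, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in READINGS:
                words.append(READINGS[candidate])
                i += length
                break
        else:
            # no entry at all: keep the character unchanged
            words.append(text[i])
            i += 1
    return " ".join(words)
```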

convert from pinyin to hanzi:

echo fa1shao1 shi4 yin1 | ./exe/hanzi-convert --hanzi
发烧 是/事/试/市/式/室/世/仕/侍/势/嗜/噬/士/奭/弑/忕/恃/戺/拭/揓/柿/栻/氏/澨/示/筮/舐/莳/螫/视/誓/谥/贳/轼/逝/适/释/铈/饰/𬤊 因/阴/喑/垔/堙/姻/愔/慇/殷/氤/洇/瘖/禋/筃/茵/裀/铟/音/骃/𬘡/𬮱

the alternatives are sorted by frequency, as is nearly every output of this project.
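frequency-ordered alternatives fall out naturally if the lookup index is built from a frequency-sorted word list: appending words in list order keeps each candidate list in frequency order. a python sketch, where `WORDS_BY_FREQUENCY` is a hypothetical excerpt in the spirit of data/frequency-pinyin-translation.csv and `build_index` and `to_hanzi` are illustrative names, not the tool's code:

```python
from collections import defaultdict

# hypothetical excerpt of a word list sorted by frequency, most frequent first
WORDS_BY_FREQUENCY = [
    ("是", "shi4"), ("事", "shi4"), ("因", "yin1"),
    ("发烧", "fa1shao1"), ("试", "shi4"), ("音", "yin1"),
]


def build_index(rows):
    """map numbered pinyin to candidate words, keeping the input (frequency) order."""
    index = defaultdict(list)
    for word, pinyin in rows:
        index[pinyin].append(word)
    return index


def to_hanzi(text, index):
    """replace each space-separated pinyin token by its candidates joined with "/"."""
    return " ".join("/".join(index.get(token, [token])) for token in text.split())
```

since the list is sorted once up front, no per-query sorting is needed; unknown tokens are passed through unchanged.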