tomek/Japanese-Tools

 
 

These are some scripts that help me learn Japanese.

Most scripts are meant to be used as plugins for an IRC bot or run from a shell. I find the following aliases useful:

alias ja="$JAPANESE_TOOLS/jmdict/jm.sh"
alias wa="$JAPANESE_TOOLS/jmdict/wa.sh"
alias rtk="$JAPANESE_TOOLS/rtk/rtk.sh"
alias sen="$JAPANESE_TOOLS/yahoo_jisho/daijisen.sh"
alias rin="$JAPANESE_TOOLS/yahoo_jisho/daijirin.sh"
alias gd="$JAPANESE_TOOLS/google_dictionary/gd.sh"

I do most of my dictionary lookups with these aliases.

audio/

find_audio.sh finds an audio version of a given Japanese word on languagepod101.

$ ./find_audio.sh 夜空
Audio for 夜空 [よぞら]: http://tinyurl.com/p8aq8jo

compare_encoding

Compares the size of different encodings of the same Japanese Wikipedia article. In almost all cases UTF-8 is smaller than UTF-16.

$ ./compare_encoding.sh 夜空
UTF-8 vs. UTF-16: 91213 vs. 156876 bytes. UTF-8 wins by 41.8%.
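
To illustrate the comparison, here is a minimal Python sketch that measures the same text in both encodings. It is not the script's actual implementation: the function name `compare_encoding` is hypothetical, it works on a string instead of downloading a Wikipedia article, and the way the percentage is rounded is an assumption.

```python
def compare_encoding(text: str) -> str:
    """Report the size of `text` in UTF-8 and UTF-16 (hypothetical helper)."""
    utf8_size = len(text.encode("utf-8"))
    # "utf-16-le" is used so the 2-byte byte order mark is not counted.
    utf16_size = len(text.encode("utf-16-le"))

    winner = "UTF-8" if utf8_size <= utf16_size else "UTF-16"
    smaller, larger = sorted((utf8_size, utf16_size))
    # Saving relative to the larger encoding; 0 for empty input.
    saving = 100.0 * (larger - smaller) / larger if larger else 0.0

    return (f"UTF-8 vs. UTF-16: {utf8_size} vs. {utf16_size} bytes. "
            f"{winner} wins by {saving:.1f}%.")


if __name__ == "__main__":
    print(compare_encoding("夜空"))         # bare Japanese text
    print(compare_encoding("<p>夜空</p>"))  # the same text inside ASCII markup
```

Kana and kanji take three bytes each in UTF-8 but two in UTF-16, so bare Japanese text is smaller in UTF-16. UTF-8 presumably wins on full articles because of the ASCII markup and Latin text surrounding the Japanese.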

gettext/

Internationalization support. Currently supported languages other than English:

  • German
  • Polish

Run gettext/regenerate_mo_files.sh before using any of the translations; otherwise they won’t work.

google_count

Counts the number of Google results. Uses google.co.jp for queries containing Japanese characters and google.com otherwise.
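
The domain choice could look roughly like the Python sketch below. The function name `google_domain` and the character ranges are assumptions made for illustration; the script's real test for "Japanese characters" may differ.

```python
import re

# Assumption: "Japanese characters" means hiragana, katakana, or a
# character from the main CJK ideograph block.
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def google_domain(query: str) -> str:
    """Pick the Google domain for a query (hypothetical helper)."""
    if JAPANESE_CHARS.search(query):
        return "google.co.jp"
    return "google.com"


if __name__ == "__main__":
    for query in ("夜空", "night sky"):
        print(query, "->", google_domain(query))
```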

google_dictionary/

gd.sh looks up English words in the Google dictionary.

$ ./gd.sh diligent
/ˈdiləjənt/ having or showing care and conscientiousness in one's work or duties

google_translate/

gt.sh translates words and sentences using Google Translate. The target language is determined by the environment variable LANG, but it can also be specified explicitly.

$ ./gt.sh My hovercraft is full of eels.
私のホバークラフトは鰻がいっぱいです。

$ ./gt.sh it My hovercraft is full of eels.
it: Il mio hovercraft è pieno di anguille.

$ ./gt.sh Il mio hovercraft è pieno di anguille.
My hovercraft is full of eels.

This script is currently broken because Google switched off the Translate API.
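
As an illustration of deriving a target language from LANG, the Python sketch below reduces a locale string to a two-letter language code. How gt.sh actually parses LANG is not documented here, so the function name `language_from_lang` and the fallback to English are assumptions.

```python
import os


def language_from_lang(lang_value: str, default: str = "en") -> str:
    """Reduce a locale such as 'ja_JP.UTF-8' to 'ja' (hypothetical helper)."""
    # Drop the encoding (".UTF-8"), the modifier ("@euro"), and the
    # territory ("_JP"), in that order.
    code = lang_value.split(".")[0].split("@")[0].split("_")[0].lower()
    # "C" and "POSIX" name no language, so fall back to the default.
    if code in ("", "c", "posix"):
        return default
    return code


if __name__ == "__main__":
    print(language_from_lang(os.environ.get("LANG", "")))
```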

jmdict/

jm.sh provides JMdict lookups and wa.sh provides Wadoku lookups. Both work best for Japanese->English (JMdict) or Japanese->German (Wadoku), and not so well in the reverse direction.

Before first use, run prepare_jmdict.sh and prepare_wadoku.sh, which download and process the respective dictionary files.

$ ./jm.sh 村長
村長 [そんちょう] (n), village headman
市長村長選挙 [しちょうそんちょうせんきょ] (n), mayoral election

kana/

A simple hiragana and katakana trainer.

Example IRC session

<Christoph>  !hira help
<nihongobot> Start with "!hira <level> [count]". Known levels are 0
             to 10. To learn more about some level please use
             "!hira help <level>".
<nihongobot> To only see the differences between consecutive
             levels, please use "!hira helpdiff <level>".
<Christoph>  !hira 5
<nihongobot> Please write in romaji: す と に ね へ
<Christoph>  !hira su to ni ne he
<nihongobot> Perfect! 5 of 5. Statistics for Christoph: 44.64% of
             280 characters correct.
<nihongobot> Please write in romaji: は と ぬ ほ な
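
The answer check in a session like the one above can be sketched in Python as follows. The table only covers the kana that appear in the example, and the name `score_answer` and the scoring rule (position-by-position comparison, missing answers count as wrong) are assumptions, not the trainer's actual code.

```python
# Tiny excerpt: only the hiragana shown in the example session.
ROMAJI = {
    "す": "su", "と": "to", "に": "ni", "ね": "ne", "へ": "he",
    "は": "ha", "ぬ": "nu", "ほ": "ho", "な": "na",
}


def score_answer(kana_line: str, answer_line: str) -> tuple[int, int]:
    """Return (correct, total) for a romaji answer (hypothetical helper)."""
    kana = kana_line.split()
    answers = answer_line.lower().split()
    # zip() stops at the shorter list, so unanswered kana count as wrong.
    correct = sum(1 for k, a in zip(kana, answers) if ROMAJI.get(k) == a)
    return correct, len(kana)


if __name__ == "__main__":
    correct, total = score_answer("す と に ね へ", "su to ni ne he")
    print(f"{correct} of {total}")
```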

kanjidic/

Implements a lookup in kanjidic: http://www.csse.monash.edu.au/~jwb/kanjidic.html

$ ./kanjidic.sh 日本語
日: 4 strokes. ニチ, ジツ, ひ, -び, -か. In names: あ, あき, いる, く, くさ, こう, す, たち, に, にっ, につ, へ {day, sun, Japan, counter for days}
本: 5 strokes. ホン, もと. In names: まと {book, present, main, origin, true, real, counter for long cylindrical things}
語: 14 strokes. ゴ, かた.る, かた.らう {word, speech, language}

lhc

This script has nothing to do with Japanese. It OCRs the image at http://op-webtools.web.cern.ch/op-webtools/vistar/vistars.php?usr=LHC1 to report the live status of the LHC.

reading/

read.py annotates kanji with their kana readings using mecab.

$ ./read.py 鬱蒼たる樹海の中に舞う人の如き影が在った。
鬱蒼[うっそう] たる 樹海[じゅかい] の 中[なか] に 舞[ま]う
人[ひと] の 如[ごと]き 影[かげ] が 在[あ]っ た 。

reading_quiz/

A quiz asking kanji -> kana questions. Only works as an IRC plugin for now.

Example IRC session

<Christoph>  !quiz jlpt2
<nihongobot> Please read: 発見
<Christoph>  !quiz はっけん
<nihongobot> Christoph: Correct! (はっけん:
             (n,vs) 1. discovery, 2. detection, 3. finding)

romaji/

romaji.sh converts kanji and kana to romaji using mecab.

$ ./romaji.sh 鬱蒼たる樹海の中に舞う人の如き影が在った。
ussou taru jukai no naka ni mau hito no gotoki kage ga atsu ta 。

rtk/

rtk.sh looks up entries by keyword, kanji, or frame number. The keywords and numbers refer to Heisig’s amazing book “Remembering the Kanji”.

$ ./rtk.sh 城壁
#362: castle 城 | #1500: wall 壁

$ ./rtk.sh star
#1556: star 星, #237: stare 眺, #1476: starve 餓,
#2532: star-anise 樒, #2872: start 孟, #2376: mustard 芥

$ ./rtk.sh 1 2 3
#1: one 一 | #2: two 二 | #3: three 三
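
The three lookup directions can be sketched in Python as below. The data is a small excerpt copied from the example output above, and the function name `rtk_lookup`, the way the query type is detected, and the ordering of keyword matches (exact match first, then frame order) are all assumptions; rtk.sh itself may work differently.

```python
# Excerpt taken from the examples above: (frame number, keyword, kanji).
ENTRIES = [
    (1, "one", "一"), (2, "two", "二"), (3, "three", "三"),
    (237, "stare", "眺"), (362, "castle", "城"), (1476, "starve", "餓"),
    (1500, "wall", "壁"), (1556, "star", "星"), (2376, "mustard", "芥"),
]
BY_NUMBER = {entry[0]: entry for entry in ENTRIES}
BY_KANJI = {entry[2]: entry for entry in ENTRIES}


def format_entry(entry: tuple[int, str, str]) -> str:
    number, keyword, kanji = entry
    return f"#{number}: {keyword} {kanji}"


def rtk_lookup(query: str) -> str:
    """Look up by frame numbers, kanji, or keyword (hypothetical helper)."""
    tokens = query.split()
    if not tokens:
        return ""
    # Case 1: every token is a number, so look up each frame.
    if all(t.isdecimal() for t in tokens):
        hits = [BY_NUMBER[int(t)] for t in tokens if int(t) in BY_NUMBER]
        return " | ".join(format_entry(e) for e in hits)
    # Case 2: the query contains known kanji, so look up each character.
    hits = [BY_KANJI[ch] for ch in query if ch in BY_KANJI]
    if hits:
        return " | ".join(format_entry(e) for e in hits)
    # Case 3: keyword substring search, exact match first (stable sort).
    needle = query.strip().lower()
    matches = [e for e in ENTRIES if needle in e[1]]
    matches.sort(key=lambda e: e[1] != needle)
    return ", ".join(format_entry(e) for e in matches)


if __name__ == "__main__":
    for query in ("城壁", "star", "1 2 3"):
        print(rtk_lookup(query))
```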

simple_bot/

As the name says, it’s a simple IRC bot. You can start it with:

$ ./bot.py <server[:port]> <channel> <nickname> [NickServ password]

It uses all the other scripts.

yahoo_jisho/

A binding to Yahoo!辞書, the Yahoo Japanese dictionary. It prints a short excerpt from the dictionary entry and a link to the full result.

$ ./daijisen.sh うれしい
うれし・い【×嬉しい】 ( http://tinyurl.com/32esm38 )
[形][文]うれ・し[シク] 1 物事が自分の望みどおりになって満足で
あり、喜ばしい。自分にとってよいことが起き、愉快で、楽しい。「努力が
報われてとても―・い」「―・いことに明日は晴れるらしい」⇔悲しい。  2
相手から受けた行...

$ ./daijirin.sh うれしい
うれし・い(3) 【▼嬉しい】 ( http://tinyurl.com/39bwl22 )
(形) [文]シク うれ・し 1 (望ましい事態が実現して)心がうきうきとし
て楽しい。心が晴れ晴れとして喜ばしい。  ⇔悲しい 2 満足して、相手に
感謝する気持ちになるさま。ありがたい。かたじけない。  〔派生〕 ...
