Tesseract::API.to_language_codes output is incorrect #41

Open
knowtheory opened this issue Jul 18, 2014 · 3 comments

Comments

@knowtheory

tesseract --list-langs 2>&1 | ruby -r 'tesseract' -e 'puts "ok?, code, api"; STDIN.read.split("\n").map{ |code| res = Tesseract::API.to_language_code(code); puts "#{code == res}, #{code}, #{res}" }'
ok?, code, api
true, List of available languages (69):, List of available languages (69):
true, afr, afr
true, ara, ara
true, aze, aze
true, bel, bel
true, ben, ben
true, bul, bul
true, cat, cat
false, ces, cze
true, chi_sim, chi_sim
true, chi_tra, chi_tra
true, chr, chr
true, dan-frak, dan-frak
true, dan, dan
true, deu-frak, deu-frak
false, deu, ger
false, ell, gre
true, eng, eng
true, enm, enm
true, epo, epo
true, epo_alt, epo_alt
true, equ, equ
true, est, est
false, eus, baq
true, fin, fin
false, fra, fre
true, frk, frk
true, frm, frm
true, glg, glg
true, grc, grc
true, heb, heb
true, hin, hin
true, hrv, hrv
true, hun, hun
true, ind, ind
false, isl, ice
true, ita, ita
true, ita_old, ita_old
true, jpn, jpn
true, kan, kan
true, kor, kor
true, lav, lav
true, lit, lit
true, mal, mal
false, mkd, mac
true, mlt, mlt
false, msa, may
false, nld, dut
true, nor, nor
true, osd, osd
true, pol, pol
true, por, por
false, ron, rum
true, rus, rus
true, slk-frak, slk-frak
false, slk, slo
true, slv, slv
true, spa, spa
true, spa_old, spa_old
false, sqi, alb
true, srp, srp
true, swa, swa
true, swe, swe
true, tam, tam
true, tel, tel
true, tgl, tgl
true, tha, tha
true, tur, tur
true, ukr, ukr
true, vie, vie

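Unfortunately, I don't think using the ISO_639 conversion is viable :\ Every "false" row above is a language where the conversion returns a different three-letter code (e.g. ger, fre) than the one Tesseract lists (deu, fra). I suspect an internal hash that keeps track of the codes is going to be necessary.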
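
One way to sketch the internal hash suggested here, assuming the only corrections needed are the twelve mismatches in the listing above. `TESSERACT_CODE_OVERRIDES` and `tesseract_language_code` are hypothetical names, not part of the gem's API:

```ruby
# Hypothetical override table: code returned by the conversion => code
# that `tesseract --list-langs` reports. Entries are taken from the
# "false" rows in the listing above.
TESSERACT_CODE_OVERRIDES = {
  'cze' => 'ces',
  'ger' => 'deu',
  'gre' => 'ell',
  'baq' => 'eus',
  'fre' => 'fra',
  'ice' => 'isl',
  'mac' => 'mkd',
  'may' => 'msa',
  'dut' => 'nld',
  'rum' => 'ron',
  'slo' => 'slk',
  'alb' => 'sqi'
}.freeze

# Return the Tesseract name for a code, passing through anything that is
# not in the override table (eng, chi_sim, deu-frak, ...).
def tesseract_language_code(code)
  TESSERACT_CODE_OVERRIDES.fetch(code.to_s, code.to_s)
end
```

With this in place, `tesseract_language_code('ger')` gives `'deu'`, while names such as `'chi_sim'` come back unchanged. The table would need a new entry whenever a Tesseract language pack uses a name the conversion does not produce.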

@shishirsharma

What is the workaround for this?

@shishirsharma

I think you have to use alpha3_terminologic in cases where it is available:

require 'iso-639'
lang = 'cze'
entry = ISO_639.find(lang)
entry.alpha3_terminologic.empty? ? entry.alpha3 : entry.alpha3_terminologic
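
A slightly more defensive version of the same idea, as a sketch only. It assumes the entry returned by `ISO_639.find` responds to `alpha3` and `alpha3_terminologic` (as in the snippet above) and that the lookup yields `nil` for names it does not recognize. `preferred_alpha3` and the `Entry` struct are hypothetical; the struct stands in for the gem's entry so the snippet runs without the gem installed:

```ruby
# Stand-in for the entry object returned by ISO_639.find (assumption:
# only the two readers used in the snippet above are needed).
Entry = Struct.new(:alpha3, :alpha3_terminologic)

# Prefer the terminologic code when one exists; otherwise fall back to
# alpha3. When there is no entry at all, return the original code unchanged.
def preferred_alpha3(code, entry)
  return code if entry.nil?

  terminologic = entry.alpha3_terminologic.to_s
  terminologic.empty? ? entry.alpha3 : terminologic
end

preferred_alpha3('cze', Entry.new('cze', 'ces'))  # terminologic code wins
preferred_alpha3('eng', Entry.new('eng', ''))     # falls back to alpha3
preferred_alpha3('chi_sim', nil)                  # unknown name passes through
```

With the gem loaded, the second argument would be `ISO_639.find(code)`, so the lookup happens once rather than three times.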

@shishirsharma

Do you have any update on this?
