Private fork of nhocr. Please use http://code.google.com/p/nhocr/.
asanoki/nhocr-0.21-a
----------------------------------------------------------------
                   NHocr - the Japanese OCR
----------------------------------------------------------------

1. Introduction

NHocr is a command-line OCR (Optical Character Recognition) program
for the Japanese language. It has been designed to recognize
machine-printed Japanese characters and some ASCII characters/symbols
in an image. NHocr is probably the first Open Source Japanese OCR
software, apart from some experimental, partial code released to
academic communities.

The "nhocr" command reads PBM/PGM/PPM image file(s), recognizes the
text line image in each file, and produces text data in UTF-8.
Each file should contain only ONE horizontal text line image in line
recognition mode, or only ONE text block in block recognition mode,
without any surrounding lines or dirt.

You can also use NHocr through the WeOCR service at:
  http://maggie.ocrgrid.org/nhocr/

The program is highly experimental, and its character recognition
performance is limited. (You will be happier with a commercial
product if you need high-performance OCR.)

The character feature used in NHocr is based on the Peripheral Local
Moment (P-LM) proposed by Hori et al. in the late 1990s.

NHocr originally started as the author's weekend programming project,
so development may be rather slow.


2. Installation and configuration

1) O2-tools-2.00 (or newer) is required for building NHocr.
   The source package is available at:
     http://www.imglab.org/p/O2/
   Download O2-tools-2.xx.tar.gz, then build and install it.

2) Run the configure script with the --with-O2tools option in the top
   directory. Then build and install the programs.

     $ ./configure --with-O2tools=<O2tools_directory_on_your_system>
     $ make
     (switch to root if necessary)
     # make install

3) If you want to use dictionary files in a non-standard directory,
   specify their location by setting the environment variable
   NHOCR_DICDIR. For example, if the dictionary files are in
   /opt/nhocr/DIC:

     $ NHOCR_DICDIR=/opt/nhocr/DIC ; export NHOCR_DICDIR

4) If you want to change the combination of character sets, set the
   dictionary codes using the environment variable NHOCR_DICCODES.
   For example:

     $ NHOCR_DICCODES=ascii+:zh_CN ; export NHOCR_DICCODES

   The built-in default is ascii+:jpn, for ASCII and Japanese
   characters.


3. Usage

Running nhocr without any arguments shows the usage. A typical
invocation is:

  $ nhocr -line -o output.txt input.pgm


4. Using NHocr with OCRopus

NHocr can be used as a line recognizer together with OCRopus, a
document analysis and OCR system. The NHocr-OCRopus bridge is
included in the package; see the Lua scripts in the ocropus/
directory.


5. License

See the LICENSE file.

For details:
  http://code.google.com/p/nhocr/
  http://sourceforge.jp/projects/nhocr/

--
Dec. 31, 2009   Hideaki Goto, Tohoku University, Japan
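The typical invocation in "3. Usage" processes one image at a time. If a page has already been cut into one-line images, the same command can be run over a whole directory. Below is a minimal sketch of such a helper. It is hypothetical and not part of the NHocr package. It assumes the line images are saved as .pgm files, and it uses only the -line and -o options shown in "3. Usage". NHOCR_DICDIR and NHOCR_DICCODES are inherited from the environment if you exported them as described in section 2.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NHocr): run nhocr in line mode
# on every .pgm file in a directory, writing one UTF-8 .txt file next
# to each image. Only the options from "3. Usage" (-line, -o) are used.
# NHOCR_DICDIR / NHOCR_DICCODES are taken from the environment if set.

ocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.pgm; do
        # If no file matches, the glob stays literal; skip it.
        [ -e "$img" ] || continue
        out=${img%.pgm}.txt
        if nhocr -line -o "$out" "$img"; then
            echo "ok: $img -> $out"
        else
            echo "failed: $img" >&2
        fi
    done
}

# Usage (assumed file name):  . ./ocr_dir.sh && ocr_dir /path/to/lines
```

The helper calls nhocr once per image so that each line ends up in its own output file. To also handle PBM or PPM input, extend the glob and the suffix stripping in the same way.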