WineOCR

An OCR-based tool that feeds the OpenWines open data from images of wine bottle labels.

These scripts run on Node.js; dependencies are managed with npm.

Output:

$ node wineocr.js etiquette_3.jpg

CHÂTEAU MERCIER

CÔTES DE BOURG

APPELLATION CÔTES DE BOURG CONTRÔLÉE

2000

15H ml
ÇERÊRÏﬁLL MIS EN BOUTEILLE AU CHATEAU

“l |
S.C.E.A. FAMILLE CHETY, PRODUCTEUR A SA|NT—TROJAN (GIRONDE) FRANCE  .

The OCR output is both displayed in the console and written to a newly created [IMAGE_PATH].ocr.txt file.
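The console-plus-file behaviour described above could be sketched as follows. This is an illustration only, not the code in wineocr.js: the helper names `ocrOutputPath` and `saveOcrResult` are hypothetical, and it assumes that [IMAGE_PATH] means the full image path, extension included.

```javascript
const fs = require('fs');

// Hypothetical helper: derive the output path by appending ".ocr.txt"
// to the image path (assumption: the image extension is kept).
function ocrOutputPath(imagePath) {
  return imagePath + '.ocr.txt';
}

// Hypothetical helper: print the recognized text, then save it next to the image.
function saveOcrResult(imagePath, text) {
  const outPath = ocrOutputPath(imagePath);
  console.log(text);
  fs.writeFileSync(outPath, text, 'utf8');
  return outPath;
}
```

Under that assumption, `examples/etiquette_3.jpg` would produce `examples/etiquette_3.jpg.ocr.txt`.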

Usage:

$ node wineocr.js [path] [layout_analysis_option] [language_code]

Example:

$ node wineocr.js ./examples/etiquette_3.jpg

which is equivalent to passing the default options explicitly:

$ node wineocr.js examples/etiquette_3.jpg 3 fra

Argument details:

The first argument can be either a directory or a single file path.
3 is a layout analysis option for Tesseract OCR (see "Options at run" below).
fra is a language code. Available languages depend on your Tesseract installation (see "Installation" below).
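The way the three positional arguments fall back to their defaults can be sketched like this. The function name `parseArgs` and the shape of the returned object are assumptions made for illustration; only the defaults (3 and fra) come from this README.

```javascript
// Hypothetical helper: resolve [path] [layout_analysis_option] [language_code]
// with the defaults documented above (3 and "fra").
function parseArgs(argv) {
  const [path, psm, lang] = argv;
  if (!path) {
    throw new Error('Usage: node wineocr.js [path] [layout_analysis_option] [language_code]');
  }
  return {
    path,
    psm: psm === undefined ? 3 : Number.parseInt(psm, 10),
    lang: lang === undefined ? 'fra' : lang,
  };
}

// In a script, process.argv.slice(2) drops the "node" and script-name entries:
// const options = parseArgs(process.argv.slice(2));
```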

Note that you can also process a whole directory:

$ node wineocr.js examples/
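Accepting either a file or a directory amounts to expanding the path into a list of images before running OCR. Below is a minimal sketch of that step. The helper name `listImages` and the extension list are assumptions; the README does not say which files wineocr.js picks up from a directory.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: the set of extensions treated as images is not stated in the README.
const IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.tif', '.tiff'];

// Hypothetical helper: return the image files to process for a file or directory path.
function listImages(inputPath) {
  const stats = fs.statSync(inputPath);
  if (!stats.isDirectory()) {
    return [inputPath];
  }
  return fs.readdirSync(inputPath)
    // Keep images only; this also skips previously generated *.ocr.txt files.
    .filter((name) => IMAGE_EXTENSIONS.includes(path.extname(name).toLowerCase()))
    .sort()
    .map((name) => path.join(inputPath, name));
}
```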

Installation

1/2 - Install tesseract

tesseract is an open-source project and a mandatory dependency for WineOCR.

For Mac OS X:

brew install tesseract --all-languages

For other operating systems such as Windows or GNU/Linux, and for details about installing only certain language packs, check out the tesseract-ocr project website.
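To check which language codes an installation accepts, the tesseract binary can be asked with its `--list-langs` flag. The sketch below wraps that call from Node.js. The helper names are hypothetical, and the header line it filters out ("List of available languages ...") is an assumption about the output format, which may differ between Tesseract versions.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: extract language codes from the text printed by
// `tesseract --list-langs` (assumed: one header line, then one code per line).
function parseLanguageList(output) {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !/^List of available languages/i.test(line));
}

// Hypothetical helper: query the installed binary.
// Both streams are read because some versions print the list on stderr.
function listInstalledLanguages(callback) {
  execFile('tesseract', ['--list-langs'], (err, stdout, stderr) => {
    if (err) return callback(err);
    callback(null, parseLanguageList(stdout + '\n' + stderr));
  });
}
```

If `fra` is missing from the resulting list, the default language code used by WineOCR would not be available on that machine.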

2/2 - Install dependencies (node-tesseract, etc.)

npm install

Options at run:

The layout_analysis_option argument (see the psm argument in the node-tesseract library) tells the Tesseract OCR binary to run only a subset of layout analysis and to assume a certain form of image. The options are:

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

Option 3 seems to work well for most wine bottle labels.
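The option list above can be turned into a small validation step before calling the OCR library. This is a sketch under stated assumptions: `buildOcrOptions` is a hypothetical helper, and the `{ l, psm }` option shape follows the node-tesseract README as far as I recall it, so check it against the version you have installed.

```javascript
// Layout analysis modes, keyed by the numeric value listed above.
const PSM_MODES = {
  0: 'Orientation and script detection (OSD) only',
  1: 'Automatic page segmentation with OSD',
  2: 'Automatic page segmentation, but no OSD, or OCR',
  3: 'Fully automatic page segmentation, but no OSD (default)',
  4: 'Single column of text of variable sizes',
  5: 'Single uniform block of vertically aligned text',
  6: 'Single uniform block of text',
  7: 'Single text line',
  8: 'Single word',
  9: 'Single word in a circle',
  10: 'Single character',
};

// Hypothetical helper: reject unknown modes and build the options object.
// Assumption: node-tesseract takes the language as "l" and the mode as "psm".
function buildOcrOptions(psm, lang) {
  if (!Number.isInteger(psm) || !(psm in PSM_MODES)) {
    throw new RangeError('layout_analysis_option must be an integer from 0 to 10, got: ' + psm);
  }
  return { l: lang, psm };
}

// Possible use with node-tesseract (not executed here):
// require('node-tesseract').process(imagePath, buildOcrOptions(3, 'fra'), callback);
```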

Considering tesseract alternative: OCR APIs

License

MIT License - See license file.

Sources are available at OpenWines/WineOCR on GitHub.

Issues, support

Please check the OpenWines/WineOCR/issues page on GitHub.
