C# Tesseract 3.02 How I access each character of word from image #64
Comments
Answer for Q1: Check out the console sample provided, as it gives an example of how to iterate through the results; however, something like the following should work:
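A minimal sketch of symbol-level iteration, assuming the .NET wrapper exposes `Page.GetIterator()`, `Next(PageIteratorLevel)`, `GetText(PageIteratorLevel)`, and `TryGetBoundingBox(PageIteratorLevel, out Rect)`. The image path and class name are hypothetical, and `EngineMode.Default` is used here only for illustration:

```csharp
using System;
using Tesseract;

class SymbolBoxes
{
    static void Main()
    {
        // Hypothetical input path; replace with a real image file.
        const string imagePath = "sample.png";

        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        using (var image = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(image))
        using (var iter = page.GetIterator())
        {
            iter.Begin();
            do
            {
                Rect bounds;
                // The bool only reports success; the box itself comes back
                // through the out parameter.
                if (iter.TryGetBoundingBox(PageIteratorLevel.Symbol, out bounds))
                {
                    string symbol = iter.GetText(PageIteratorLevel.Symbol);
                    Console.WriteLine("{0}: x={1}, y={2}, w={3}, h={4}",
                        symbol, bounds.X1, bounds.Y1, bounds.Width, bounds.Height);
                }
            } while (iter.Next(PageIteratorLevel.Symbol)); // advance one character at a time
        }
    }
}
```

In a WinForms paint handler, each `bounds` could be passed on as `new System.Drawing.Rectangle(bounds.X1, bounds.Y1, bounds.Width, bounds.Height)` to `Graphics.DrawRectangle`. Swapping `PageIteratorLevel.Symbol` for `Word`, `TextLine`, `Para`, or `Block` should yield boxes at the other levels.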
Note that the general result hierarchy is as follows: Block -> Para -> TextLine -> Word -> Symbol. That is, the result set can contain many Blocks, which can in turn contain many Paragraphs, and so on.
Answer for Question 2: As per above the …
Use OpenCV to find and crop the region. There is a guy with demos written in Python that aren't too hard to translate to .NET.
Hi, I'm a newbie here.
First, I need to draw a rectangle around each character of a word in an image.
In the old version of Tesseract I found that each character can be accessed with:
foreach (tessnet2.Character c in word.CharList)
e.Graphics.DrawRectangle..........
But now I'm working on a C# WinForms app with Tesseract 3.02:
TesseractEngine a = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractAndCube);
Tesseract.Page page1 = a.Process(image);
foreach ( ....... in page1)
{
// draw rectangle from (bounding box of each character)
}
Question 1: How do I access each character of page1?
I tried several approaches, such as PageIteratorLevel, and could get parts of the page like the first line, first word, or first block, but I can't get the first character of any of them.
I also noticed that in the hOCR text output from page1, each element (word, line, block) has a bounding box value.
Question 2: How do I get the bounding box of each element? (I found only one method, "TryGetBoundingBox", and it returns only a boolean.)
Thank you.