Finding position of block is so slow #138
Comments
You can't, the page iterator is how you extract this information. There is Is it the getiterator call itself that's taking 20 seconds or the iteration
|
Calling getiterator takes about 20 seconds for me. The loop itself is not slow (up to one second). |
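For anyone who wants to reproduce this split between the call and the loop, here is a minimal timing sketch. The tessdata path, the language, and the image file name are placeholders rather than values from this thread, and the class name is purely illustrative.

```csharp
using System;
using System.Diagnostics;
using Tesseract;

class IteratorTimingSketch
{
    static void Main()
    {
        // Placeholder paths: adjust "./tessdata" and "MyPage.png" to your setup.
        using (var engine = new TesseractEngine(@"./tessdata", "eng"))
        using (var pix = Pix.LoadFromFile("MyPage.png"))
        using (var page = engine.Process(pix))
        {
            // Time the GetIterator call on its own.
            var sw = Stopwatch.StartNew();
            using (var iter = page.GetIterator())
            {
                Console.WriteLine("GetIterator: {0} ms", sw.ElapsedMilliseconds);

                // Time the block-level iteration separately.
                sw.Restart();
                iter.Begin();
                int steps = 0;
                do
                {
                    steps++;
                } while (iter.Next(PageIteratorLevel.Block));
                Console.WriteLine("Iteration ({0} steps): {1} ms", steps, sw.ElapsedMilliseconds);
            }
        }
    }
}
```

Printing the two numbers separately shows whether the cost sits in the call or in the loop, which is the distinction asked about above.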
I think I know why it's taking that long and that's because getiterator is If this is the case then you might be able to speed it up by changing some
|
I set EngineMode to "Default". That speeds up the getiterator call. |
I need more speed, though. The run time is still about 5 seconds. Any other suggestions? |
Are you doing the processing concurrently? For instance multiple documents
|
No, this time is just for one getiterator call, and there is no other On Wed, Nov 26, 2014 at 11:03 PM, Charles Weld notifications@github.com
Azam Zahra Rahimi |
OK, this might just be how long it takes to process the image. Might be worth trying just running it through the tesseract.exe and seeing how long that takes. |
Do you mean I should install the tesseract binary and run the image through it? |
I installed tesseract-3.02.02 and ran it: "tesseract MyPage.png outPage.txt -l eng" |
Yes, that's what I meant; the idea was to get a baseline timing. I also forgot What's the use case here? I would have thought a couple of seconds would
|
Could you please tell me how I can use the gethocr function? On Sat, Nov 29, 2014 at 1:02 PM, Charles Weld notifications@github.com
Azam Zahra Rahimi |
Okay, in this case I'd just do the following:
I don't think you'll be able to speed it up more than that.
|
I've used Page.AnalyseLayout() instead of Page.GetIterator(). It seems to be faster than GetIterator(). |
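A minimal sketch of the Page.AnalyseLayout() approach described above, for readers who only need block positions. The tessdata path, the image file name, and the console output are assumptions for illustration; the original poster drew rectangles on an image instead.

```csharp
using System;
using Tesseract;

class BlockLayoutSketch
{
    static void Main()
    {
        // Placeholder paths: adjust "./tessdata" and "MyPage.png" to your setup.
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        using (var pix = Pix.LoadFromFile("MyPage.png"))
        using (var page = engine.Process(pix))
        // AnalyseLayout returns a PageIterator over the layout only.
        using (var iter = page.AnalyseLayout())
        {
            iter.Begin();
            do
            {
                Rect bounds;
                if (iter.TryGetBoundingBox(PageIteratorLevel.Block, out bounds))
                {
                    Console.WriteLine("Block at ({0}, {1}), size {2}x{3}",
                        bounds.X1, bounds.Y1, bounds.Width, bounds.Height);
                }
            } while (iter.Next(PageIteratorLevel.Block));
        }
    }
}
```

The loop steps at the Block level so that each block is reported once. How much faster this is than GetIterator() will depend on the image.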
No problem
|
Hi
I need to find the position of each text block in an image. I've read issues #64 and #81, and according to those, my code is:
Tesseract.Pix pix = Tesseract.PixConverter.ToPix(testimg);
Tesseract.Rect blockBounds;
using (var page = engine.Process(pix))
{
    using (var iter = page.GetIterator())
    {
        iter.Begin();
        do
        {
            // Draw one rectangle around each detected block.
            if (iter.TryGetBoundingBox(Tesseract.PageIteratorLevel.Block, out blockBounds))
            {
                imgblock.Draw(new Rectangle(blockBounds.X1, blockBounds.Y1, blockBounds.Width, blockBounds.Height), new Bgr(255, 0, 0), thickness);
            }
        } while (iter.Next(Tesseract.PageIteratorLevel.Block)); // step by Block to match the bounding-box level
    }
}
It takes about 20 seconds for each image. Each image is 1868*2416 pixels. Is there another way to detect the block positions without calling "iter = page.GetIterator()"? That call is very slow for me.
Regards
Zahra