The text is not recognized from a png #182
Comments
Not sure if you're already doing some preprocessing; however, this might help. Issue #115 describes some techniques which might be helpful. You can also enable the
|
Regarding converting the image to grayscale, the actual formula is
I'm also pretty sure you're not correctly resizing the image to 300 dpi in your crop function. If you check the output, I think you'll find that it's actually the same size. What you'll need to do is use the source resolution and then work out a scaling factor from that. So assuming the source is 70 dpi (typical screen resolution), something like the following should work:
using System;
using System.Drawing;

public static Bitmap ResizeImage(Bitmap src, Single targetResolution)
{
// Validate the target and source resolutions before dividing by them.
if (targetResolution <= 0.0f) throw new ArgumentOutOfRangeException("targetResolution", "The target resolution must be greater than zero.");
if (src.HorizontalResolution <= 0.0f) throw new ArgumentOutOfRangeException("src", "The src image doesn't specify a horizontal resolution.");
if (src.VerticalResolution <= 0.0f) throw new ArgumentOutOfRangeException("src", "The src image doesn't specify a vertical resolution.");
// Scale factor per axis: target dpi divided by source dpi.
Single horizontalScale = targetResolution / src.HorizontalResolution;
Single verticalScale = targetResolution / src.VerticalResolution;
// The Bitmap constructor takes integer pixel dimensions, so round the scaled sizes.
Bitmap result = new Bitmap((int)Math.Round(src.Width * horizontalScale), (int)Math.Round(src.Height * verticalScale));
result.SetResolution(targetResolution, targetResolution);
using (Graphics g = Graphics.FromImage(result))
{
g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
// Draw the source stretched over the whole of the larger bitmap.
g.DrawImage(src, 0, 0, result.Width, result.Height);
}
return result;
}
Finally, the code below will always fail, as you've just set the variable to true:
engine.SetVariable("tessedit_write_images", true);
if (engine.TryGetBoolVariable("tessedit_write_images", out result))
{
Assert.AreEqual(false, result, "The values are not equal");
}
What you probably want is something like this:
engine.SetVariable("tessedit_write_images", true);
if (engine.TryGetBoolVariable("tessedit_write_images", out result))
{
Assert.AreEqual(true, result, "The variable 'tessedit_write_images' should be enabled.");
} |
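The scaling-factor arithmetic described in this comment can be checked on its own, without System.Drawing. The following is only a sketch: the class name `ScaleExample`, the method name `ScaledSize`, and the 700 x 350 sample size are hypothetical and chosen for illustration. The 70 dpi source and 300 dpi target are the figures mentioned in the comment.

```csharp
using System;

public static class ScaleExample
{
    // Hypothetical helper: returns the pixel size an image needs so that it
    // keeps its physical dimensions when going from sourceDpi to targetDpi.
    public static (int Width, int Height) ScaledSize(int width, int height, float sourceDpi, float targetDpi)
    {
        if (sourceDpi <= 0f) throw new ArgumentOutOfRangeException(nameof(sourceDpi));
        if (targetDpi <= 0f) throw new ArgumentOutOfRangeException(nameof(targetDpi));

        // Same factor as in ResizeImage: target resolution / source resolution.
        float scale = targetDpi / sourceDpi;

        // Round to whole pixels, since bitmap dimensions are integers.
        return ((int)Math.Round(width * scale), (int)Math.Round(height * scale));
    }

    public static void Main()
    {
        // Assumed sample: a 700 x 350 pixel screenshot at 70 dpi, scaled to 300 dpi.
        var size = ScaledSize(700, 350, 70f, 300f);
        Console.WriteLine($"{size.Width} x {size.Height}");
    }
}
```

If the resized bitmap comes back with the same width and height as the source, the scale factor was never applied to the pixel dimensions, which is the symptom described in this comment.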
Note: I've created an issue, #183, to support resizing/scaling Pix images, as it's probably a common operation. No promises that it'll be implemented anytime soon. |
Regarding the ResizeImage method, I think, instead of b.SetResolution(targetResolution, targetResolution ) |
Oops, sorry, should have been |
After I made your changes, this is the result, but it is still not what I expected |
And this is the CropImage method after your changes
|
Umm, sorry, I'm out of ideas. It might just not be a high enough quality image.
Maybe try Stack Overflow if you have not already?
|
Before you fix the issue, do you have any idea how to work around this? |
You know what's funny? Using the same Tesseract library in Java, it works fine. I don't have to crop the image, just scale it. |
What happens if you use the tesseract command line tool?
|
I have some updates. After investigating, I found the problem: our method to resize the image is not doing what we expect. Basically, the method doesn't resize the image; it draws it at the same resolution. I added: After this, our image gets a higher resolution: As I said in the previous comments, in Java, using AffineTransform, I get an image with better resolution: Trying to obtain the same result in VS, the text is not recognized completely, so I have to use the maximum target resolution. In conclusion, Tesseract was not the problem; our resize method was, and I think it is still not fully optimized. |
Okay, I'll see if I can find some time this weekend to expose the resize
|
I've added a new Scale method to Pix which should work better for the use case. Can you get the latest source code and try it out? You can build a NuGet package by double clicking the build.bat file. |
Where can I find the build.bat file? Do I have to uninstall the Tesseract OCR package from NuGet Packages and reinstall it? |
Anyway, the behavior of the library is very strange. Sometimes it recognizes all the characters and numbers, sometimes not. It has some difficulty recognizing numbers, so I have to increase or decrease the target resolution to get the text from the image. I noticed that the date is the hardest part of the image to recognize. |
No, just check out the source (develop branch) and run ~\build.bat. It will
generate a NuGet package that you can then use by adding it to a local NuGet
repo.
|
Sorry, but I am not so familiar with this. Maybe you can give me more details... |
In Tesseract-master, I found a build.bat file. Is this the one? |
Yes, however you'll need to change the branch to develop. Master only
|
I have this image, but Tesseract doesn't recognize the text in it.
The output after running the tesseract is:
Ammmz e um
Bzndmary Pbffiularamr
ugsmmm gmmm
Rzfiaume P3yMuiR:6aua
Stams Pay hefare 20‘ arnsrzz