Plans for tesseract 5.x.y #3673
What about releasing a 5.0.1 after Christmas at the end of December? I think there are several fixes since 5.0.0 which would be good for a new release. |
Mind reader :-) |
Right before tagging 5.0.1, you can update this sentence from the README:
|
What should be added into v5? |
We already have a wish list for improved training, a lot of issues with layout detection, want improved logging, and much more. Maintaining two branches did not work well with 4.x, and I am afraid it would not work better with 5.x. |
Maybe keep 5.0 as is? It is a good release with a number of changes. |
Do you plan to release 5.0.1 next week? |
Yes, unless we discover that something very important is still missing. |
There is still no fix, and I have no |
clang-cl is not worth it currently. |
You can release 5.0.1 without the clang-cl fix. |
Release 5.0.1 is now online. |
The next release could be a new minor version 5.1.0 with new features, maybe at the end of January (unless there is an urgent need for a bug fix release 5.0.2). I especially want to have image information in ALTO and hOCR output (see PR #3710 which implements that for hOCR), and maybe more from the project list. The new minor release would also disable OpenMP by default for autoconf builds. |
https://packages.ubuntu.com/search?keywords=tesseract-ocr Are you going to update Ubuntu 22.04 to 5.0.1 soon? The feature freeze date is February 24. |
I uploaded:
I hope @jbreiden will upload them to Debian. |
Hi @AlexanderP,
From https://tracker.debian.org/pkg/tesseract :
So, why can't you directly push new versions of Tesseract to Debian? |
I'd like to create a new release Tesseract 5.1.0 soon. Originally I had planned it for end of January. Are there any contributions or important bug fixes which should be included still pending (then I'd wait), or can we release now? |
I suggest you go ahead with 5.1.0 now. I would like to see improvements related to training and evaluation implemented, but they could go in a future release. |
Release 5.1.0 is now available. |
@amitdo I have no rights to upload to Debian. |
There are now several fixes and improvements in git master, so I think it's time for a new release 5.1.1. @egorpugin, is it possible to fix the CI sw build which is currently failing? Are there any other contributions or important bug fixes which should be included still pending (then I'd wait), or can we release now? Ideally #3782 should also be included. |
Yes, I'll check. |
Unfortunately, the Windows build does not work (for me): I tried Clang (14) and MS Visual Studio (2019). Here are the logs: |
|
I fixed the sw build in CI. |
Which Monday/Tuesday? :-) |
That's a good question. Thanks for the reminder. Release 5.3.2 is now available. Thank you for all contributions and your support. If someone misses names of contributors in the release notes: that information is auto-generated by GitHub. I have no idea why GitHub ignores some commits there. Maybe it only considers contributions with related pull requests? |
Any plans for a new release? I'd like to see d7c0711 being released for sirfz/tesserocr#330. |
Commit 063ad31 is even more important. So yes, there are good reasons for a new release, and I'll prepare it as soon as possible. |
ChangeLog for the planned new release 5.3.3:
Please comment if something should be changed or is missing. |
Can you look at issue #4002 and try to resolve it? If you don't think you'll be able to fix it in a short time, you can still release 5.3.3 without a fix for that issue. |
Thank you for that hint. That was another regression (since 5.0.0-rc2) which is fixed in pull request #4141. |
Is there anything else missing for 5.3.3? If not, I'd tag it after PR #4141 has been reviewed and merged. |
Release 5.3.3 is now available. New binaries for Windows are available, too. As always thank you for all contributions (especially from several new contributors) and your support. |
Should we publish a 5.3.4 this weekend? List of important changes:
Is there anything missing? |
+1 for releasing 5.3.4. |
+1 |
Done, see https://github.com/tesseract-ocr/tesseract/releases/tag/5.3.4. Thank you to all who contributed to the release with commits, pull requests, issue reports and in any other way. |
I hope you can push Tesseract 5.3.4 to Debian unstable so it will find its way to Ubuntu 24.04. |
List of important changes since 5.3.4:
Did I miss something? As soon as the renaming of frk -> deu_latf (see tesseract-ocr/langdata_lstm#59) is finished, I can publish a new release 5.3.5. |
Maybe we can also fix some more of the 138 issues which are reported by Coverity scan. |
What about macOS Preview? |
Do not test it with macOS Preview. Test with Chrome on macOS. Different programs render PDF differently and we do not know if they are correct. |
First, this reminds me of the days when a large percentage of websites told you: "This site is best viewed with Internet Explorer". I think the most used PDF viewer on macOS is Preview. Mac users prefer to use Apple's tools. Same for Chrome vs. Safari. Also, this patch makes Evince selection behave worse than it did before. There is also a suggested alternative patch that might work better across different renderers. |
We have four cases:
Is the Chrome viewer OK? Different viewer behavior means that someone is correct and others are not. |
Let's continue the discussion about the PDF renderer in issue #2879. |
OK. I decided to remove my objection to the recent changes in the pdf renderer. |
What about the useless OpenCL code? It's about time we removed it. |
@jbarlow83, are the latest changes in Tesseract's PDF renderer compatible with OCRmyPDF, or would they break it? |
@stweil The changes in the PDF renderer are compatible with OCRmyPDF and yield a slight improvement in text positioning on Evince. LGTM. I tested Tesseract commit 2b07505 which includes egorpugin's changes by examining visual results in Evince using both OCRmyPDF's wrapper around the Tesseract PDF renderer ( |
The next release will be 5.4.0. |
amitdo commented Mar 18, 2024
Done in #4220. |
Can you please release 5.4.0 in the next few days? |
That's my plan. |
I suggest focusing on 5.x, at least for 2022.
That means we should not break the API (and ABI?). Use C++17, not C++20/C++23.