new feature: apply OCR to uploaded pdfs #64
Fantastic tool! I’ll try it.
Many thanks.
On Dec 7, 2023, at 6:53 AM, pricua wrote:
Hi,
We have developed a feature that applies OCR (image-to-text conversion) to PDFs uploaded to Dedalo.
When a PDF is uploaded, the user can choose whether to apply OCR and select the OCR language from a drop-down menu.
For this we use the ocrmypdf tool, which must be installed on the same server as Dedalo because Dedalo invokes it through PHP's shell_exec function.
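To illustrate the kind of command line involved, here is a minimal sketch of how such a call could be assembled. It is not the code from this pull request, which does this in PHP. The helper names (`build_ocr_args`, `shell_quote`, `build_ocr_command`) and the language-code pattern are assumptions made for the example; only the `ocrmypdf -l <lang> <input> <output>` shape comes from ocrmypdf itself. In PHP the quoting step would normally be handled by `escapeshellarg()`.

```javascript
// Hypothetical helper: builds the argument list for an ocrmypdf call.
// `-l` is ocrmypdf's language option; several languages are joined with "+".
function build_ocr_args(input_path, output_path, ocr_lang) {
    // Conservative assumption: accept only Tesseract-style codes such as
    // "eng", "spa" or "chi_sim", optionally combined as "eng+spa".
    const lang_pattern = /^[a-z]{3}(_[a-z]+)*(\+[a-z]{3}(_[a-z]+)*)*$/;
    if (!lang_pattern.test(ocr_lang)) {
        throw new Error('Invalid OCR language: ' + ocr_lang);
    }
    return ['-l', ocr_lang, input_path, output_path];
}

// POSIX single-quote escaping, so that paths with spaces or quotes
// cannot break out of the command line.
function shell_quote(value) {
    return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Joins the program name and its quoted arguments into one shell string.
function build_ocr_command(input_path, output_path, ocr_lang) {
    return ['ocrmypdf']
        .concat(build_ocr_args(input_path, output_path, ocr_lang))
        .map(shell_quote)
        .join(' ');
}

console.log(build_ocr_command('/tmp/my file.pdf', '/tmp/out.pdf', 'spa'));
```

Validating the language code and quoting every argument matters here because both the file path and the selected language originate from the upload form and end up in a shell command.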
We would like you to review these changes so they can be incorporated into the official Dedalo code and used in any Dedalo installation; we think the feature could be useful in many areas.
Above all, we would like a review of the tool_upload file class.tool_upload.php. We think both the way we obtain the path of the uploaded file and the conversion between Dedalo languages and OCR languages, which we do manually, can be greatly improved, along with anything else you consider worth improving or optimizing.
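As a rough illustration of the manual language conversion mentioned above, the mapping can be kept in a single lookup table with an explicit fallback. This is only a sketch: the `lg-xxx` keys are assumed Dedalo language identifiers, the table contents and the function name `to_ocr_lang` are hypothetical, and the English fallback is an assumption rather than the behaviour of the submitted code.

```javascript
// Illustrative table only. Keys are assumed Dedalo language identifiers;
// values are Tesseract language codes as accepted by ocrmypdf.
const OCR_LANG_BY_DEDALO_LANG = Object.freeze({
    'lg-eng': 'eng',
    'lg-spa': 'spa',
    'lg-cat': 'cat',
    'lg-fra': 'fra'
});

// Returns the OCR code for a Dedalo language, or the fallback when the
// language is not in the table.
function to_ocr_lang(dedalo_lang, fallback = 'eng') {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    const known = Object.prototype.hasOwnProperty.call(OCR_LANG_BY_DEDALO_LANG, dedalo_lang);
    return known ? OCR_LANG_BY_DEDALO_LANG[dedalo_lang] : fallback;
}

console.log(to_ocr_lang('lg-spa'));
console.log(to_ocr_lang('lg-zzz'));
```

Keeping the conversion in one table makes it easy to see which languages are supported and to extend the list when new Tesseract language packs are installed on the server.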
We are happy to discuss anything you need.
We hope this is of interest to the community and can be included in the official Dedalo distribution.
Best
You can view, comment on, or merge this pull request online at:
#64
Commit Summary
3a88328 new feature: apply OCR to uploaded pdfs
File Changes (5 files <https://github.com/renderpci/dedalo/pull/64/files>)
M core/services/service_upload/js/render_edit_service_upload.js <https://github.com/renderpci/dedalo/pull/64/files#diff-1186898516378b074a473ae66b2abb3039ed00ce6c1507731ce2e11f36eb8eb5> (72)
M core/services/service_upload/js/service_upload.js <https://github.com/renderpci/dedalo/pull/64/files#diff-845c6621f548c389d9edf0295c5eb174188b594f9de2895d5398160d5271e34e> (951)
M tools/tool_upload/class.tool_upload.php <https://github.com/renderpci/dedalo/pull/64/files#diff-6a05c6f18f203a7afc7d4d635e261b47dd81f61347a1589f9d093ff9ab98b309> (35)
M tools/tool_upload/js/render_tool_upload.js <https://github.com/renderpci/dedalo/pull/64/files#diff-0b905fa028898f3fd0c1a3c27c13d37914ea4fdeb0080a8ce6f17d30820fcf98> (776)
M tools/tool_upload/js/tool_upload.js <https://github.com/renderpci/dedalo/pull/64/files#diff-bb207a54de409fa5399693ddd06cab1addfd7ba43e5a60a2aa8a376a7c9f8357> (186)
Patch Links:
https://github.com/renderpci/dedalo/pull/64.patch
https://github.com/renderpci/dedalo/pull/64.diff
Hi @pricua, thanks a lot for this new feature. It will be useful to all projects, and I think it is possible to integrate it into the main code. We are reviewing your code, and some comments have come up.
From:
to:
The main fallback for labels is English. If you need to add labels that are not in the ontology, please tell me and I will open it. Note that if the label is inside the tool, you will need to call it with the tool method, as:
and please don't add more HTML tags than necessary:
is not necessary; ui.create_dom_element() will create the label node, so adding this tag makes the result:
Try to keep it simple. Finally, we need time to test it. Thanks again; I will be back with more.
Hi @pricua, the full integration is done! I just want to point out a few things about the final integration:
Please review the current code and compare it with your commit. The code was integrated into the pricua-v6_developer branch. Feel free to comment or suggest anything else. We will merge it into the master branch at the end of this week (Friday, 7 June 2024). Thanks for improving Dédalo's features... :) Best