Add FTS OCR as an option in the menu script #2148

Open
p-bo opened this issue Nov 28, 2021 · 10 comments


@p-bo

p-bo commented Nov 28, 2021

Steps To Reproduce

  1. Use the menu.sh subitem that installs the Full Text Search platform

Expected Result

The installed search components also include https://apps.nextcloud.com/apps/files_fulltextsearch_tesseract, together with the deb packages for the tesseract engine and the recognition data files for all languages on the underlying system. This Nextcloud app was updated recently, after a long time without updates.

Actual Result

The OCR component for Full Text Search is missing.
(That is logical: the app had not been updated, and so was not available, for a long time. But the situation changed recently. Would it be possible to reflect that, please?)

Screenshots, Videos, or Pastebins

No response

Additional Context

Many thanks for all your work on Nextcloud VM (lite, full)!

Build Version

22.2.3

Environment

By downloading the VM

Environment Details

No response

@p-bo p-bo added the bug label Nov 28, 2021
@enoch85 enoch85 added enhancement and removed bug labels Dec 3, 2021
@enoch85 enoch85 closed this as completed Jan 17, 2022
@enoch85 enoch85 reopened this Jan 17, 2022
@enoch85 enoch85 changed the title from "(Hansson IT Nextcloud VM Full) menu.sh - missing installation of OCR for Full text search in admin scripts" to "Add FTS OCR as an option in the menu script" Jan 17, 2022
@enoch85
Member

enoch85 commented Jan 17, 2022

@p-bo You are welcome to work on this if you want it implemented.

Thanks!

@enoch85
Member

enoch85 commented Jan 31, 2022

@Ark74 Something you think is a valid point?

@Ark74
Collaborator

Ark74 commented Jan 31, 2022

AFAIK, OCR is not well supported yet. I'm not sure whether daita finished it or what its current state is.

@p-bo
Author

p-bo commented Feb 1, 2022

Thank you both for your commitment!
@enoch85 - unfortunately I'm not good enough at scripting to implement this reliably.
@Ark74 - what do you think: would it be wise to ask daita and ArtificialOwl about the status and future of this OCR app?

@p-bo
Author

p-bo commented Feb 4, 2022

@daita and @ArtificialOwl - what is the development status of this FTS OCR module, please? Thanks in advance for any answer(s) :-)

@ArtificialOwl
Member

We have had no bad feedback for a while.
The repos are migrating into nextcloud/, and we might even support it?

You need the tesseract binary on the server for it to work.
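
As a rough illustration of that prerequisite, the shell sketch below checks whether a `tesseract` binary is on the PATH and, if so, lists its installed language data. The helper names `have_binary` and `list_tesseract_langs` are made up for this example, the Debian/Ubuntu package name `tesseract-ocr` is given only as a hint, and nothing here is taken from the Nextcloud VM scripts.

```shell
#!/bin/sh
# Sketch: verify that tesseract is reachable before relying on FTS OCR.
# Helper names are hypothetical, not part of the Nextcloud VM scripts.

# Returns 0 if the named binary is on PATH, non-zero otherwise.
have_binary() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the installed language codes, one per line.
# `tesseract --list-langs` prints a header line first, which is dropped here.
# Some tesseract versions write the list to stderr, hence the 2>&1.
list_tesseract_langs() {
    tesseract --list-langs 2>&1 | tail -n +2
}

if have_binary tesseract; then
    echo "tesseract found, installed languages:"
    list_tesseract_langs
else
    echo "tesseract not found (on Debian/Ubuntu the package is tesseract-ocr)"
fi
```

A check like this could run before the OCR app is enabled, so that a missing binary is reported up front rather than surfacing later as indexing errors.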

@ArtificialOwl
Member

You might also need to edit some of tesseract's XML configuration for its access rights.

@p-bo
Author

p-bo commented Feb 4, 2022

@ArtificialOwl

Regarding the dependencies (tesseract engine and language data) and adjusting their configuration: that is a feasible task for the installation/maintenance scripts here (as is already done for other Nextcloud components in Nextcloud VM). If I understood correctly, the developers of these scripts need to be assured that it is worth integrating this (will compatibility of the FTS OCR component be maintained for future Nextcloud versions?).
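
To make the dependency step described above concrete, here is a minimal sketch of what such an install step might look like. It is not the Nextcloud VM implementation: the occ path, the web server user, the `DRY_RUN` switch, and the function names are all assumptions for illustration. It relies on the Debian/Ubuntu naming convention `tesseract-ocr-<code>` for language data packages, and on `occ app:install` with the app id taken from the app store URL in the issue.

```shell
#!/bin/sh
# Sketch of an install step for the FTS OCR dependencies.
# Assumptions (not taken from the Nextcloud VM scripts): the occ path,
# the web server user, the DRY_RUN switch, and all function names.

NC_OCC="${NC_OCC:-/var/www/nextcloud/occ}"   # assumed occ location
WEB_USER="${WEB_USER:-www-data}"             # assumed web server user
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands

# Debian/Ubuntu name language data packages tesseract-ocr-<code>, with
# underscores in the code replaced by hyphens (chi_sim -> chi-sim).
lang_to_package() {
    printf 'tesseract-ocr-%s\n' "$(printf '%s' "$1" | tr '_' '-')"
}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install the engine, the requested language data, and the Nextcloud app.
install_fts_ocr() {
    packages="tesseract-ocr"
    for lang in "$@"; do
        packages="$packages $(lang_to_package "$lang")"
    done
    # Word splitting of $packages is intentional: one argument per package.
    run apt-get install -y $packages
    run sudo -u "$WEB_USER" php "$NC_OCC" app:install files_fulltextsearch_tesseract
}

install_fts_ocr eng deu
```

With the default `DRY_RUN=1` the script only prints the two commands it would run, which makes it safe to try before wiring anything into a menu.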

From a user's point of view, it would be great to be able to search in text extracted from uploaded images as well, as one is accustomed to with some other cloud storage offerings. There is also the Workflow OCR add-on for this, but its approach (and thus its scope of use) is a bit different from that of FTS OCR.

So, will FTS OCR be supported, and is it therefore meaningful to politely ask the Nextcloud VM maintainers to include it in their installation automation, please?

Thanks for bearing with me :-)

@p-bo
Author

p-bo commented Mar 6, 2022

@cronlabspl are we still talking about OCR processing of users' raster images / PDF files already stored on Nextcloud, so that full text search works on them there? (The mention of some device there is a bit confusing.)

@Piefje01

In NC 24 the menu option to do an OCR scan is missing.
