shell completion #4058

Open
Freed-Wu opened this issue Apr 23, 2023 · 5 comments

@Freed-Wu

Your Feature Request

Can tesseract support completions for common shells? TIA!

@stweil
Contributor

stweil commented Apr 23, 2023

Sure, we'd just need the required completion files for tesseract and for the training executables, which currently do not exist. Do you want to write such a file for bash, and maybe also for other shells?

@Freed-Wu
Author

Freed-Wu commented Apr 25, 2023

A first draft for zsh:

#compdef tesseract

local psm=(
  '0\:Orientation\ and\ script\ detection\ \(OSD\)\ only'
  '1\:Automatic\ page\ segmentation\ with\ OSD'
  '2\:Automatic\ page\ segmentation,\ but\ no\ OSD,\ or\ OCR\ \(not\ implemented\)'
  '3\:Fully\ automatic\ page\ segmentation,\ but\ no\ OSD\ \(Default\)'
  '4\:Assume\ a\ single\ column\ of\ text\ of\ variable\ sizes'
  '5\:Assume\ a\ single\ uniform\ block\ of\ vertically\ aligned\ text'
  '6\:Assume\ a\ single\ uniform\ block\ of\ text'
  '7\:Treat\ the\ image\ as\ a\ single\ text\ line'
  '8\:Treat\ the\ image\ as\ a\ single\ word'
  '9\:Treat\ the\ image\ as\ a\ single\ word\ in\ a\ circle'
  '10\:Treat\ the\ image\ as\ a\ single\ character'
  '11\:Sparse\ text.\ Find\ as\ much\ text\ as\ possible\ in\ no\ particular\ order'
  '12\:Sparse\ text\ with\ OSD'
  '13\:Raw\ line.\ Treat\ the\ image\ as\ a\ single\ text\ line,\ bypassing\ hacks\ that\ are\ Tesseract-specific'
)
local oem=(
  '0\:Legacy\ engine\ only'
  '1\:Neural\ nets\ LSTM\ engine\ only'
  '2\:Legacy\ +\ LSTM\ engines'
  '3\:Default,\ based\ on\ what\ is\ available'
)
local options=(
  "(- : *)"{-h,--help}"[Show minimal help message]"
  "(- : *)"--help-extra"[Show extra help for advanced users]"
  "(- : *)"--help-psm"[Show page segmentation modes]"
  "(- : *)"--help-oem"[Show OCR Engine modes]"
  "(- : *)"{-v,--version}"[Show version information]"
  "(- : *)"--list-langs"[List available languages for tesseract engine]"
  "(- : *)"--print-fonts-table"[Print tesseract fonts table]"
  "(- : *)"--print-parameters"[Print tesseract parameters]"
  --tessdata-dir"[Specify the location of tessdata path]: :_dirs"
  --user-words"[Specify the location of user words file]: :_files"
  --user-patterns"[Specify the location of user patterns file]: :_files"
  --dpi"[Specify DPI for input image]:VALUE"
  --loglevel"[Specify logging level]:LEVEL:(ALL TRACE DEBUG INFO WARN ERROR FATAL OFF)"
  -l"[Specify language(s) used for OCR]:LANG:($(tesseract --list-langs | tail -n+2))"
  '*'-c"[Set value for config variables]:VAR=VALUE"
  --psm"[Specify page segmentation mode]:Page segmentation mode:(($psm))"
  --oem"[Specify OCR Engine mode]:OCR Engine mode:(($oem))"
)

_arguments -s -S $options ':imagename:_files' ':outputbase:_files' '*:config:_files'

Freed-Wu added a commit to Freed-Wu/tesseract that referenced this issue Apr 27, 2023
@stweil
Contributor

stweil commented Apr 27, 2023

A complete solution should not only support zsh and tesseract, but also bash and the training tools. What would be a good place in the source tree for the required files? I don't think that the tessdata directory is the right place. How do other projects handle this?

@Freed-Wu
Author

> but also bash and the training tools

I am not yet familiar with bash completion, and I am not familiar with the usage of the other tools (ambiguous_words, ...). I think this can be a starting point.
Many projects use a tool to generate shell completions (bash/zsh/...), usually together with the man page and the --help text, from a data structure. Like

Many other projects write their own code to generate shell completions, like pandoc (Haskell) and texdoc (Lua).

Still others write shell completions by hand, like

> the tessdata directory is the right place. How do other projects handle this

You can refer to the projects above to see where they place these files.

@stweil
Contributor

stweil commented Apr 27, 2023

Thank you for the examples. What about a new directory structure contrib/completions/{bash,zsh} for the completion files?

At least bash completion can also evaluate the output of COMMAND --help. For example, complete -F _longopt tesseract enables a simple bash completion for tesseract. That already works pretty well, not only for tesseract but also for the training tools. But I noticed that the help texts, especially those for the training tools, are not perfect. For example, the documentation for the --help option is often missing. Improving the help texts is therefore a necessary part of fixing this issue.
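
As a rough illustration of the idea behind complete -F _longopt (not the actual bash-completion implementation), the sketch below takes long options from the output of COMMAND --help and adds the fixed values for --psm and --oem listed in the zsh draft earlier in this thread. The function names _sketch_long_opts and _sketch_tesseract are hypothetical, and the regular expression is only an assumption about how options appear in the help text.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical, not part of tesseract
# or of the bash-completion package.

# Print every token in "COMMAND --help" that looks like a long option.
# Assumption: options appear in the help text as --name or --name-part.
_sketch_long_opts() {
  "$1" --help 2>&1 | grep -oE -- '--[A-Za-z0-9][A-Za-z0-9-]*' | sort -u
}

_sketch_tesseract() {
  local cur=${COMP_WORDS[COMP_CWORD]}
  local prev=""
  if (( COMP_CWORD > 0 )); then
    prev=${COMP_WORDS[COMP_CWORD-1]}
  fi
  COMPREPLY=()

  case $prev in
    --psm)  # page segmentation modes 0..13
      COMPREPLY=( $(compgen -W "0 1 2 3 4 5 6 7 8 9 10 11 12 13" -- "$cur") )
      return 0 ;;
    --oem)  # OCR engine modes 0..3
      COMPREPLY=( $(compgen -W "0 1 2 3" -- "$cur") )
      return 0 ;;
  esac

  if [[ $cur == -* ]]; then
    # Offer only the long options that the help text actually mentions.
    COMPREPLY=( $(compgen -W "$(_sketch_long_opts "${COMP_WORDS[0]}")" -- "$cur") )
  else
    # Anything else is treated as a file name (image, output base, config).
    COMPREPLY=( $(compgen -f -- "$cur") )
  fi
  return 0
}

complete -F _sketch_tesseract tesseract
```

After sourcing this in an interactive bash session, typing tesseract --he and pressing Tab should offer the long options found in the help output. Since it can only find options that the help text mentions, gaps in the help texts show up directly as gaps in the completion.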
