mattfullerton/tika-tesseract-docker

 
 

Apache Tika Server w/ Tesseract in Docker

Sets up a container based on java:7

Includes

If you prefer the latest stable version of Tika Server (including OCR via Tesseract), consider logicalspark/docker-tikaserver instead.

Usage

To run the image from the Docker registry:

sudo docker run -d -p 9998:9998 mattfullerton/tika-tesseract-docker

N.B.: The automated build of this image has a problem that prevents the process from running. A manually built alternative is available at mattfullerton/tika-tesseract-docker-no-automation, or you can try building the image yourself (see below), although that may not succeed either. I am still trying to understand how this can happen.

Alternatively, try:

sudo docker run -d -p 9998:9998 mattfullerton/tika-tesseract-docker-no-automation

To build the image yourself and run the container:

sudo docker build -t tika github.com/mattfullerton/tika-tesseract-docker
sudo docker run -d -p 9998:9998 tika
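Because the container is started detached (-d), the server may need a few moments before it accepts requests. The following is a minimal sketch of a readiness check, not part of this repository: the function name wait_for_tika, the TIKA_URL variable, and the default of 30 attempts are illustrative assumptions, and it assumes the server answers a plain GET on /tika with a success status.

```shell
#!/bin/sh
# Sketch: poll the Tika endpoint until it answers or the attempts run out.
# TIKA_URL and wait_for_tika are hypothetical names chosen for this example.
TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"

wait_for_tika() {
    attempts="${1:-30}"   # assumed default: about 30 seconds
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # --fail makes curl exit non-zero on HTTP error responses
        if curl --silent --fail --output /dev/null "$TIKA_URL"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

After sourcing the snippet, something like `wait_for_tika 60 && echo "Tika is up"` could be run before sending documents. If the check never succeeds, the container process has probably exited, which matches the problem described in the N.B. above.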

Test with commands like:

curl -T testpdf.pdf http://localhost:9998/tika
curl -T multipage_tiff_example.tif http://localhost:9998/tika

The second command exercises OCR, since Tika hands the TIFF image to Tesseract for text extraction.
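The test commands above can be wrapped into a small smoke test. This is only a sketch under stated assumptions: the function names tika_extract and has_text and the TIKA_URL variable are hypothetical, and "OCR worked" is approximated as "the response contains at least one non-whitespace character".

```shell
#!/bin/sh
# Sketch: upload a file to the Tika endpoint and check that text came back.
# TIKA_URL, tika_extract and has_text are hypothetical names for this example.
TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"

tika_extract() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 2
    fi
    # -T uploads the file with PUT, as in the curl commands above;
    # --fail turns HTTP error responses into a non-zero exit status
    curl --silent --show-error --fail -T "$file" "$TIKA_URL"
}

has_text() {
    # Succeeds only if the extracted output has a non-whitespace character
    tika_extract "$1" | grep -q '[^[:space:]]'
}
```

For example, `has_text multipage_tiff_example.tif || echo "no text extracted, OCR may not be working"` gives a quick indication of whether Tesseract is being invoked inside the container.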

Author

Credits

About

Docker container to provide Apache Tika RESTful API
