saidsef/tika-document-to-text

Apache Tika Implementation

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Prerequisite

Deployment

Kubernetes Deployment

Create the namespace with kubectl create ns web. Then, assuming you've checked out this repo, run:

kubectl kustomize deployment/ | kubectl apply -f -

Or, to deploy via argocd:

kubectl apply -f deployment/argocd/application.yml

NOTE: Remember to update the Ingress hostname.

Take it for a test drive:

Via CLI:

You'll need to forward the service first, via kubectl port-forward -n web svc/tika-ui 8080

curl -d @test/url.json http://localhost:8080/ -H 'Content-Type: application/json'
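
The port-forward and the request can also be combined into one helper, so the tunnel is opened and closed around a single test call. This is only a sketch: the function names tika_url and tika_test_drive, the environment-variable overrides, and the two-second wait are assumptions for illustration and are not part of this repo. The namespace, service, port, and curl arguments are the ones shown above.

```shell
# Defaults taken from the commands above; override via the environment if needed.
NAMESPACE="${NAMESPACE:-web}"
SERVICE="${SERVICE:-svc/tika-ui}"
PORT="${PORT:-8080}"

# Hypothetical helper: build the local URL exposed by the port-forward.
tika_url() {
  printf 'http://localhost:%s/' "$PORT"
}

# Hypothetical helper: open the tunnel, send the sample request, close the tunnel.
tika_test_drive() {
  kubectl port-forward -n "$NAMESPACE" "$SERVICE" "$PORT" >/dev/null 2>&1 &
  pf_pid=$!
  # Assumed grace period for the tunnel to come up; adjust as needed.
  sleep 2
  curl -d @test/url.json "$(tika_url)" -H 'Content-Type: application/json'
  status=$?
  # Stop the background port-forward whether or not curl succeeded.
  kill "$pf_pid" 2>/dev/null
  return "$status"
}
```

Run tika_test_drive from the repository root so that test/url.json resolves.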

Or, via Web UI:

Using a browser visit:

http://localhost:8080/