This project grew out of the need for a system that integrates conversion, summarization, translation, and alignment of corpus files. Through a clean, simple interface, Automatic Summarizer links these functions step by step and delivers an aligned, summarized, translated TXT file as the final result. It is written in Python (Django framework) and builds on an existing Python library for each step (Converter, Summarizer, Translator, Aligner). Each Automatic Summarizer app and its purpose is listed below:
Converter (PDF Miner)
This app converts PDF files into TXT format.
Summarizer (Gensim)
This app summarizes the TXT file and returns some metrics about the summary.
Translator (TextBlob)
This app translates the summarized TXT file into another language (e.g., PT-BR -> FR).
Aligner (Gale & Church)
This app aligns the summarized, translated TXT file.
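The four apps above run in sequence, each one consuming the previous one's output. A minimal sketch of that chaining follows. It is not the project's actual code: the `Pipeline` class and the `fake_*` stand-ins are hypothetical names introduced only for illustration, and the stand-ins do not call PDF Miner, Gensim, TextBlob, or Gale & Church.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each stage is a plain callable, so a real backend could be plugged in later.
# All names below are hypothetical and exist only for this illustration.
AlignedPairs = List[Tuple[str, str]]


@dataclass
class Pipeline:
    convert: Callable[[bytes], str]        # PDF bytes -> plain text
    summarize: Callable[[str], str]        # text -> summary
    translate: Callable[[str], str]        # summary -> translated summary
    align: Callable[[str, str], AlignedPairs]

    def run(self, pdf_bytes: bytes) -> AlignedPairs:
        text = self.convert(pdf_bytes)
        summary = self.summarize(text)
        translated = self.translate(summary)
        return self.align(summary, translated)


def split_sentences(text: str) -> List[str]:
    # Naive splitter: breaks on periods and drops empty pieces.
    return [s.strip() for s in text.split(".") if s.strip()]


def fake_convert(data: bytes) -> str:
    # Stand-in for PDF extraction: treats the input as UTF-8 text.
    return data.decode("utf-8")


def fake_summarize(text: str) -> str:
    # Stand-in for summarization: keeps the first two sentences.
    return ". ".join(split_sentences(text)[:2]) + "."


def fake_translate(text: str) -> str:
    # Stand-in for translation: upper-cases the text.
    return text.upper()


def fake_align(source: str, target: str) -> AlignedPairs:
    # Stand-in for alignment: pairs sentences by position.
    return list(zip(split_sentences(source), split_sentences(target)))


pipeline = Pipeline(fake_convert, fake_summarize, fake_translate, fake_align)
pairs = pipeline.run(b"One. Two. Three.")
print(pairs)
```

Keeping each stage behind a simple callable means one step, for example the translator, can be swapped out without touching the others.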
This project uses a YAML file to configure the database connection and other settings. Its current path is config/automatic_summarizer.yml.
To install the project, your environment must first have Docker installed (version 3.3).
Once Docker is installed, set the database configuration in the environment variables of the db service (MySQL) in the docker-compose.yml file.
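As an illustration of how such environment variables could reach Django, the sketch below builds a `DATABASES`-style dictionary from them. It is an assumption, not the project's actual settings code. `MYSQL_DATABASE`, `MYSQL_USER`, and `MYSQL_PASSWORD` are the variables read by the official MySQL Docker image; `DB_HOST`, `DB_PORT`, the `database_settings` helper, and the default values are hypothetical.

```python
import os
from typing import Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django-style DATABASES dict from environment variables."""
    return {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            # Variables understood by the official MySQL image.
            "NAME": env.get("MYSQL_DATABASE", "automatic_summarizer"),  # hypothetical default
            "USER": env.get("MYSQL_USER", "root"),
            "PASSWORD": env.get("MYSQL_PASSWORD", ""),
            # Hypothetical variables; "db" matches the compose service name.
            "HOST": env.get("DB_HOST", "db"),
            "PORT": env.get("DB_PORT", "3306"),
        }
    }


# Example with an explicit mapping instead of the real environment.
settings = database_settings(
    {"MYSQL_DATABASE": "corpus", "MYSQL_USER": "summarizer", "MYSQL_PASSWORD": "secret"}
)
print(settings["default"]["NAME"], settings["default"]["HOST"])
```

Passing the mapping as a parameter keeps the function easy to exercise without changing the real process environment.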
Then you can manage the Dockerfile through the orchestration file docker-compose.yml:
docker-compose up -d
If you already have a complete environment with Django/MySQL installed and don't want to use Docker, you can just install the project dependencies and make some database changes.
./setup.py install
- PDF Miner https://github.com/euske/pdfminer
- Gensim https://github.com/RaRe-Technologies/gensim
- TextBlob https://github.com/sloria/TextBlob
- Gale & Church https://github.com/vchahun/galechurch
- Leandro Rezende Rodrigues <leandro.l2r@gmail.com>
- Prof. Francisco Cláudio S. de Menezes <new.claudiomenezes@gmail.com>