Summarise speeches contained in .pdf documents.
summary.io
Import / export utilities for pdf file conversion
summary.preprocess
Data cleaning utilities
summary.model
Modelling classes
make_slides.ipynb
Notebook for generating the presentation. Includes examples of:
- Data cleaning
- Feature extraction
- Topic modelling (including hyperparameter optimisation)
- TextRank summarisation & keyword ranking
summariser.ipynb
Text summary using bert-extractive-summarizer
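For readers unfamiliar with TextRank, the idea behind the summarisation step in make_slides.ipynb can be sketched roughly as below. This is a minimal standard-library illustration, not the code used in the notebook: the function names (`split_sentences`, `similarity`, `rank_sentences`), the regex tokeniser, and the default parameters are all assumptions made for the example. Sentences are nodes in a graph, edges are weighted by word overlap, and scores are computed with a PageRank-style iteration.

```python
import math
import re


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(tokens_a, tokens_b):
    """Word overlap normalised by log sentence size (unique words)."""
    words_a, words_b = set(tokens_a), set(tokens_b)
    if not words_a or not words_b:
        return 0.0
    denominator = math.log(len(words_a)) + math.log(len(words_b))
    if denominator == 0:
        return 0.0
    return len(words_a & words_b) / denominator


def rank_sentences(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    # Hypothetical tokeniser: lower-cased runs of letters and apostrophes.
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # Fixed number of PageRank-style updates over the weighted graph.
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling `rank_sentences(speech_text, top_n=3)` on the text of one document would return its three most central sentences. The same graph-ranking idea applies to keyword ranking if words rather than sentences are used as nodes.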
See the data directory for example input data:
*.pdf
- raw pdf documents
*.txt
- extracted text
all_data.csv
- extracted text consolidated into csv format
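As an illustration of how the extracted .txt files might be consolidated into a single csv, here is a minimal sketch using only the standard library. It is not the code in summary.io: the function name `consolidate_text` and the column names `filename` and `text` are assumptions, and the actual layout of all_data.csv may differ.

```python
import csv
from pathlib import Path


def consolidate_text(data_dir, csv_path):
    """Write one CSV row per .txt file in data_dir and return the row count."""
    rows = [
        {"filename": path.name, "text": path.read_text(encoding="utf-8")}
        for path in sorted(Path(data_dir).glob("*.txt"))
    ]
    # newline="" lets the csv module handle line endings inside quoted text.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `consolidate_text("data", "data/all_data.csv")` would collect every .txt file in the data directory into one file, sorted by file name.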
See the output directory for example output:
BERT_summaryXX.txt
- summarised documents using bert-extractive-summarizer
ranked_summaryXX.txt
- summarised documents using TextRank
entitiesXX.csv
- table of extracted entities (organisations / people)
token_frequency.csv
- table of token frequency for each document
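A per-document token frequency table of this kind can be built with `collections.Counter`. The sketch below is illustrative only: the function name `frequency_rows`, the simple letters-only tokeniser, and the (document, token, count) row layout are assumptions, and the columns of token_frequency.csv may be organised differently.

```python
import re
from collections import Counter


def frequency_rows(documents):
    """Return (document, token, count) rows, most frequent tokens first.

    documents maps a document name to its extracted text.
    """
    rows = []
    for name, text in documents.items():
        # Hypothetical tokeniser: lower-cased runs of ASCII letters.
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        for token, count in counts.most_common():
            rows.append((name, token, count))
    return rows
```

The resulting rows can be written out with `csv.writer` or loaded into a DataFrame for further analysis.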