mattravenhall/OutbreakTopics

LDA topic modelling of outbreak reports

Quick Build

python3 __main__.py --corpus '../Documents/documents_1996-2019.txt'

NB: The corpus file must contain one document per line.
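The one-document-per-line layout can be checked before a run. The following is a minimal sketch, not part of this repository: `load_corpus` is a hypothetical helper, and skipping blank lines is an assumption about how such lines should be treated, not documented behaviour of `__main__.py`.

```python
from pathlib import Path


def load_corpus(path):
    """Read a corpus file, treating each non-empty line as one document.

    Hypothetical helper for inspecting input; the tool itself may parse
    the file differently.
    """
    documents = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            text = line.strip()
            if text:  # assumption: blank lines are not documents
                documents.append(text)
    return documents
```

Calling `len(load_corpus('../Documents/documents_1996-2019.txt'))` would then give the number of documents the file holds under these assumptions.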

Optimisation

python3 __main__.py --corpus '../Documents/documents_1996-2019.txt' --run_types OPTIMISE

Exploration

Word Clouds

python3 __main__.py --corpus 'training_documents.txt' --explorations WORDCLOUDS

Word Bars

python3 __main__.py --corpus 'training_documents.txt' --explorations WORD_BARS

Clustering

python3 __main__.py --corpus 'training_documents.txt' --explorations PCA TSNE

Topics by Document

python3 __main__.py --corpus 'training_documents.txt' --explorations DOC_TOPICS

pyLDAvis

python3 __main__.py --corpus 'training_documents.txt' --explorations PYLDAVIS

Representative Documents

python3 __main__.py --corpus 'training_documents.txt' --explorations REP_DOCS
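The Clustering command passes two values (`PCA TSNE`) to `--explorations`, which suggests the flag takes a list. Assuming other exploration names can be combined in the same way (not confirmed here), a small sketch like the one below could assemble a single invocation. `build_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_command(corpus, explorations):
    """Assemble the argument list for one run of __main__.py.

    Assumption: --explorations accepts any number of space-separated
    names, as the PCA TSNE example suggests.
    """
    command = ["python3", "__main__.py", "--corpus", corpus]
    if explorations:
        command += ["--explorations", *explorations]
    return command


# Print a shell-safe command line for review before running it.
command = build_command("training_documents.txt", ["WORDCLOUDS", "WORD_BARS", "PCA", "TSNE"])
print(shlex.join(command))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` from the directory containing `__main__.py`.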

Topic Prediction

python3 __main__.py --corpus 'training_documents.txt' --predict 'document_for_prediction.txt'

NB: The prediction file must also contain one document per line.
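Reports that span several lines need flattening before they fit this format. A minimal sketch, assuming that collapsing all internal whitespace into single spaces is acceptable for the model's preprocessing (`write_one_per_line` is a hypothetical helper, not part of the repository):

```python
def write_one_per_line(documents, path):
    """Write each document to its own line, collapsing internal whitespace.

    Assumption: newlines and repeated spaces inside a document carry no
    meaning for topic prediction, so they are replaced by single spaces.
    """
    with open(path, "w", encoding="utf-8") as handle:
        for document in documents:
            flattened = " ".join(document.split())
            if flattened:  # skip documents that are empty after flattening
                handle.write(flattened + "\n")
```

For example, `write_one_per_line(reports, 'document_for_prediction.txt')` would produce a file suitable for `--predict`, where `reports` is a list of strings.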

More info can be found in an accompanying blog post.