Topic Modeling Evaluation

A toolkit to quickly evaluate topic model goodness over the number of topics

Metrics

The following coherence measures are used to evaluate model goodness (see the sketch after this list):

  • 'u_mass' is the fastest method; 'c_uci' is also known as c_pmi.

  • For 'u_mass', a corpus should be provided; if texts are provided instead, they are converted to a corpus using the dictionary.

  • For 'c_v', 'c_uci', and 'c_npmi', texts should be provided (a corpus isn't needed).
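
For illustration, here is a minimal sketch of how such coherence scores are typically computed with gensim's CoherenceModel (this is standard gensim usage on hypothetical toy data, not tm-eval's internal code):

from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy tokenized documents (hypothetical data for illustration only)
texts = [["fever", "cough", "fatigue", "headache"],
         ["cough", "headache", "fever"],
         ["fever", "fatigue", "cough"],
         ["headache", "fatigue", "cough"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=42)

# 'u_mass' works directly from the bag-of-words corpus
u_mass = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                        coherence="u_mass").get_coherence()

# 'c_v', 'c_uci', and 'c_npmi' need the tokenized texts instead
c_v = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                     coherence="c_v").get_coherence()
print(u_mass, c_v)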

Examples

Example 1: estimate metrics for one topic model with a specific number of topics

from tm_eval import *
# input: a pickled dict that maps each document ID to its comma-separated term list (see the sketch after this example)
input_file = "datasets/covid19_symptoms.pickle"
output_folder = "outputs"
model_name = "symptom"
num_topics = 10
# run
results = evaluate_all_metrics_from_lda_model(input_file=input_file, 
                                              output_folder=output_folder,
                                              model_name=model_name, 
                                              num_topics=num_topics)
print(results)
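
The comment above implies the input pickle is a dictionary keyed by document ID whose values are comma-joined term lists. A minimal sketch for preparing such a file (the exact schema is inferred from that comment and should be treated as an assumption):

import pickle

# Hypothetical input data: document ID -> comma-joined term list
# (schema inferred from the comment in Example 1, not confirmed elsewhere)
docs = {
    "doc1": "fever,cough,fatigue",
    "doc2": "cough,headache",
    "doc3": "fever,fatigue,headache",
}
with open("datasets/covid19_symptoms.pickle", "wb") as f:
    pickle.dump(docs, f)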

Example 2: find how model goodness changes over the number of topics

from tm_eval import *
if __name__ == "__main__":
    # start configure
    # input: a pickled dict that maps each document ID to its comma-separated term list
    input_file = "datasets/covid19_symptoms.pickle"
    output_folder = "outputs"
    model_name = "symptom"
    start = 2
    end = 5
    # end configure
    # run and explore

    list_results = explore_topic_model_metrics(input_file=input_file, 
                                               output_folder=output_folder,
                                               model_name=model_name,
                                               start=start,
                                               end=end)
    # summarize results
    show_topic_model_metric_change(list_results, save=True,
                                   save_path=f"{output_folder}/metrics.csv")

    # plot metric changes
    plot_tm_metric_change(csv_path=f"{output_folder}/metrics.csv",
                          save=True, save_folder=output_folder)
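
Once metrics.csv has been written, a candidate number of topics can be picked by ranking the coherence scores. A sketch assuming the CSV holds one row per topic count with columns named after the metrics (the column names num_topics and c_v are assumptions, not confirmed by the toolkit's docs):

import pandas as pd

# Assumed layout: one row per topic count, with columns such as
# num_topics, c_v, u_mass, c_npmi, c_uci (names are an assumption)
df = pd.read_csv("outputs/metrics.csv")

# Higher c_v generally indicates more coherent topics
best = df.loc[df["c_v"].idxmax()]
print("Best number of topics by c_v:", best["num_topics"])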

Output results

The toolkit saves plots of each coherence metric's change over the number of topics: c_v, u_mass, c_npmi, and c_uci.

License

The tm-eval toolkit is provided by Donghua Chen under the MIT License.

References

  1. Topic Modeling in Python: Latent Dirichlet Allocation (LDA)
  2. Evaluate Topic Models: Latent Dirichlet Allocation (LDA)
