
poethan/Swed_Covid_TM


Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

New GitHub page of this project: https://github.com/HECTA-UoM/Swed_Covid_TM (maintained by HECTA, the Healthcare Text Analytics group at the University of Manchester, UoM).

\begin{abstract}

Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP) that facilitates insightful analysis of large documents and datasets, for example by summarising their main topics and tracking how those topics change. This kind of discovery is becoming increasingly popular in real-life applications because of its impact on big data analytics. In this study, which spans the social-media and healthcare domains, we apply the popular Latent Dirichlet Allocation (LDA) method to model topic changes in Swedish newspaper articles about \textit{Coronavirus}. We describe the corpus we created, comprising 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17 January 2020 to 13 March 2021. We hope this work can serve as an asset for grounding applications of topic modelling and inspire similar case studies in an era of pandemics, supporting socio-economic impact research as well as clinical and healthcare analytics. \textit{Our data is openly available at \url{https://github.com/poethan/Swed_Covid_TM}.}
\end{abstract}

\textbf{Keywords:} Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus; Pandemics; Natural Language Understanding
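To make the idea of "modelling topic changes" concrete, here is a minimal sketch, assuming a toy setting: it fits a two-topic LDA model and then averages each article's topic distribution per calendar month. It uses collapsed Gibbs sampling in plain Python, a swapped-in inference technique chosen only because it fits in a few lines of standard-library code; Gensim's `LdaModel`, credited in the Acknowledgement section, uses online variational Bayes instead. The function names `lda_gibbs`, `topics_by_month` and `top_words`, the hyperparameters (`alpha`, `beta`, `iterations`, `seed`), and the four tokenised Swedish "articles" with their dates are all hypothetical illustrations, not taken from the 6515-article corpus or the project's code.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    # Randomly assign an initial topic to every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the current token's assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Point estimates: theta = per-document topic mix, phi = per-topic word distribution.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    return theta, phi, vocab


def topics_by_month(dates, theta):
    """Average the topic distributions of all articles published in each month (YYYY-MM)."""
    by_month = defaultdict(list)
    for date, dist in zip(dates, theta):
        by_month[date[:7]].append(dist)
    return {month: [sum(col) / len(dists) for col in zip(*dists)]
            for month, dists in sorted(by_month.items())}


def top_words(phi, vocab, n=3):
    """Return the n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Hypothetical toy input: pre-tokenised article snippets and publication dates.
docs = [
    ["smitta", "vaccin", "sjukhus", "smitta"],
    ["vaccin", "dos", "sjukhus"],
    ["ekonomi", "arbetslöshet", "företag"],
    ["företag", "ekonomi", "stöd"],
]
dates = ["2020-03-15", "2020-03-20", "2020-04-02", "2020-04-10"]

theta, phi, vocab = lda_gibbs(docs, num_topics=2)
monthly = topics_by_month(dates, theta)
for k, words in enumerate(top_words(phi, vocab)):
    print(f"topic {k}: {words}")
for month, dist in monthly.items():
    print(month, [round(p, 2) for p in dist])
```

Comparing the monthly averages across months is one simple way to read a shift in topic prevalence; on real newspaper text, tokenisation, stop-word removal and the choice of the number of topics would matter far more than in this toy run.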

More resources and metadata for this project can be downloaded at \url{https://drive.google.com/drive/folders/1jRwx7cjF8hMjy9OMA8aryzM5npVFlPyz?usp=sharing}

Reference

@misc{https://doi.org/10.48550/arxiv.2301.03029,
  doi = {10.48550/ARXIV.2301.03029},
  url = {https://arxiv.org/abs/2301.03029},
  author = {Griciūtė, Bernadeta and Han, Lifeng and Han, Li and Nenadic, Goran},
  keywords = {Computation and Language (cs.CL), Social and Information Networks (cs.SI), FOS: Computer and information sciences},
  title = {Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}

Acknowledgement

Software: BERTopic (https://github.com/MaartenGr/BERTopic) and Gensim LDA (https://radimrehurek.com/gensim/models/ldamodel.html).

Institutes: University of Manchester, Saarland University, and University of Malta.
