The published tweet dataset used in "Tweet Length Matters: A Comparative Analysis on Topic Detection in Microblogs" contains tweet IDs and their corresponding topic numbers. Topic numbers are encoded as follows:

| Topic | Topic Number |
|---|---|
| BLM Movement | 0 |
| Covid-19 | 1 |
| K-Pop | 2 |
| Bollywood | 3 |
| Gaming | 4 |
| U.S. Politics | 5 |
| Out-of-Topic | 6 |

In total, there are 354,310 tweet instances.
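
The encoding above can be applied when reading the data. The following is a minimal sketch, not the authors' loading code: it assumes the dataset is a header-less, two-column CSV of tweet ID and topic number, which this README does not confirm. The `read_topics` helper and the sample rows are hypothetical and only illustrate the mapping.

```python
import csv
import io
from collections import Counter

# Topic encoding taken from the table in this README.
TOPIC_NAMES = {
    0: "BLM Movement",
    1: "Covid-19",
    2: "K-Pop",
    3: "Bollywood",
    4: "Gaming",
    5: "U.S. Politics",
    6: "Out-of-Topic",
}


def read_topics(handle):
    """Yield (tweet_id, topic_name) pairs from a two-column CSV handle.

    Assumption: each row is `tweet_id,topic_number` with no header row.
    Tweet IDs are kept as strings so large IDs are never rounded.
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        tweet_id, topic_number = row[0], int(row[1])
        yield tweet_id, TOPIC_NAMES[topic_number]


# Hypothetical sample rows standing in for the real file.
sample = io.StringIO("1001,0\n1002,4\n1003,6\n1004,4\n")
counts = Counter(name for _, name in read_topics(sample))
print(counts)
```

To read the actual file, replace the `io.StringIO` sample with an opened file handle, for example `open(path, newline="")`, where `path` points to wherever the dataset is stored locally.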
If you make use of this dataset, please cite the following paper:
@inproceedings{DBLP:conf/ecir/SahinucT21,
author = {Furkan {\c{S}}ahinu{\c{c}} and Cagri Toraman},
title = {Tweet Length Matters: {A} Comparative Analysis on Topic Detection in Microblogs},
booktitle = {Advances in Information Retrieval - 43rd European Conference on {IR} Research, {ECIR} 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {12657},
pages = {471--478},
publisher = {Springer},
year = {2021},
url = {https://doi.org/10.1007/978-3-030-72240-1\_50},
doi = {10.1007/978-3-030-72240-1\_50},
}