The Dataset for Hate Speech Detection in Indonesian

Data Format

The dataset has two columns: a label and a tweet.
The label is either Non_HS or HS:

Non_HS for a "non-hate-speech" tweet
HS for a "hate-speech" tweet
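
As an illustration of the format above, here is a minimal Python sketch for reading the data into (label, tweet) pairs. It rests on assumptions this README does not state: that the two columns are separated by a tab, that the file is UTF-8 encoded, and that any line whose first field is not Non_HS or HS (for example a header row) can be skipped. The function names `parse_rows` and `load_dataset` are hypothetical helpers, not part of the dataset.

```python
from typing import Iterable, List, Tuple

VALID_LABELS = {"Non_HS", "HS"}


def parse_rows(lines: Iterable[str], sep: str = "\t") -> List[Tuple[str, str]]:
    """Turn raw lines into (label, tweet) pairs.

    Assumption: columns are separated by `sep` (a tab by default).
    Only the first separator is used, so tweets may contain `sep` themselves.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        label, found, tweet = line.partition(sep)
        label = label.strip()
        if not found or label not in VALID_LABELS:
            continue  # skip a header row or a malformed line
        rows.append((label, tweet.strip()))
    return rows


def load_dataset(path: str, sep: str = "\t", encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read the data file from `path` (encoding is an assumption)."""
    with open(path, encoding=encoding, errors="replace") as handle:
        return parse_rows(handle, sep)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the dataset.
    sample = ["Label\tTweet\n", "HS\tcontoh satu\n", "Non_HS\tcontoh dua\n"]
    for label, tweet in parse_rows(sample):
        print(label, "|", tweet)
```

If the separator turns out to be different, pass it as `sep`, for example `load_dataset("IDHSD_RIO_unbalanced_713_2017.txt", sep=",")`.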

Dataset Size

The dataset consists of 713 tweets in Indonesian:

Number of Non_HS tweets: 453
Number of HS tweets: 260

Since the dataset is unbalanced, you may need to oversample the HS class or downsample the Non_HS class to obtain a balanced dataset.
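
The balancing step can be sketched as follows. This is a simple random resampling example using only the Python standard library, not a method prescribed by the authors. The function `balance` and its `strategy` and `seed` parameters are hypothetical names chosen for illustration, and the rows are assumed to be (label, tweet) pairs.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[str, str]


def balance(rows: List[Row], strategy: str = "down", seed: int = 0) -> List[Row]:
    """Return a class-balanced copy of `rows`.

    strategy="down": randomly keep as many rows per class as the smallest class has.
    strategy="over": keep all rows and add random duplicates to the smaller classes
                     until each matches the largest class.
    """
    if strategy not in ("down", "over"):
        raise ValueError("strategy must be 'down' or 'over'")
    if not rows:
        return []

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[0]].append(row)

    sizes = [len(group) for group in by_label.values()]
    balanced: List[Row] = []
    if strategy == "down":
        target = min(sizes)
        for group in by_label.values():
            balanced.extend(rng.sample(group, target))  # sample without replacement
    else:
        target = max(sizes)
        for group in by_label.values():
            balanced.extend(group)
            # draw the missing rows with replacement from the same class
            balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced
```

With the class counts above, downsampling yields 260 tweets per class (520 in total) and oversampling yields 453 per class (906 in total). If you oversample, do so only on the training portion after splitting off test data, so that duplicated tweets do not appear in both.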

References

Ika Alfina, Rio Mulia, Mohamad Ivan Fanany, and Yudo Ekanata, "Hate Speech Detection in Indonesian Language: A Dataset and Preliminary Study", in Proceedings of the 9th International Conference on Advanced Computer Science and Information Systems 2017 (ICACSIS 2017).

Licence

You can use this dataset for free, and you don't need our permission to use it. Please cite our paper if you use our data in a publication. Please note that you are not allowed to copy this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id
