sfu-discourse-lab/TACT


TACT

Topic Analysis, Constructiveness and Toxicity for online articles and comments

Project Overview

We set out to investigate trends in the toxicity, constructiveness and topics of online comments. Online comments are interesting by themselves, but it also helps to look at them in context; we do this using information about the articles on which the comments appeared.

To find these trends, we use various machine-learning-based systems and approaches. For constructiveness, we use CHECK::REFERENCE, a constructiveness system developed at Simon Fraser University by Dr Maite Taboada and Dr Varada Kolhatkar. For toxicity, we use Google's Perspective API. For topic modelling, we settled on creating a Latent Dirichlet Allocation (LDA) model, the details of which are discussed in subsequent sections.
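As a rough illustration of how a toxicity score can be obtained from Google's Perspective API, here is a minimal Python sketch that uses only the standard library. It assumes the public `v1alpha1/comments:analyze` endpoint and requests only the `TOXICITY` attribute. The helper names (`build_request_body`, `parse_toxicity`, `score_comment`) are hypothetical and are not taken from this repository's `src` code, which may batch requests, handle rate limits or request other attributes.

```python
import json
import urllib.request

# Perspective API endpoint for analyzing a single comment.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request_body(text: str, language: str = "en") -> dict:
    """Build a request body that asks only for the TOXICITY attribute."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response: dict) -> float:
    """Extract the summary TOXICITY probability (0.0 to 1.0) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_comment(text: str, api_key: str) -> float:
    """Send one comment to Perspective and return its toxicity score.

    Requires network access and a valid Perspective API key (the
    `api_key` argument is an assumption of this sketch).
    """
    data = json.dumps(build_request_body(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_toxicity(json.load(resp))


if __name__ == "__main__":
    # Offline check of the parsing logic against a canned (made-up) response.
    canned = {
        "attributeScores": {
            "TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}}
        }
    }
    print(parse_toxicity(canned))
```

Keeping request building and response parsing separate from the HTTP call makes it possible to test the scoring logic offline and to add retries or swap in another HTTP client without changing it.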

The results of this project will provide insights for online news publications that monitor the comment sections on their websites.

Key Findings

  1. Non-constructive and non-toxic comments make up the majority of online comments
  2. Non-constructive comments are more common than constructive comments
  3. Constructive comments almost always contain a small proportion of toxicity
  4. The most common topics discussed in articles in The Globe and Mail are related to politics (global, national and regional)
  5. The proportions of the topics discussed in comments correlate directly with those in articles
  6. People comment more about politics than about other topics, but when they do so, they bring in personal experience and anecdotes
  7. Comments on topics that receive more article coverage show a higher degree of constructiveness
  8. Toxicity in comments is not higher for some topics than for others; it is likely a fixed feature of online language
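Finding 5 compares how topics are distributed across articles and across comments. The following is a minimal sketch of one way to make that comparison. It assumes each document already has a per-topic probability distribution from the LDA model (one list of topic weights per document), averages those into corpus-level proportions, and measures agreement with Pearson's correlation. The function names are hypothetical, and the project may aggregate or correlate differently (for example, by counting each document's dominant topic, or by using Spearman's rank correlation). The numbers in the usage block are toy values, not results from this project.

```python
import math


def topic_proportions(doc_topic_dists: list[list[float]]) -> list[float]:
    """Average per-document topic distributions into corpus-level proportions."""
    n_docs = len(doc_topic_dists)
    n_topics = len(doc_topic_dists[0])
    totals = [0.0] * n_topics
    for dist in doc_topic_dists:
        for k, weight in enumerate(dist):
            totals[k] += weight
    return [total / n_docs for total in totals]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy topic distributions over three topics (illustrative values only).
    article_dists = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
    comment_dists = [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]

    article_props = topic_proportions(article_dists)
    comment_props = topic_proportions(comment_dists)
    print(article_props, comment_props)
    print(pearson(article_props, comment_props))
```

With only as many data points as there are topics, the correlation is a coarse summary; inspecting the two proportion vectors side by side is a useful complement.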

Folder Structure

  • doc: Contains documentation on the methods and the results
  • src: Contains all the code for this project - preprocessing, getting constructiveness and toxicity predictions, topic modelling, and visualizations
  • img: Contains images, some generated by code in this project, others generated using external websites

Contact

Vagrant Gautam (me@dippedrusk.com)

Maite Taboada (mtaboada@sfu.ca)