rodekruis/social-media-listening

rumor-tracker

Track specific topic(s) in news and social media. Featuring: geolocation, translation to English, sentiment analysis, and topic modelling.

Built to support the Philippine, Namibia and Ethiopian Red Cross Societies.

Credits: Phuoc Phung, Wessel de Jong, Jacopo Margutti

Introduction

This repo contains the code to:

  1. Download text data on a specific topic (e.g. COVID-19 vaccines)
  2. Translate it to English
  3. Analyze sentiment (is it positive or negative?)
  4. Divide it into topics
  5. Assign a topic and a representative example to each group

Topic modelling is built on top of GSDMM: short text clustering, while sentiment and translation use Hugging Face Models and/or Google Cloud Natural Language.
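To give a feel for what GSDMM does, here is a minimal from-scratch sketch of its collapsed Gibbs sampler (the "Movie Group Process"). It is an illustration only and is not the code or library used in this repo; the function name `gsdmm`, its parameters, and the default values are assumptions made for the example. Each document is assumed to be already tokenized into a list of words.

```python
import math
import random
from collections import Counter


def gsdmm(docs, n_clusters=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Illustrative GSDMM sketch: returns one cluster label per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_count = [0] * n_clusters                         # documents per cluster
    word_total = [0] * n_clusters                        # total words per cluster
    word_count = [Counter() for _ in range(n_clusters)]  # word counts per cluster

    def move(doc, k, sign):
        # Add (sign=+1) or remove (sign=-1) a document from cluster k.
        doc_count[k] += sign
        word_total[k] += sign * len(doc)
        for w in doc:
            word_count[k][w] += sign

    # Start from a random assignment.
    labels = []
    for doc in docs:
        k = rng.randrange(n_clusters)
        labels.append(k)
        move(doc, k, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            move(doc, labels[d], -1)
            log_p = []
            for k in range(n_clusters):
                # Prefer clusters that are large and share words with the document.
                lp = math.log(doc_count[k] + alpha)
                for w, c in Counter(doc).items():
                    for j in range(c):
                        lp += math.log(word_count[k][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(word_total[k] + vocab_size * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            labels[d] = rng.choices(range(n_clusters), weights=weights)[0]
            move(doc, labels[d], +1)
    return labels
```

`n_clusters` is only an upper bound: clusters that end up empty are simply unused, which is why the number of topics does not have to be known in advance.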

N.B. the creation of groups (a.k.a. clustering) is automated, but the topic description is not. You need a human to read some representative examples of each group and come up with a meaningful, human-readable description.
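One possible way to pick the representative examples that the human reviewer reads is sketched below. This is an assumption for illustration, not necessarily how the rumor-tracker selects them: each text is scored by how common its words are within its own group, and the highest-scoring texts are returned. The function name `representative_examples` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def representative_examples(docs, labels, top_n=3):
    """Return up to top_n texts per group, most 'typical' first.

    docs:   list of tokenized texts (lists of words)
    labels: group label of each text, same length as docs
    """
    groups = defaultdict(list)
    for doc, label in zip(docs, labels):
        groups[label].append(doc)

    result = {}
    for label, members in groups.items():
        # In how many texts of this group does each word appear?
        doc_freq = Counter(w for doc in members for w in set(doc))

        def score(doc):
            words = set(doc)
            return sum(doc_freq[w] for w in words) / max(len(words), 1)

        ranked = sorted(members, key=score, reverse=True)
        result[label] = [" ".join(doc) for doc in ranked[:top_n]]
    return result
```

A reviewer would then read `result[label]` for each group and write the human-readable topic description by hand.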

Data sources supported by the rumor-tracker:

  1. Twitter
  2. YouTube
  3. KoBo
  4. Facebook
  5. Azure Table Storage

Setup

Generic requirements:

More in detail:

  • Follow these instructions to store credentials in Azure Key Vault and use them with the rumor-tracker. Secrets must be in JSON format and contain all necessary fields (templates TBI).
  • For 510: Google Cloud service account credentials are accessible here, but create a new project if needed. Login credentials are in Bitwarden.

The rumor-tracker can be configured via a single configuration file (JSON); see the country-specific examples under config.
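As a rough sketch of how a JSON configuration file can be loaded and sanity-checked before a run: the key names below (`country`, `keywords`, `sources`) are hypothetical placeholders, not the rumor-tracker's actual schema, and `load_config` is not a function from this repo. Refer to the examples under config for the real fields.

```python
import json
from pathlib import Path

# Hypothetical key names, for illustration only; the real schema is
# defined by the country-specific examples under config.
REQUIRED_KEYS = ("country", "keywords", "sources")


def load_config(path):
    """Read a JSON configuration file and check that expected keys exist."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"{path}: missing keys: {', '.join(missing)}")
    return config
```

Failing early on a missing key is usually cheaper than discovering it halfway through a download or translation step.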

With Docker

  1. Install Docker
  2. Build the docker image from the root directory:
       docker build -t rodekruis/rumor-tracker .
  3. Run the docker image in a new container and access it:
       docker run -it --entrypoint /bin/bash rodekruis/rumor-tracker
  4. Check that everything is working by running the pipeline (see Usage below)
  5. Congratulations! You can now use the rumor-tracker as a dockerized app in your favorite cloud provider, e.g. using Azure Logic App

Manual Setup

TBI

Usage

Usage: run-pipeline [OPTIONS]

Options:
  --config                    configuration file (json)
  --help                      show this message and exit