Track specific topic(s) in news and social media. Featuring: geolocation, translation to English, sentiment analysis, and topic modelling.
Built to support the Philippine, Namibia and Ethiopian Red Cross Societies.
Credits: Phuoc Phung, Wessel de Jong, Jacopo Margutti
This repo contains the code to:
- Download text data on a specific topic (e.g. COVID-19 vaccines)
- Translate it to English
- Analyze sentiment (is it positive or negative?)
- Divide it into topics
- Assign a topic and a representative example to each group
Topic modelling is built on top of GSDMM: short text clustering, while sentiment analysis and translation use Hugging Face models and/or Google Cloud Natural Language.
N.B. the creation of groups (a.k.a. clustering) is automated, but the topic description is not. You need a human to read some representative examples of each group and come up with a meaningful, human-readable description.
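To illustrate the last step, here is a minimal sketch of how a representative example could be picked for each group once clustering has assigned a label to every text. It is not the rumor-tracker's actual method: the function name `representative_examples` and the scoring rule (prefer the text whose words are most common within its group) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def representative_examples(texts, labels):
    """Return {label: text}, choosing per group the text with the most typical words.

    Hypothetical helper: the scoring rule is an illustrative assumption,
    not the method used by the rumor-tracker.
    """
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[label].append(text)

    result = {}
    for label, members in groups.items():
        # Count in how many texts of the group each word appears.
        doc_freq = Counter()
        for text in members:
            doc_freq.update(set(tokenize(text)))

        def score(text):
            # Average within-group document frequency of the text's distinct words.
            words = set(tokenize(text))
            if not words:
                return 0.0
            return sum(doc_freq[w] for w in words) / len(words)

        result[label] = max(members, key=score)
    return result


texts = [
    "vaccine causes fever",
    "vaccine fever rumor",
    "vaccine fever",
    "flood in manila",
    "manila flood",
]
labels = [0, 0, 0, 1, 1]
examples = representative_examples(texts, labels)
```

A human would then read `examples[0]`, `examples[1]`, etc. (ideally together with a few more texts per group) and write the topic description by hand.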
Data sources supported by the rumor-tracker:
- YouTube
- KoBo
- Azure Table Storage
Generic requirements:
- Azure Key Vault
- Azure Data Lake Storage
- OPTIONAL (Twitter): Twitter developer account
- OPTIONAL (geolocate): vector files of locations and country boundaries
- OPTIONAL (YouTube, translate): Google Cloud account
More in detail:
- Follow these instructions to store credentials in Azure Key Vault and use them with the rumor-tracker. Secrets must be in JSON format and contain all necessary fields (templates TBI).
- For 510: Google Cloud service account credentials are accessible here, but create a new project if needed. Login credentials are in Bitwarden.
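Since the secret templates are still TBI, the sketch below only shows one way to check that a secret is valid JSON and contains the fields you expect before using it. The function `parse_secret` and the field names `api_key` and `api_secret` are hypothetical; substitute whatever fields each data source actually requires.

```python
import json


def parse_secret(secret_value, required_fields):
    """Parse a JSON secret string and verify that all required fields are set.

    Hypothetical helper: `secret_value` is the raw string stored in the vault,
    however it was retrieved.
    """
    try:
        secret = json.loads(secret_value)
    except json.JSONDecodeError as err:
        raise ValueError(f"secret is not valid JSON: {err}") from err
    if not isinstance(secret, dict):
        raise ValueError("secret must be a JSON object")

    # Treat absent and empty values alike as missing.
    missing = [field for field in required_fields if not secret.get(field)]
    if missing:
        raise ValueError(f"secret is missing fields: {', '.join(missing)}")
    return secret


# Field names below are placeholders, not the real template.
credentials = parse_secret(
    '{"api_key": "abc", "api_secret": "xyz"}',
    required_fields=["api_key", "api_secret"],
)
```

Failing early with a clear message makes it easier to spot an incomplete secret than a downstream authentication error would.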
The rumor-tracker can be configured via a single configuration file (JSON); see the country-specific examples under config.
- Install Docker
- Build the docker image from the root directory
docker build -t rodekruis/rumor-tracker .
- Run the docker image in a new container and access it
docker run -it --entrypoint /bin/bash rodekruis/rumor-tracker
- Check that everything is working by running the pipeline (see Usage below)
- Congratulations! You can now use the rumor-tracker as a dockerized app in your favorite cloud provider, e.g. using Azure Logic App
TBI
Usage: run-pipeline [OPTIONS]
Options:
--config configuration file (json)
--help show this message and exit
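For readers who want to wrap or script the pipeline, the following is a rough sketch of a command-line entry point with the same interface as above. It is an assumption about the shape of the interface, not the actual implementation of `run-pipeline`: the functions `build_parser`, `load_config` and `main` are hypothetical, and treating `--config` as required is a guess.

```python
import argparse
import json


def build_parser():
    """Build a parser mirroring the documented options (--help is added by argparse)."""
    parser = argparse.ArgumentParser(prog="run-pipeline")
    # Assumption: the configuration file is mandatory.
    parser.add_argument("--config", required=True, help="configuration file (json)")
    return parser


def load_config(path):
    """Read the JSON configuration file and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("configuration file must contain a JSON object")
    return config


def main(argv=None):
    """Parse arguments, load the configuration, and hand it to the pipeline."""
    args = build_parser().parse_args(argv)
    config = load_config(args.config)
    # The actual pipeline steps (download, translate, analyze, cluster) would start here.
    print(f"Loaded {len(config)} settings from {args.config}")
    return config
```

Calling `main(["--config", "<path to a file under config>"])` would load that file; the keys it must contain depend on the country-specific examples.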