Skip to content

fabriziomiano/Wagwan

Repository files navigation

Wagwan Fam?

This tool serves those who have, or work for someone who owns, a facebook page, and essentially want to know what the people are talking about in their post.

alt_text

What is it?

A simple front-end which can be used to run a basic, yet simple, keywords extraction on facebook posts. In addition, it employs spaCy default models to extract named-entities from comments. Visit spaCy page to know more about named entities. This tool is pretty much a word counter that employs standard NLP pre-processing, plus the NER part performed by spaCy.

How does it do it?

It brings up a webapp supported by a python back-end which is a taylored version of whats-the-topic. It requires an access token to get people's comments on a selected post. Additional info on how to get a token can be found at this link. In short, once a facebook developers account has been created, the access token can be generated through the Facebook GraphAPI.

The tool performs text preprocessing (tokenization, stopwords filtering, stemming) to make plots of the keyword-count plus a word cloud image - using the awesome word_Cloud library.

For devs

This tool has been developed on Ubuntu 18.04 and macOS High Sierra, but has never been seriously tested. It requires Python3+ and all the packages listed in requirements.txt.

Results

Here there are two images of the keyword-count bar plot, and the wordcloud, that are produced by running the tool on this post:

Word cloud with no stemming

alt text

Top 20 keywords

alt_text

The data

word count
clim 11
years 9
hoax 6
chang 5
stop 4
planet 4
10 4
biggest 4
ever 4
giv 3
meat 3
increasing 3
species 3
volcanoes 3
guess 3

Top 12 entities

This is a bar plot of the top N entities extracted from this post

alt_text

The data

entities count
Hawaii 3
Earth 2
Norm Haas 2
Earths 1
the First Law of Thermodynamics 1
Rene 1
Al Gore 1
Bill Nye 1
only one 1
97 % 1
at least 30 1
the First Law 1
100% 1
Maula Loa 1
days 1
Time 1

Acknowledgements

Thanks to the people at spaCy for the NE part,to the people who produced facebook-sdk for the ease of access to the data, and finally to the guys who made word_cloud for the awesome word-cloud images that can be produced.