#aufschrei revisited

Analysing #aufschrei with Python

Requirements

Python 3 with the following packages installed (as pinned in requirements.txt):
- matplotlib==1.4.3
- nltk==3.0.2
- numpy==1.9.2
- pandas==0.16.0
- pymongo==3.0.1
- python-dateutil==2.4.2
- pytz==2015.2
- six==1.9.0
MongoDB (or an ssh port-forwarding connection to a remote MongoDB instance)
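Before opening the notebooks, it can help to check which of the pinned versions are actually installed. The following is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is available). The helper names `parse_pins` and `check_pins` are hypothetical and not part of this repository.

```python
from importlib.metadata import version, PackageNotFoundError  # Python 3.8+

# Pinned versions as listed in the Requirements section above.
REQUIREMENTS = """\
matplotlib==1.4.3
nltk==3.0.2
numpy==1.9.2
pandas==0.16.0
pymongo==3.0.1
python-dateutil==2.4.2
pytz==2015.2
six==1.9.0
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed)} for every package whose installed version differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in sorted(check_pins(parse_pins(REQUIREMENTS)).items()):
        print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

An empty result from `check_pins` means that every pinned package is installed at exactly the listed version.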

Set-up

Either:
- import the tweets into your local MongoDB instance using tweets2db.py, or
- open an ssh port-forwarding connection in a separate command-line window:

ssh -L 27017:localhost:27017 username@server

Start the IPython notebook server by running: ipython3 notebook --matplotlib=inline
When done, save the notebook and stop the notebook server.
If you used port forwarding, close the ssh connection by typing exit.
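With `ssh -L 27017:localhost:27017`, the remote MongoDB answers on the local port 27017, so a notebook can use the same connection settings whether the tweets were imported locally or are reached through the tunnel. The sketch below shows one way to do this with pymongo. The database name `aufschrei` and collection name `tweets` are hypothetical placeholders; use whatever names tweets2db.py actually writes to.

```python
# Hypothetical names; replace them with the database/collection used by tweets2db.py.
DB_NAME = "aufschrei"
COLLECTION_NAME = "tweets"


def mongo_uri(host="localhost", port=27017):
    """Build a MongoDB connection URI.

    With `ssh -L 27017:localhost:27017 username@server` running, the remote
    server is reachable on localhost:27017, so the defaults work in both set-ups.
    """
    return f"mongodb://{host}:{port}/"


def get_tweets_collection(uri=None):
    """Return the (hypothetical) tweets collection as a pymongo Collection."""
    # Imported lazily so that mongo_uri() can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri or mongo_uri())
    return client[DB_NAME][COLLECTION_NAME]


# Usage (needs pymongo and a reachable MongoDB, local or through the tunnel):
#   tweets = get_tweets_collection()
#   print(tweets.find_one())
```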

Contents

Word statistics
- tokenizing
- word freqs
- removing stopwords
- extracting usernames and hashtags
- text concordance, common contexts, collocations
- text dispersion
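NLTK, which is listed in the requirements, covers these steps with `nltk.FreqDist`, `nltk.corpus.stopwords.words("german")` and the `nltk.Text` methods `concordance`, `common_contexts`, `collocations` and `dispersion_plot`. To show what the first steps do, the following standard-library sketch tokenizes tweets, counts words, removes stopwords, extracts usernames and hashtags, and builds a keyword-in-context concordance. The tokenizer regex and the tiny stopword set are illustrative assumptions, not the notebooks' actual choices.

```python
import re
from collections import Counter

# Hypothetical mini stopword list for illustration only; NLTK's stopwords corpus
# (after downloading it) provides a much fuller German list.
STOPWORDS = {"der", "die", "das", "und", "ich", "ist", "nicht", "ein", "eine", "zu"}

# @mentions and #hashtags first, then ordinary words; in Python 3, \w also matches umlauts and ß.
TOKEN_RE = re.compile(r"[@#]\w+|\w+")


def tokenize(text):
    """Lower-case a tweet and split it into tokens, keeping @mentions and #hashtags whole."""
    return TOKEN_RE.findall(text.lower())


def word_freqs(texts, stopwords=STOPWORDS):
    """Count word tokens across all tweets, skipping stopwords, mentions, hashtags and 'rt'."""
    counts = Counter()
    for text in texts:
        for tok in tokenize(text):
            if tok[0] in "@#" or tok in stopwords or tok == "rt":
                continue
            counts[tok] += 1
    return counts


def mentions_and_hashtags(texts):
    """Return two Counters: one for @usernames, one for #hashtags."""
    mentions, hashtags = Counter(), Counter()
    for text in texts:
        for tok in tokenize(text):
            if tok.startswith("@"):
                mentions[tok] += 1
            elif tok.startswith("#"):
                hashtags[tok] += 1
    return mentions, hashtags


def concordance(texts, word, width=3):
    """Keyword-in-context lines showing `width` tokens on either side of each hit."""
    lines = []
    for text in texts:
        toks = tokenize(text)
        for i, tok in enumerate(toks):
            if tok == word:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines
```

For example, `word_freqs(texts).most_common(20)` lists the twenty most frequent non-stopword tokens. Because the token list is shared, the same output can also be passed to `nltk.Text` for collocations and dispersion plots.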
Co-occurrences and n-grams
- most frequently repeated tweets (that are not technically retweets)
- n-grams
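Co-occurrences, n-grams and repeated tweets all reduce to counting tuples or strings. The sketch below uses only the standard library. `ngrams` mirrors what `nltk.util.ngrams` produces. One assumption needs stating: a "technical" retweet is detected here by a leading `RT ` in the text. Tweets stored from the Twitter API also carry a `retweeted_status` field, which would be a more reliable test.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """All contiguous n-token tuples (bigrams for n=2), like nltk.util.ngrams."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(token_lists, n):
    """Count n-grams over many tokenized tweets; n-grams never span two tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts


def cooccurrences(token_lists):
    """Count unordered pairs of distinct tokens that appear in the same tweet."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(combinations(sorted(set(tokens)), 2))
    return counts


def most_common_tweets(texts, k=10):
    """Most frequently repeated tweet texts, ignoring tweets that start with 'RT '.

    Assumption: the 'RT ' prefix marks a retweet; checking retweeted_status
    on the stored tweet documents would be the stricter alternative.
    """
    counts = Counter(t.strip() for t in texts if not t.startswith("RT "))
    return counts.most_common(k)
```

The `zip` of shifted slices keeps `ngrams` short and lazy-friendly. For very large corpora, `ngram_counts` could be fed token lists straight from a database cursor instead of a pre-built list.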
User stats
- user activity (most active, ranking users by activity)
- creating user-specific corpora
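The user statistics amount to grouping tweets by author. The sketch below assumes that each stored tweet is a dict shaped like Twitter API v1.1 JSON, i.e. `{"text": ..., "user": {"screen_name": ...}}`. If tweets2db.py stores documents differently, adjust the keys accordingly. In pandas, which is also listed in the requirements, `value_counts()` on a screen-name column would give the same activity ranking.

```python
from collections import Counter, defaultdict

# Assumption: tweets look like Twitter API v1.1 JSON:
#   {"text": "...", "user": {"screen_name": "..."}}


def screen_name(tweet):
    return tweet["user"]["screen_name"]


def user_activity(tweets):
    """Counter of tweets per user; .most_common(k) lists the k most active users."""
    return Counter(screen_name(t) for t in tweets)


def rank_users(tweets):
    """(rank, user, count) tuples; users with equal counts share a rank (1, 2, 2, 4, ...)."""
    ranking = []
    prev_count, rank = None, 0
    for position, (user, count) in enumerate(user_activity(tweets).most_common(), start=1):
        if count != prev_count:
            rank, prev_count = position, count
        ranking.append((rank, user, count))
    return ranking


def user_corpora(tweets):
    """Map each user to the list of their tweet texts, i.e. a per-user corpus."""
    corpora = defaultdict(list)
    for t in tweets:
        corpora[screen_name(t)].append(t["text"])
    return dict(corpora)
```

A per-user corpus can then be tokenized and counted with the same word-statistics functions as the full corpus.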

Note

The output of some commands can be very long. Enabling the notebook setting Cell --> All Output --> Scroll Long makes it easier to read.

Ideas for further research

- Does applying machine learning algorithms aid the categorization of tweets?
- Tweet statistics (e.g. most often retweeted, most favourited): can this kind of metadata aid categorization?
- Fancy graphs and timelines! (matplotlib or R/ggplot2)
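As a possible starting point for the tweet statistics and timelines suggested above, here is a sketch. It assumes that tweets are stored as Twitter API v1.1 dicts with the fields `retweet_count`, `favorite_count` and `created_at` (formatted like `Thu Jan 24 23:15:00 +0000 2013`). The function names are hypothetical. The hourly counts it returns could then be plotted with matplotlib.

```python
from collections import Counter
from datetime import datetime

# Twitter API v1.1 created_at format; %a and %b assume an English (C) locale.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def top_by(tweets, field, k=10):
    """The k tweets with the highest value of a numeric field such as retweet_count.

    Tweets that lack the field are treated as 0.
    """
    return sorted(tweets, key=lambda t: t.get(field, 0), reverse=True)[:k]


def tweets_per_hour(tweets):
    """Timeline: number of tweets per hour, keyed by 'YYYY-MM-DD HH:00' in the tweet's timezone."""
    counts = Counter()
    for t in tweets:
        created = datetime.strptime(t["created_at"], CREATED_AT_FORMAT)
        counts[created.strftime("%Y-%m-%d %H:00")] += 1
    return dict(sorted(counts.items()))
```

The string keys sort chronologically, which keeps the timeline in order without a separate date index.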

Bibliography

- Steven Bird, Ewan Klein & Edward Loper: Natural Language Processing with Python. O'Reilly 2009
- Wes McKinney: Python for Data Analysis. O'Reilly 2013
- Matthew A. Russell: Mining the Social Web. O'Reilly 2014
- Axel Maireder & Stephan Schlögl: 24 hours of an #outcry: The networked publics of a socio-political debate. European Journal of Communication (EJC) 2014

Related Projects

- aufschreiStat - Aufschrei statistics in Java; never finished, but an inspiration for this project
- aufschreib - a JavaScript web app to classify #aufschrei tweets (manually and through automatic categorization) and create statistics/timelines
- #aufschrei Timeline - a display of all #aufschrei tweets
- 24 hours of an #outcry: The networked publics of a socio-political debate (EJC paper) - analysis of the first 24 hours (timeline, content, user networks)

katsel/aufschreirevisited

Folders and files

Latest commit

History

Repository files navigation

#aufschrei revisited

Analysing #aufschrei with Python

Requirements

Set-up

Contents

Note

Ideas for further research

Bibliography

Related Projects

About

Resources

Stars

Watchers

Forks

Languages