katsel/aufschreirevisited

#aufschrei revisited

Analysing #aufschrei with Python

Requirements

  • Python 3 with all necessary packages installed
    • matplotlib==1.4.3
    • nltk==3.0.2
    • numpy==1.9.2
    • pandas==0.16.0
    • pymongo==3.0.1
    • python-dateutil==2.4.2
    • pytz==2015.2
    • six==1.9.0
  • MongoDB (or enabled port-forwarding to a remote MongoDB)

Set-up

  • provide access to the tweet database:
    • EITHER: import the tweets into your local MongoDB instance using tweets2db.py
    • OR: open an ssh port-forwarding connection to a remote MongoDB in a separate command line window:
      ssh -L 27017:localhost:27017 username@server
  • start the IPython notebook server by running ipython3 notebook --matplotlib=inline

  • when done: save notebook and stop the notebook server

  • if you used port forwarding, close the ssh connection by typing exit
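
Before starting the notebook, it can help to confirm that something is actually listening on port 27017, whether that is a local MongoDB or the ssh tunnel. The following is a minimal sketch using only the standard library; the function name mongo_port_reachable is hypothetical and not part of this repository. It only checks that the TCP port accepts connections, not that the tweets have been imported.

```python
import socket


def mongo_port_reachable(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if nothing is listening
        # on the port or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if mongo_port_reachable():
        print("Port 27017 is reachable on localhost")
    else:
        print("Nothing reachable on localhost:27017; "
              "start MongoDB or open the ssh tunnel first")
```

If the check fails while the ssh tunnel is supposed to be open, the forwarding command was most likely not started or has already been closed.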

Contents

  • Word statistics
    • tokenizing
    • word frequencies
    • removing stopwords
    • extracting usernames and hashtags
    • text concordance, common contexts, collocations
    • text dispersion
  • Cooccurrences and ngrams
    • most common tweets (that are not technically RTs)
    • ngrams
  • User stats
    • user activity (most active, ranking users by activity)
    • creating user-specific corpora
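
The steps listed above can be illustrated on a handful of tweets without a database. This is a rough sketch using only the standard library, not the notebook's actual code (which presumably relies on nltk and reads from MongoDB). The sample tweets, the field names "user" and "text", the stopword list, and the tokenizing pattern are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data; in the notebook the tweets come from MongoDB.
tweets = [
    {"user": "anna", "text": "Danke für den #aufschrei, @bob! Das ist wichtig."},
    {"user": "bob", "text": "@anna Der #aufschrei ist wichtig und überfällig."},
    {"user": "anna", "text": "Noch ein Tweet zum #Aufschrei"},
]

# Tiny illustrative stopword list (an assumption, not the notebook's list).
STOPWORDS = {"der", "die", "das", "den", "ein", "und", "ist", "für", "zum", "noch"}

# A token is a @username, a #hashtag, or a plain run of word characters.
TOKEN_RE = re.compile(r"[@#]?\w+")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


word_freqs = Counter()      # content words only (no stopwords, names, tags)
hashtags = Counter()
usernames = Counter()
bigram_freqs = Counter()
activity = Counter()        # number of tweets per user
corpora = defaultdict(list)  # user-specific corpora: user -> list of tokens

for tweet in tweets:
    tokens = tokenize(tweet["text"])
    activity[tweet["user"]] += 1
    corpora[tweet["user"]].extend(tokens)
    bigram_freqs.update(ngrams(tokens, 2))
    for token in tokens:
        if token.startswith("#"):
            hashtags[token] += 1
        elif token.startswith("@"):
            usernames[token] += 1
        elif token not in STOPWORDS:
            word_freqs[token] += 1

print(word_freqs.most_common(3))
print(hashtags.most_common(3))
print(activity.most_common())
print(bigram_freqs.most_common(1))
```

Counting bigrams per tweet, rather than over one concatenated token stream, avoids spurious ngrams that span the boundary between two tweets.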

Note

Some commands produce very long output. Enabling the notebook's Cell --> All Output --> Scroll Long setting makes such output easier to read.

Ideas for further research

  • Does applying machine learning algorithms aid the categorization of Tweets?
  • Compute tweet statistics (such as most often retweeted, most favourited) and examine whether this kind of metadata can aid categorization
  • Fancy graphs and timelines! (matplotlib or R/ggplot2)

Bibliography

Related Projects

  • Analysis of the first 24 hours (timeline, content, user networks)
