Should be available via BitTorrent and as a web database that can be queried #3

Open
scripting opened this issue Jul 31, 2018 · 8 comments

Comments

@scripting

First, thanks for making the data available. I was asking about this recently. I would like to get a look at troll tweets; it might help us avoid arguing with them in the future.

However --

I wasn't able to download the file, and this is not a great way to distribute the info. Better would be:

  1. BitTorrent distribution. It was made for data like this. GitHub, not so much.

  2. And it would be wonderful to have this online as a database that can be queried with SQL commands.

I would be happy to help either or both projects, assuming they don't already exist.

Thanks again for uploading the data.

Dave

@elithrar

elithrar commented Jul 31, 2018

I just tweeted about your second point [tweet], since I've imported the dataset into BigQuery, which has a free tier (1TB of queries). The dataset is public.

You can query the dataset like so:

SELECT author, content, followers
FROM `optimum-rock-145719.fivethirtyeight_russian_troll_tweets.russian_troll_tweets`
WHERE language = "English"
ORDER BY followers DESC
LIMIT 5
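
For anyone who wants to try the shape of this query without a BigQuery account, here is a minimal local sketch using Python's built-in `sqlite3` module. It assumes a table with the same four columns used above; the function name `top_english_tweets` and the rows in `SAMPLE_ROWS` are made-up placeholders for illustration, not data from the real dataset.

```python
import sqlite3

def top_english_tweets(rows, limit=5):
    """Load (author, content, followers, language) rows into an in-memory
    SQLite table and return the English tweets with the most followers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE russian_troll_tweets "
        "(author TEXT, content TEXT, followers INTEGER, language TEXT)"
    )
    conn.executemany(
        "INSERT INTO russian_troll_tweets VALUES (?, ?, ?, ?)", rows
    )
    # Same filter, ordering and limit as the BigQuery query above.
    result = conn.execute(
        "SELECT author, content, followers "
        "FROM russian_troll_tweets "
        "WHERE language = ? "
        "ORDER BY followers DESC "
        "LIMIT ?",
        ("English", limit),
    ).fetchall()
    conn.close()
    return result

# Placeholder rows (hypothetical, not taken from the dataset).
SAMPLE_ROWS = [
    ("account_a", "placeholder tweet 1", 120, "English"),
    ("account_b", "placeholder tweet 2", 950, "English"),
    ("account_c", "placeholder tweet 3", 400, "Russian"),
    ("account_d", "placeholder tweet 4", 30, "English"),
]

if __name__ == "__main__":
    for row in top_english_tweets(SAMPLE_ROWS):
        print(row)
```

SQLite is only a stand-in here; the hosted BigQuery table is the better option for the full dataset.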

@chohenry

@elithrar Thank you so much for putting it into BigQuery!

@scripting

@elithrar

elithrar commented Aug 1, 2018 via email

@24AheadDotCom

I put the tweets online here, with a search interface:

http://24ahead.com/influence-tweets

@fabioporta

fabioporta commented Aug 4, 2018

Tweets can be queried here too:
http://www.fromrussiawithtroll.com/

@elithrar

elithrar commented Aug 19, 2018

Better late than never: I've posted a guide to querying my hosted dataset using BigQuery - https://blog.questionable.services/article/diving-into-fivethirtyeight-troll-tweets-bigquery/

e.g.

SELECT
  author,
  COUNT(*) AS count,
  FORMAT("%.2f", COUNT(*) / (
    SELECT
      COUNT(*)
    FROM
      `optimum-rock-145719.fivethirtyeight_russian_troll_tweets.russian_troll_tweets`) * 100) AS percent
FROM
  `optimum-rock-145719.fivethirtyeight_russian_troll_tweets.russian_troll_tweets`
GROUP BY
  author
ORDER BY
  count DESC
LIMIT
  10
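
The per-author share calculation can also be sketched locally. This is a rough illustration using Python's built-in `sqlite3`, assuming a single-column toy table; the name `author_shares` and the sample authors are hypothetical. It sorts on the numeric count rather than on the formatted percentage, since formatted strings compare lexicographically ("9.00" would sort above "12.00").

```python
import sqlite3

def author_shares(authors, limit=10):
    """Given one author name per tweet, return (author, count, percent)
    tuples for the most active authors, with percent as a 2-decimal string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (author TEXT)")
    conn.executemany("INSERT INTO tweets VALUES (?)", [(a,) for a in authors])
    rows = conn.execute(
        """
        SELECT
          author,
          COUNT(*) AS n,
          COUNT(*) * 100.0 / (SELECT COUNT(*) FROM tweets) AS percent
        FROM tweets
        GROUP BY author
        ORDER BY n DESC, author  -- numeric sort; author breaks ties
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    # Format the percentage only after sorting, for display.
    return [(author, n, f"{percent:.2f}") for author, n, percent in rows]

if __name__ == "__main__":
    # Hypothetical sample: 100 tweets spread over three made-up authors.
    sample = ["z"] * 79 + ["x"] * 12 + ["y"] * 9
    for row in author_shares(sample):
        print(row)
```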

@chrisgherbert

We've also put together a tool for querying the tweets online: https://russiatweets.com
