Skip to content

Code and data for course project of Social Data Science - ETH Zurich

License

Notifications You must be signed in to change notification settings

ywyue/TwitterEnvironmentalNetwork

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TwitterEnvironmentalNetwork

This repository contains the R markdown files and datasets for course project of Social Data Science - ETH Zurich

Group members: Tianyuan Wang, Haonan Yang, Yuehan Yang, Yuanwen Yue

Supervisor: Prof. David Garcia

Introduction

The datasets were collected using rtweet and Twitter's offical REST API. The datasets contain the following ten parts. Each dataset is provided in both csv and Rds formats (Due to Github storage's limitation, we only provide Rds format for repliesAll). You can also use all.RData to load all the data.

Data details

1 Environmental celebrities's profiles

  • Source: [users.csv] / [ users.Rds]
  • Description: the twitter profiles of 240 selected environmental celebrities. We referred to three public lists of top environmental influencers selected by Climate Week NYC, Onalytica and Corporate Knights. The twitter list can be found here.
  • Column: user_id, name, screen_name, location, description, url, followers_count friends_count, listed_count, created_at, favourites_count, etc.

2 Environmental celebrities's fields label

  • Source: [celebrity.csv] / [ celebrity.Rds]
  • Description: Based on Twitter profile and Wikipedia, celebrities were labeled into scientists, environmentalists, businessmen, politicians, athletes, writers/journalists, actors/singers/hosts, organisations, social activists and NGO officers.
  • Column: user_id, name, screen_name, location, field.

3 Environmental celebrities subset with fields label

  • Source: [fields.csv] / [ fields.Rds]
  • Description: We conduct network analysis on 240 celebrities and screened out 173 celebrities who have at least one connection with other celebrities
  • Column:
    • id: user id of this celebrity
    • name: screen name of thie celebrity
    • deg: node degree of this celebrity
    • community: community id which this celebrity belongs to
    • kcore: kcore of this celebrity
    • field: field of this celebrity

4 Environmental keywords

  • Source: [keyword.csv] / [ keyword.Rds]
  • Description: We browsed all the 240 celebrities' recent tweets and selected 35 keywords.
  • Column:
    • keyword: keywords related to environmental protection
    • importance: relevance to environmental protection
    • notes: some notes
    • synonyms: synonyms of keywords

5 Environmental related tweets

  • Source: [environmental_tweets.csv] / [ environmental_tweets.Rds]
  • Description: We manually checked all the 240 celebrities' recent tweets regarding environmental issues and selected 35 keywords. Then, we ran grep function in R to filter the last 1,000 tweets of each celebrity and obtained 45,298 related tweets.
  • Column:
    • screen_name: screen name of user who posted this Tweet
    • tweet_url: url of this Tweet

6 Environmental related tweets with reply

  • Source: [tweets_with_reply.csv] / [ tweets_with_reply.Rds]
  • Description: Before crawling replies, filter tweets based on two conditions:
    1. The reply_count should not be 0, which means that there must be at least one reply to this tweet.
    2. The tweet id should be the same with the conversation id, which means this tweet should be the original Tweet that started the conversation.
  • Column:
    • author_id: the id of the author who posted this tweet
    • id: tweet id
    • conversation_id: the id of the conversation to which this tweet belongs
    • public_metrics: includes retweet_count, reply_count, like_count, quote_count
    • created_at: the time this tweet was created
    • text: the text content of this tweet

7 Replies in all conversations

  • Source: [repliesAll.Rds]
  • Description: The replies in all conversations. There are 534135 records in total.
  • Column:
    • conversation_id: conversation id of this reply.
    • user_id: id of user who posted the original Tweet that started the conversation.
    • screen_name: screen name of user who posted the original Tweet that started the conversation.
    • author_id: id of user who posted the reply
    • created_at: the time this reply was created
    • text: the content of this reply

8 VADER sentiments scores for all celebrities

  • Source: [vader_result.csv] / [ vader_result.Rds]
  • Description: VADER sentiments scores for all celebrities.
  • Column:
    • screen_names: screen name of the celebrity.
    • vader_mean: mean of vader scores of all the replies to the celebrity's selected tweets.
    • vader_sd: standard deviation of vader scores of all the replies to the celebrity's selected tweets.
    • comments_num: number of replies to the celebrity's selected tweets.

9 Sample replies for validation

10 Manual scores for sample replies

  • Source: [manual_scores.csv] / [ manual_scores.Rds]
  • Description: the sentiment score of the sample of 500 replies labelled by group memers.
  • Column:
    • score1: scoring of group member 1
    • score2: scoring of group member 2
    • score3: scoring of group member 3
    • score4: scoring of group member 4
    • mean: average of the 4 group members’ scores, used as the true value

Use terms and notice

The use of the data should comply with Twitter's developer agreement and policy. If you have any questions with the data, please contact us by yuayue@ethz.ch.

About

Code and data for course project of Social Data Science - ETH Zurich

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published