Yvonne27Jin/BSDS3002GP_Computational_propaganda_Ukraine


This is a course project for BSDS3002 Social Computing at HKU. We aim to understand how both sides of the Russian-Ukrainian conflict used computational propaganda on Twitter and how their influence and strategies differ. The complete report can be found here.

We used the "Ukraine Conflict Twitter Dataset" from Kaggle.

Code

The "code" folder contains all the code for data processing and network analysis, including:

  • 0_DCol.ipynb: Random data sampling and regrouping
  • 1_DPP.ipynb: Data processing
    • Hashtag sorting
    • Political stance categorisation of tweets and users
    • Tweet text processing
  • 2_SEDA.ipynb: Simple exploratory data analysis (EDA)
    • Discussion trend by political stance
    • Word clouds for both stances at different timeframes
  • 3_NCon_N.ipynb: Network construction
    • Bipartite network construction
    • Projected one-mode user network
    • Export edgelist and nodelist for Gephi visualisation
    • Node attributes: political orientation index and eigenvector centrality
  • 4_NAna.ipynb: Bot detection
    • Bot detection using the Botometer API
  • 5_bipartite_network_analysis.ipynb: Network analysis
    • Network properties: size, density, diameter, average shortest path length, centrality, etc.
  • 6_Propaganda.ipynb: Trend and information flow of Russian propaganda
    • Discussion trend of the following keywords by time and political stance
      • "special military operation"
      • "Neo-Nazis" and "fascists"
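The network properties listed for 5_bipartite_network_analysis.ipynb can be illustrated with a small standard-library sketch. It is not the notebook's code: it assumes a simple, undirected, connected graph given as (u, v) edge pairs, ignores edge weights, and the function names are hypothetical. NetworkX offers the same measures as `nx.density`, `nx.diameter` and `nx.average_shortest_path_length`.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency dict from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def network_properties(edges):
    """Size, density, diameter and average shortest path length.

    Assumes a simple, undirected, connected graph (for example the largest
    connected component of a projected user network); weights are ignored.
    """
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    dist = {node: bfs_distances(adj, node) for node in adj}
    if any(len(d) != n for d in dist.values()):
        raise ValueError("graph is not connected")
    # Each unordered pair is counted once; its mean equals the ordered-pair mean.
    lengths = [dist[u][v] for u, v in combinations(adj, 2)]
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "diameter": max(lengths, default=0),
        "avg_shortest_path": sum(lengths) / len(lengths) if lengths else 0.0,
    }


# Toy example: a path a-b-c-d
props = network_properties([("a", "b"), ("b", "c"), ("c", "d")])
print(props)
```

On a network with hundreds of thousands of users, all-pairs BFS is expensive, which is one reason these measures are often computed on the largest connected component or a sample.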

Dataset

The code folder also includes the most important datasets:

20220502_resampled_dataset.csv contains the raw data of tweets randomly sampled from the Kaggle dataset.
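The README does not say how the random sample was drawn. One common way to take a fixed-size, uniform and reproducible sample from a large CSV stream is reservoir sampling (Algorithm R); the sketch below is that technique, not necessarily what 0_DCol.ipynb does. The column names `tweetid` and `text` and the miniature CSV are hypothetical stand-ins for a daily file of the Kaggle dataset.

```python
import csv
import io
import random


def reservoir_sample(rows, k, seed=None):
    """Uniform random sample of k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Keep the i-th item with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Hypothetical miniature CSV standing in for one daily file of the Kaggle dataset.
raw = "tweetid,text\n1,a\n2,b\n3,c\n4,d\n5,e\n"
sample = reservoir_sample(csv.DictReader(io.StringIO(raw)), k=2, seed=42)
print([row["tweetid"] for row in sample])
```

Because rows are consumed one at a time, memory use stays proportional to k rather than to the size of the file.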

preprocessed_data.pkl: the processed data, containing all tweet-related and user-related information plus the cleaned hashtags and political stance categorisation.

labelled.txt includes the final list of the most frequent hashtags, manually labelled for political stance.
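A manually labelled hashtag list can drive stance categorisation roughly as follows. This is a minimal sketch under assumptions: the hashtag-to-stance dictionary, its label strings and the majority-vote rule (ties and unlabelled tweets give `None`) are hypothetical and may not match the format of labelled.txt or the exact rule in 1_DPP.ipynb.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def clean_hashtags(text):
    """Extract hashtags from tweet text, lower-cased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def majority(stances):
    """Most common non-None stance; None if there is no vote or a tie at the top."""
    votes = Counter(s for s in stances if s is not None)
    if not votes:
        return None
    top = votes.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None
    return top[0][0]


def tweet_stance(text, labels):
    """Stance of one tweet from its labelled hashtags (unlabelled tags are ignored)."""
    return majority(labels.get(tag) for tag in clean_hashtags(text))


def user_stance(tweet_texts, labels):
    """Stance of a user as the majority stance of their categorised tweets."""
    return majority(tweet_stance(text, labels) for text in tweet_texts)


# Hypothetical labels; the real labelled.txt format may differ.
labels = {"standwithukraine": "pro-Ukraine", "istandwithrussia": "pro-Russia"}
print(tweet_stance("#StandWithUkraine now! #Kyiv", labels))
```

Lower-casing before the lookup means variants such as #StandWithUkraine and #standwithukraine count as the same labelled hashtag.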

Visualisation

The Figures folder contains all plots of Twitter trends, including word clouds, frequency histograms of hashtags, and density distributions of tweet and user creation times.

We used Gephi to visualise the networks. In the Gephi visualisation folder:

nodelist1.csv and nodelist2.csv: node lists exported for Gephi visualisation, containing user information: user ID, username, account creation date, number of accounts followed and followers, total number of tweets, political orientation index, political categorisation, and eigenvector centrality in the projected user network.
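Eigenvector centrality, one of the node attributes above, can be sketched with power iteration on a weighted, undirected edge list. This is an illustration only: the function name is hypothetical, and the notebooks may instead have used NetworkX's `eigenvector_centrality(G, weight="weight")`. Iterating with A + I instead of A keeps the same leading eigenvector while avoiding oscillation on bipartite-like graphs.

```python
import math


def eigenvector_centrality(weighted_edges, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of a weighted, undirected graph by power iteration.

    Iterates with (A + I) rather than A: the leading eigenvector is the same,
    but the shift avoids oscillation on bipartite-like graphs. Scores are
    normalised to unit Euclidean length. Edge weights are assumed positive.
    """
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    if not adj:
        return {}
    x = {node: 1.0 / len(adj) for node in adj}
    for _ in range(max_iter):
        # One step of x <- (A + I) x, using edge weights as the entries of A.
        new = {node: x[node] + sum(w * x[nbr] for nbr, w in adj[node].items())
               for node in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {node: val / norm for node, val in new.items()}
        if sum(abs(new[node] - x[node]) for node in adj) < len(adj) * tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Toy star graph: "hub" shares tweets with four other users.
star = [("hub", leaf, 1) for leaf in ("a", "b", "c", "d")]
scores = eigenvector_centrality(star)
print(scores)
```

In the star example the hub scores twice as high as each leaf, since it is connected to every other node.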

projected_w_user_edgelist_1.csv and projected_w_user_edgelist_2.csv: weighted edge lists for the projected user network, where nodes represent users, an edge links two users who shared the same tweet, and the weight is the number of tweets both users shared.
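The weighted projection described above can be sketched as follows, assuming the bipartite graph is given as hypothetical (user, tweet) sharing records. Each pair of users who shared the same tweet gains one unit of weight, which matches what NetworkX's `bipartite.weighted_projected_graph` computes by default. The `Source`, `Target` and `Weight` headers are column names that Gephi's spreadsheet import recognises; the function names are illustrative, not the repository's.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def project_user_network(user_tweet_pairs):
    """Weighted one-mode projection of a user-tweet bipartite graph onto users.

    Two users are linked if they shared at least one common tweet; the weight
    is the number of tweets they have in common.
    """
    sharers = defaultdict(set)  # tweet id -> set of users who shared it
    for user, tweet in user_tweet_pairs:
        sharers[tweet].add(user)
    weights = defaultdict(int)
    for users in sharers.values():
        # sorted() gives each unordered user pair a single canonical key
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


def write_edgelist(weights, fh):
    """Write Source,Target,Weight rows for import into Gephi."""
    writer = csv.writer(fh)
    writer.writerow(["Source", "Target", "Weight"])
    for (u, v), w in sorted(weights.items()):
        writer.writerow([u, v, w])


# Hypothetical (user, tweet) sharing records
pairs = [("u1", "t1"), ("u2", "t1"), ("u3", "t1"), ("u1", "t2"), ("u2", "t2")]
weights = project_user_network(pairs)
out = io.StringIO()
write_edgelist(weights, out)
print(out.getvalue())
```

A tweet shared by n users contributes n(n-1)/2 edges, so very widely shared tweets dominate the size of the projected network.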

clusters_viz1_new.gephi and clusters_viz2_new.gephi are the updated Gephi graph files used to produce the network visualisations in the report.
