Readme

This is a course project for BSDS3002 Social Computing at HKU. We aim to understand how both sides of the Russian-Ukrainian conflict used computational propaganda on Twitter and how their influence and strategies differ. The complete report is available in Group Project Report.pdf.

We used the "Ukraine Conflict Twitter Dataset" on Kaggle.

Code

The "code" folder contains all the code for data processing and network analysis:

0_DCol.ipynb: Random data sampling and regrouping
1_DPP.ipynb: Data processing
- Hashtag sorting
- Political stance categorisation of tweets and users
- Tweet text processing
2_SEDA.ipynb: Simple exploratory data analysis (EDA)
- Discussion trend by political stance
- Word clouds for both stances at different timeframes
3_NCon_N.ipynb: Network Construction
- Bipartite Network construction
- Projected one-mode user network
- Export edgelist and nodelist for Gephi visualisation
- Node attributes: political orientation index and eigenvector centrality
4_NAna.ipynb: Bot detection
- Bot detection using the Botometer API
5_bipartite_network_analysis.ipynb: Network analysis
- Network properties: size, density, diameter, average shortest path length, centrality, etc.
6_Propaganda.ipynb: Trend and information flow of Russian propaganda
- Discussion trend of the following keywords by time and political stance
  - "special military operation"
  - "Neo-Nazis" and "fascists"
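
As a rough illustration of the keyword trend analysis listed above, the sketch below counts tweets that mention each keyword group, broken down by day and political stance. It uses only the standard library so it runs anywhere; the record fields (`date`, `stance`, `text`), the stance labels, the sample tweets, and the regular expressions are all assumptions made for the example and are not taken from 6_Propaganda.ipynb.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and stance labels are assumptions.
tweets = [
    {"date": "2022-02-24", "stance": "pro_russia",
     "text": "The special military operation has begun"},
    {"date": "2022-02-24", "stance": "pro_ukraine",
     "text": "This is an invasion, not a Special Military Operation"},
    {"date": "2022-02-25", "stance": "pro_russia",
     "text": "Fighting neo-Nazis and fascists"},
    {"date": "2022-02-25", "stance": "pro_russia",
     "text": "Weather update"},
]

# One case-insensitive pattern per keyword group (assumed patterns).
KEYWORDS = {
    "special military operation": re.compile(
        r"special military operation", re.IGNORECASE
    ),
    "neo-nazis / fascists": re.compile(r"neo-?nazi|fascist", re.IGNORECASE),
}


def keyword_trend(records, pattern):
    """Count matching tweets per (date, stance) pair."""
    return Counter(
        (r["date"], r["stance"]) for r in records if pattern.search(r["text"])
    )


for name, pattern in KEYWORDS.items():
    print(name, dict(keyword_trend(tweets, pattern)))
```

The resulting counts can be plotted as one time series per stance to compare how each side picked up a given phrase.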

Dataset

The code folder also includes the most important datasets:

20220502_resampled_dataset.csv contains the raw data of tweets randomly sampled from the Kaggle dataset.

preprocessed_data.pkl contains the processed data, with all tweet-related and user-related information plus the cleaned hashtags and political stance categorisation.

labelled.txt includes the final list of the most frequent hashtags, manually labelled for political stance.
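
To make the hashtag-based categorisation more concrete, here is a minimal sketch of how manually labelled hashtags could be turned into a tweet-level stance and a user-level orientation index. Everything specific in it is an assumption: the label names, the two example hashtags, the tie-breaking rule, and in particular the index formula (pro-Russia minus pro-Ukraine tweets, divided by their sum), which may differ from the one used in the notebooks. The format of labelled.txt is not documented here, so the labels are given as a plain dictionary.

```python
from collections import Counter

# Hypothetical labels for illustration; not the contents of labelled.txt.
HASHTAG_STANCE = {
    "standwithukraine": "pro_ukraine",
    "istandwithrussia": "pro_russia",
}


def tweet_stance(hashtags):
    """Label a tweet by the majority stance of its labelled hashtags."""
    counts = Counter(
        HASHTAG_STANCE[h.lower()] for h in hashtags if h.lower() in HASHTAG_STANCE
    )
    if not counts:
        return "unlabelled"
    if counts["pro_ukraine"] == counts["pro_russia"]:
        return "mixed"  # assumed rule for ties
    return max(counts, key=counts.get)


def orientation_index(stances):
    """Assumed index in [-1, 1]: -1 is fully pro-Ukraine, +1 fully pro-Russia.

    Returns None when the user has no tweets with a clear stance.
    """
    n_russia = stances.count("pro_russia")
    n_ukraine = stances.count("pro_ukraine")
    total = n_russia + n_ukraine
    if total == 0:
        return None
    return (n_russia - n_ukraine) / total


user_tweets = [["StandWithUkraine"], ["IStandWithRussia"], ["IStandWithRussia"]]
print(orientation_index([tweet_stance(tags) for tags in user_tweets]))
```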

Visualisation

The Figures folder has all the plots of Twitter trends, including word clouds, frequency histograms of hashtags, and density distributions of tweet and user creation times.

We used Gephi to visualise the networks. In the Gephi visualisation folder:

nodelist1.csv and nodelist2.csv: Node lists exported for Gephi visualisation, containing each user's ID, username, account creation date, numbers of following and followers, total number of tweets, political orientation index, political categorisation, and eigenvector centrality in the projected user network.

projected_w_user_edgelist_1.csv and projected_w_user_edgelist_2.csv: Weighted edge lists for the projected user network, where nodes represent users, an edge means that two users shared the same tweet, and the weight is the number of tweets they both shared.

clusters_viz1_new.gephi and clusters_viz2_new.gephi are the updated Gephi graph files used to produce the network visualisations in the report.
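
The weighted projection described above can be sketched in a few lines. This is a dependency-free illustration, not the code from 3_NCon_N.ipynb: the `(user, tweet)` pairs are made-up sample data, and the `Source`, `Target`, `Weight` header follows the convention Gephi recognises for edge tables, which may not match the exact columns of the exported files. With networkx, `networkx.algorithms.bipartite.weighted_projected_graph` computes the same shared-neighbour weights.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (user, tweet) pairs: each pair means the user shared that tweet.
shares = [
    ("alice", "t1"), ("bob", "t1"),
    ("alice", "t2"), ("bob", "t2"), ("carol", "t2"),
]


def project_users(pairs):
    """Project the user-tweet bipartite graph onto users.

    The weight of an edge (a, b) is the number of tweets both users shared.
    """
    users_by_tweet = defaultdict(set)
    for user, tweet in pairs:
        users_by_tweet[tweet].add(user)

    weights = Counter()
    for users in users_by_tweet.values():
        # Sorting makes (a, b) and (b, a) land on the same undirected key.
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return weights


def to_edgelist_csv(weights):
    """Serialise the weighted edges as CSV text with Gephi-style headers."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, w])
    return buf.getvalue()


weights = project_users(shares)
print(to_edgelist_csv(weights))
```

Here alice and bob both shared t1 and t2, so their edge has weight 2, while each of them overlaps with carol only on t2.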

Yvonne27Jin/BSDS3002GP_Computational_propaganda_Ukraine
