---
title: "twitter_archive"
format: gfm
---
## Keeping up with the R community
### There are many sources
- Newsletters
- Mastodon?
- Meetups
## Twitter has been important
The #rstats hashtag opens up a world of R content. It is also useful to follow specific individuals and the people they follow.
Grabbing content can be a challenge. My process has been:
- Save referenced web pages to Evernote using its handy Web Clipper plugin
- Save images to my photo library and take it from there
- Selective "likes"
- Extract relevant text with an OCR tool, [TRex](https://github.com/amebalabs/TRex)
### Getting the Twitter Archive
Request a download here: https://twitter.com/settings/download_your_data
The archive can take a few days to prepare. You'll get a direct message on Twitter when it's ready to download.
### Use the built-in HTML interface
![](images/image-885175351.png)
### Using R to access the data from your Twitter Archive
This demonstration shows how to read `like.js` and `following.js` from the archive and suggests some possible uses for them.
```{r}
#| label: libraries and such
library(tidyverse)
library(here)
library(jsonlite)
library(gt)
library(knitr)
```
Each archive file begins with a JavaScript assignment (for example, `window.YTD.like.part0 = `) ahead of the JSON, and that prefix has to be skipped before parsing. The prefix length differs from file to file, so you have to work out how many characters to skip for each one.
```{r}
#| label: read the files
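# Keep the text from character 30 onward, dropping the 29-character prefix
following <- read_file(here("data", "following.js")) |>
  str_sub(start = 30) |>
  fromJSON()
str(following[[1]])

# like.js has a shorter prefix, so the JSON starts at character 25
like <- read_file(here("data", "like.js")) |>
  str_sub(start = 25) |>
  fromJSON()
str(like[[1]])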
```
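Counting characters by hand is fragile, so the prefix could instead be removed with a pattern. The sketch below assumes each file starts with a single JavaScript assignment ending in `=` before the JSON; `strip_js_prefix()` is a hypothetical helper (not part of the archive or any package), and the sample string is made up rather than taken from a real archive.

```r
library(jsonlite)

# Hypothetical helper: drop everything up to and including the first "="
# (plus any whitespace after it), leaving only the JSON payload.
strip_js_prefix <- function(txt) {
  sub("^[^=]*=[[:space:]]*", "", txt)
}

# Made-up sample in the assumed shape of an archive file
sample_js <- 'window.YTD.like.part0 = [ { "like" : { "tweetId" : "1", "fullText" : "hello #rstats" } } ]'

parsed <- fromJSON(strip_js_prefix(sample_js))
str(parsed)
```

Because the pattern is anchored at the start of the text, an `=` that appears later inside the JSON is left alone. It would still misfire on a file with no assignment prefix at all, so it is worth checking the first few characters of each file.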
The `flatten()` function from `jsonlite` is handy for turning nested JSON structures into flat data frames.
```{r}
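# Name jsonlite::flatten() explicitly: purrr (loaded with tidyverse) also exports flatten()
following_df <- as_tibble(jsonlite::flatten(following$following))
glimpse(following_df)
like_df <- as_tibble(jsonlite::flatten(like$like))
glimpse(like_df)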
```
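To see what flattening does, here is a minimal sketch on made-up JSON (the field names `id`, `user`, `name`, and `link` are illustrative assumptions, not fields from the archive). `fromJSON()` turns a nested object into a data frame column that is itself a data frame, and `jsonlite::flatten()` expands it into ordinary columns.

```r
library(jsonlite)

# Made-up JSON with one nested object per record
nested <- fromJSON('[
  {"id": 1, "user": {"name": "a", "link": "https://example.com/a"}},
  {"id": 2, "user": {"name": "b", "link": "https://example.com/b"}}
]')

# Before flattening, `user` is a data frame stored inside a column
names(nested)

# After flattening, the nested fields become dot-separated column names
flat <- jsonlite::flatten(nested)
names(flat)
```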
A `kable` table is a handy way to look at text that would otherwise be truncated.
```{r}
# Among the first 50 likes, keep those whose text contains a link and "rstat"
sample_table <- like_df[1:50, 2:3] |>
  filter(str_detect(fullText, "http") & str_detect(fullText, "rstat"))
kable(sample_table, "simple")
```
```{r}
# Print each matching like as its URL followed by the wrapped tweet text
like_df[1:50, 2:3] |>
  filter(str_detect(fullText, "http") &
           str_detect(fullText, "rstat")) |>
  mutate(fullText = paste(expandedUrl, "\n", str_wrap(fullText, 100))) |>
  select(fullText) |>
  unlist() |>
  cat(sep = "\n\n")
```
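One possible use of the liked tweets is to collect the links they mention. This is only a sketch on made-up strings standing in for the `fullText` column; the URL pattern is a deliberately simple assumption and will not handle every case (for example, trailing punctuation).

```r
library(stringr)

# Made-up stand-ins for the fullText column of like_df
full_text <- c(
  "Great #rstats post https://example.com/post and https://example.org/slides",
  "No links here, just #rstats",
  "Unrelated tweet https://example.net"
)

# Keep only tweets tagged #rstats (case-insensitive)
rstats_text <- full_text[str_detect(full_text, regex("#rstats", ignore_case = TRUE))]

# Pull out every URL-like token; tweets without links contribute nothing
urls <- unlist(str_extract_all(rstats_text, "https?://\\S+"))
urls
```

The same two steps could be applied to `like_df$fullText` once the archive has been read in.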
## Other resources
- [Albert Rapp - How to collect dataviz from Twitter into your note-taking system](https://albert-rapp.de/posts/09_get_twitter_posts_into_your_notetaking_system/09_get_twitter_posts_into_your_notetaking_system.html)
- [Python Twitter archive parser and other resources](https://github.com/timhutton/twitter-archive-parser)
- [How to download your Twitter archive](https://support.twitter.com/articles/20170160)
- [A visual analysis around a Twitter hashtag](https://blog.ouseful.info/2012/02/06/visualising-activity-round-a-twitter-hashtag-or-search-term-using-r/)
- [Find your Twitter pals on Mastodon](https://fedifinder.glitch.me/#)