Hi Jae, I have been playing with the code from last week and have a few clarifying questions about it:
There is a line of code that says <Update 01_API.Rmd #3 and more than 3 arguments - pmap()>. Does this mean that if the vector has more than 3 elements we should use pmap()? As in, if we target more than 3 elements, do we use pmap() then? I am just unsure under which conditions to use map() or pmap().
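The choice depends on how many input vectors you iterate over in parallel, not on how many elements a vector has.
One vector -> map()
Two vectors -> map2()
More than two vectors (passed together as a list) -> pmap()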
# This will show you the documentation.
?purrr::pmap
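As a rough illustration of the three cases, here is a minimal sketch. The vectors and the labels built from them are made up for this example and are not taken from the course code.

```r
library(purrr)

# One input vector: map_chr() calls the function once per element.
years <- c(2018, 2019, 2020)
one_input <- map_chr(years, function(y) paste0("year=", y))

# Two input vectors: map2_chr() walks both vectors in parallel.
start_years <- c(2000, 2005)
end_years <- c(2004, 2009)
two_inputs <- map2_chr(start_years, end_years,
                       function(lo, hi) paste0(lo, "-", hi))

# Three (or more) input vectors: pmap_chr() takes them bundled in a list.
terms <- c("voting", "protest")  # hypothetical search terms
three_inputs <- pmap_chr(list(terms, start_years, end_years),
                         function(term, lo, hi) paste0(term, ": ", lo, "-", hi))

print(one_input)
print(two_inputs)
print(three_inputs)
```

Note that each vector here has only two or three elements. What changes between map(), map2(), and pmap() is the number of vectors, not their length.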
Furthermore, in that function, the start and end year use paste0() with <"as_ylo=" and "as_yhi"> as the objects in the function. These come from the particular url, correct? In another website example, what are we looking for in the url that is the 'identifiable' information?
These are called query parameters, and they change from one website to another. In a URL they appear after the ? as name=value pairs separated by &. I will talk more about this when we delve into APIs.
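To make the idea concrete, here is a small sketch of building such a URL with paste0(). The function name build_url, the base address https://example.com/search, and the q parameter are placeholders invented for this example; only as_ylo and as_yhi come from the code discussed above.

```r
# Hypothetical helper: glue a base address and its parameters together.
build_url <- function(term, start_year, end_year) {
  paste0(
    "https://example.com/search",                     # placeholder base URL
    "?q=", utils::URLencode(term, reserved = TRUE),   # assumed search-term parameter
    "&as_ylo=", start_year,                           # lower bound on the year
    "&as_yhi=", end_year                              # upper bound on the year
  )
}

url <- build_url("social media", 2015, 2020)
print(url)
```

For another website, you would run a search in the browser, look at the address bar, and note which name=value pairs change when you change the search settings.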
Next, in html_nodes(), you input div.gs_ab_mdw as the division class. Maybe you said this last week, but what other division classes are there, and how do we know when to use them?
Class names are chosen by each website, so there is no fixed list. You can use Chrome's developer tools to inspect a page and identify the HTML/CSS elements you are looking for. I will talk more about this when we get into web scraping.
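As a small sketch of how different CSS selectors pick out different elements, the example below parses a made-up HTML snippet rather than a live page. Only the class gs_ab_mdw comes from the course code; the other class, the id, and all of the text are invented for illustration.

```r
library(rvest)

# A tiny, made-up page so the example runs without a network connection.
page <- read_html('<html><body>
  <div class="gs_ab_mdw">About 1,230 results</div>
  <div class="other">Not this one</div>
  <p id="note">A paragraph</p>
</body></html>')

# "div.gs_ab_mdw" selects <div> elements whose class is gs_ab_mdw.
by_class <- page %>% html_nodes("div.gs_ab_mdw") %>% html_text()

# "#note" selects the element whose id is note.
by_id <- page %>% html_nodes("#note") %>% html_text()

# "div" selects every <div>, regardless of class.
all_divs <- page %>% html_nodes("div") %>% html_text()

print(by_class)
print(by_id)
print(length(all_divs))
```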
Lastly, for writing a parsing function, you use the curl package. I don't quite understand this line of code and was wondering if you could go over it again and explain what exactly it is doing: <read_html(curl::curl(url, handle = curl::new_handle("useragent" = "Mozilla/5.0"))) %>% html_nodes("div.gs_ab_mdw") %>% html_text()>
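curl() comes from the curl package; that is what curl::curl() means. new_handle() does not do error handling: it creates a handle, an object that stores the settings for a request. Here it sets the user agent to "Mozilla/5.0" so that the request identifies itself as coming from a browser. read_html() then parses the downloaded page, html_nodes("div.gs_ab_mdw") selects the div elements with that class, and html_text() extracts their text. I'm happy to talk about this in person later on, but we will definitely cover this subject in the coming weeks.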
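Broken into steps, the one-liner from the question could be sketched as below. The function name get_result_text is made up for this example, and the function is only defined here, not called, so nothing is downloaded when you run the block.

```r
library(curl)
library(rvest)

# Step 1: create a handle that stores request settings.
# Here the only setting is the user agent string sent to the server.
browser_handle <- new_handle(useragent = "Mozilla/5.0")

# Hypothetical wrapper around the pipeline from the question.
get_result_text <- function(url, handle = browser_handle) {
  con <- curl(url, handle = handle)  # Step 2: open a connection that uses the handle
  page <- read_html(con)             # Step 3: download and parse the HTML
  nodes <- html_nodes(page, "div.gs_ab_mdw")  # Step 4: select the matching <div> elements
  html_text(nodes)                   # Step 5: return their text
}
```

Calling get_result_text() with a search URL would then return the text of every div.gs_ab_mdw element on that page, assuming the site still uses that class name.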
Thanks!
jaeyk changed the title from "questions about social media scraping (test)" to "questions on the first week session" on Sep 23, 2020