more exps on functions and arguments #2

Open
jaeyk opened this issue Sep 16, 2020 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

jaeyk (Owner) commented Sep 16, 2020

No description provided.

jaeyk changed the title: questions about social media scraping (demo) → questions about social media scraping (test) (Sep 16, 2020)
mdsamarin commented Sep 23, 2020

Hi Jae, I have been playing with the code from last week and have a few clarifying questions about it:

  • There is a line of code that says <# 3 and more than 3 arguments - pmap()>. Does this mean that if the vector has more than 3 elements we use pmap()? That is, if we target more than 3 elements, do we use pmap() then? I'm just unsure under which conditions to use map() versus pmap().

Yes. If you iterate over one vector, use map().
Two vectors in parallel -> map2()
A list of more than two vectors -> pmap()

# This will show you the documentation. 
?purrr::pmap
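For reference, here is a minimal sketch of the three mappers on toy vectors (the data here are illustrative, not from the course code):

```r
# Illustrative example: when to use map(), map2(), and pmap().
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)
z <- c(100, 200, 300)

map(x, ~ .x * 2)          # one vector: iterate over x
map2(x, y, ~ .x + .y)     # two vectors, iterated in parallel
pmap(list(x, y, z), sum)  # any number of vectors, passed as a list
```

The rule of thumb is about how many inputs vary per iteration, not how long each vector is.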
  • Furthermore, in that function, the start and end years use paste0() with "as_ylo=" and "as_yhi=" as the pieces pasted into the URL. These come from the particular URL, correct? On another website, what do we look for in the URL as the 'identifiable' information?

These are called query parameters, and they change from one website to another. I will talk more about this when we delve into APIs.
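As a sketch, this is how paste0() assembles those parameters into a URL. The base URL and search query below are assumptions for illustration; "as_ylo" and "as_yhi" are the Google Scholar parameters for the start and end year of a search, and other sites name their parameters differently:

```r
# Hypothetical example: building a search URL with query parameters.
base_url   <- "https://scholar.google.com/scholar"
start_year <- 2010
end_year   <- 2020

url <- paste0(base_url, "?q=web+scraping",
              "&as_ylo=", start_year,   # earliest year to include
              "&as_yhi=", end_year)     # latest year to include
url
```

You find the "identifiable" parts by running a search on the site and inspecting how the URL in the address bar changes as you adjust the search options.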

  • Next, for html_nodes(), you input div.gs_ab_mdw as the CSS selector. Maybe you said this last week, but what other classes are there, and how do we know which one to use?

You can use Chrome's developer tools (right-click on the element and choose "Inspect") to identify the particular HTML/CSS elements you are looking for. I will talk more about this when we get to web scraping.
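To make the selector syntax concrete, here is a small self-contained sketch: "div.gs_ab_mdw" means "&lt;div&gt; elements whose class attribute is gs_ab_mdw" (the HTML snippet below is made up for illustration):

```r
# Sketch: selecting elements by tag and CSS class with rvest.
library(rvest)

html <- minimal_html('
  <div class="gs_ab_mdw">About 1,000 results</div>
  <div class="other">ignored</div>')

html %>%
  html_nodes("div.gs_ab_mdw") %>%  # "tag.class" CSS selector
  html_text()                      # extract the text inside the matches
```

So the class to use is whichever one the site's developers attached to the element that holds the data you want; there is no fixed list, which is why you inspect each page.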

  • Lastly, for writing a parsing function you use the curl package. I don't quite understand this line of code and was wondering if you could go over it again and explain what exactly it is doing: <read_html(curl::curl(url, handle = curl::new_handle("useragent" = "Mozilla/5.0"))) %>% html_nodes("div.gs_ab_mdw") %>% html_text()>

curl() comes from the curl package; that's what curl::curl() means.
new_handle() creates a connection handle on which you can set request options. Here it sets the "useragent" option, so the request identifies itself with a browser-like user agent ("Mozilla/5.0") instead of the R default; some sites refuse requests that don't look like they come from a browser. I'm happy to talk about this in person later on, but we will definitely cover this subject in the coming weeks.
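Here is that same pipeline pulled apart and annotated step by step (the URL is a placeholder, not the one from the course code):

```r
# The pipeline from the question, annotated. URL is hypothetical.
library(rvest)

url <- "https://scholar.google.com/scholar?q=web+scraping"

handle <- curl::new_handle("useragent" = "Mozilla/5.0")  # handle with request options
con    <- curl::curl(url, handle = handle)               # open a connection to the URL

read_html(con) %>%                 # fetch and parse the page into an XML document
  html_nodes("div.gs_ab_mdw") %>%  # keep only <div class="gs_ab_mdw"> elements
  html_text()                      # extract their text content
```

Reading it inside-out like this is often easier than reading the one-liner: curl handles the HTTP request, read_html() parses the response, and the two rvest calls pick out and clean the piece you want.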

Thanks!

jaeyk changed the title: questions about social media scraping (test) → questions on the first week session (Sep 23, 2020)
jaeyk added the documentation label and removed the data type label (Sep 23, 2020)
jaeyk changed the title: questions on the first week session → more exps on functions and arguments (Sep 23, 2020)