Job Crawler/Scraper/Parser #6

Open
austinoboyle opened this issue Apr 26, 2018 · 2 comments
austinoboyle commented Apr 26, 2018

Scrape jobs by various filters:

  • Location
  • Company
  • Etc

First Use Case: Scrape all jobs in Kingston

Relevant URL
https://www.linkedin.com/jobs/search/?keywords=&location=Kingston%2C%20Ontario%2C%20Canada&sortBy=DD

Process:

  1. Scrape Basic Info for All Jobs
  2. Based on Basic Scrape (job_id), run parallel scrape to get detailed info on all jobs
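The two-step process above could be sketched as follows. This is a minimal sketch, not the repo's implementation: `scrape_basic` and `scrape_detail` are hypothetical placeholders standing in for the actual page-fetch-and-parse logic, and the parallelism in step 2 uses a thread pool keyed on `job_id`.

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_basic(search_url):
    """Placeholder: return basic records (including job_id) from the search page."""
    # A real implementation would fetch search_url and parse the result cards.
    return [{"job_id": "123", "title": "Example Job"},
            {"job_id": "456", "title": "Another Job"}]

def scrape_detail(job_id):
    """Placeholder: fetch /jobs/view/<job_id> and parse the detailed fields."""
    return {"job_id": job_id, "job_description": "..."}

def scrape_jobs(search_url, max_workers=8):
    # Step 1: basic info for all jobs on the search page.
    basic = scrape_basic(search_url)
    # Step 2: detailed scrapes in parallel, driven by the job_ids from step 1.
    # pool.map preserves input order, so zip pairs each basic record correctly.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        details = list(pool.map(scrape_detail, [j["job_id"] for j in basic]))
    return [{**b, **d} for b, d in zip(basic, details)]
```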

Basic Fields

  • title
  • job_id (links are '/jobs/view/ID')
  • location
  • company_name
  • company_id (links are '/company/ID')
  • company_image_link

Detailed Info

  • job_description
  • seniority_level
  • industries
  • employment_type
  • job_functions
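Since `job_id` and `company_id` only appear inside link paths (`/jobs/view/ID` and `/company/ID`), the basic scrape has to pull them out of hrefs. A small sketch of that extraction; the helper names are illustrative, not from the codebase:

```python
import re

# Link shapes described above: '/jobs/view/ID' and '/company/ID'.
JOB_LINK = re.compile(r"/jobs/view/(\d+)")
COMPANY_LINK = re.compile(r"/company/([^/?#]+)")

def parse_job_id(href):
    """Extract the numeric job_id from a '/jobs/view/ID' link, else None."""
    m = JOB_LINK.search(href)
    return m.group(1) if m else None

def parse_company_id(href):
    """Extract the company_id slug from a '/company/ID' link, else None."""
    m = COMPANY_LINK.search(href)
    return m.group(1) if m else None
```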
@austinoboyle austinoboyle added the enhancement New feature or request label Apr 26, 2018
@austinoboyle austinoboyle self-assigned this Apr 26, 2018
@simarpreetsingh-019

@austinoboyle I am working on a similar issue for my project. I have mostly figured out which class I should parse to extract the info, but I got stuck when trying to download the page source. My code:

```python
import requests
from bs4 import BeautifulSoup

r = requests.get('https://linkedin.com/jobs/')
html_content = r.content

print(html_content)

soup = BeautifulSoup(html_content, 'html.parser')
print(soup)
```

which prints only the following script instead of the page markup:

```html
<script type="text/javascript">
window.onload = function() {
  // Parse the tracking code from cookies.
  var trk = "bf";
  var trkInfo = "bf";
  var cookies = document.cookie.split("; ");
  for (var i = 0; i < cookies.length; ++i) {
    if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
      trk = cookies[i].substring(8);
    } else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
      trkInfo = cookies[i].substring(8);
    }
  }

  if (window.location.protocol == "http:") {
    // If "sl" cookie is set, redirect to https.
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
        return;
      }
    }
  }

  // Get the new domain. For international domains such as
  // fr.linkedin.com, we convert it to www.linkedin.com
  var domain = "www.linkedin.com";
  if (domain != location.host) {
    var subdomainIndex = location.host.indexOf(".linkedin");
    if (subdomainIndex != -1) {
      domain = "www" + location.host.substring(subdomainIndex);
    }
  }

  window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +
    "&originalReferer=" + document.referrer.substr(0, 200) +
    "&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
```

Can you or anyone else help me figure out how to get the actual page source? It would be helpful for this issue as well.
I know this is an older issue, but I figured there was no point creating a new one when a similar issue already exists here. If needed, I can open a new one.
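For context on the output above: it is LinkedIn's authwall redirect. Unauthenticated requests never receive the job markup, only a script that bounces the browser to `/authwall`, so plain `requests` cannot see the listings; an authenticated session or a headless-browser approach is typically needed. A tiny guard (a hypothetical helper, just for illustration) to fail fast instead of parsing the wrong page:

```python
def looks_like_authwall(html: str) -> bool:
    """Return True if the HTML is LinkedIn's authwall redirect script
    rather than a real page. Hypothetical helper for illustration."""
    return "/authwall" in html
```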

@anilabhadatta

I opened a pull request that adds Jobs and People to CompanyScraper.
If possible, please test it on a temporary LinkedIn account.
