LinkedIn Job Scraper

A program that scrapes and stores a continuous stream of LinkedIn job postings, along with dozens of attributes for each posting.

Download the polished dataset and view insights at https://www.kaggle.com/datasets/arshkon/linkedin-job-postings

User Configurations

Required

logins.csv
- Populate with multiple LinkedIn logins
- Specify the purpose of each login (search or detail retriever)
- I recommend 1-3 logins for search and the rest for the more expensive attribute retrieval
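
As a rough illustration of the split described above, the sketch below groups logins by purpose. The column names (`email`, `password`, `purpose`) and the purpose values (`search`, `details`) are assumptions made for this example; logins.csv.template defines the actual layout.

```python
import csv
import io
from collections import defaultdict


def group_logins_by_purpose(csv_file):
    """Group login rows by their 'purpose' column (a hypothetical column name)."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["purpose"].strip().lower()].append(row)
    return dict(groups)


# Hypothetical sample data; a real file should follow logins.csv.template.
SAMPLE = """email,password,purpose
a@example.com,pw1,search
b@example.com,pw2,details
c@example.com,pw3,details
"""

logins = group_logins_by_purpose(io.StringIO(SAMPLE))
print({purpose: len(rows) for purpose, rows in logins.items()})
```

To read a real file instead of the sample, pass an open file handle, for example `open("logins.csv", newline="")`.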

Optional

details_retriever.py
- MAX_UPDATES: Number of job postings to look up before sleeping. Increase with more accounts/proxies (default = 25)
- SLEEP_TIME: Seconds to sleep between iterations (default = 60)
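
The way these two settings interact can be sketched as a batch loop. This is a minimal illustration, not the actual code in details_retriever.py: `run_batches` and `fetch_details` are hypothetical names, and only the two constants and their defaults come from the description above.

```python
import time

MAX_UPDATES = 25  # postings to look up per batch (documented default)
SLEEP_TIME = 60   # seconds to sleep between batches (documented default)


def run_batches(pending_ids, fetch_details, max_updates=MAX_UPDATES,
                sleep_time=SLEEP_TIME, sleep=time.sleep):
    """Look up postings in batches, sleeping between consecutive batches.

    `fetch_details` is a hypothetical callable standing in for the real lookup.
    """
    results = []
    for start in range(0, len(pending_ids), max_updates):
        batch = pending_ids[start:start + max_updates]
        results.extend(fetch_details(job_id) for job_id in batch)
        if start + max_updates < len(pending_ids):
            sleep(sleep_time)  # pause only when another batch follows
    return results
```

Raising `max_updates` shortens the total run time but concentrates more requests between pauses, which is why the note above ties it to the number of accounts and proxies available.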

Running

This program consists of 2 main scripts that run in parallel.

python search_retriever.py - discovers new job postings and inserts the most recent IDs and minimal attributes into the database

python details_retriever.py - populates tables with complete job attributes
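
One way to start both scripts side by side is a small launcher like the sketch below. The helper names (`start_all`, `wait_all`, `scraper_commands`) are made up for this example; running each command in its own terminal works just as well.

```python
import subprocess
import sys


def start_all(commands):
    """Start each command as its own process and return the handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


def wait_all(processes):
    """Block until every process exits; return their exit codes in order."""
    return [proc.wait() for proc in processes]


def scraper_commands(python=sys.executable):
    """Command lines for the two scripts named above."""
    return [[python, "search_retriever.py"], [python, "details_retriever.py"]]
```

From the repository root, `wait_all(start_all(scraper_commands()))` would launch both scripts and block until they exit.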

Note that while search_retriever.py typically runs smoothly, even from your personal IP with a single account, details_retriever.py can be finicky. Each search generates approximately 25-50 results, each of which must be queried individually to obtain its attributes. To improve its performance, I recommend the following strategies:

- Utilize multiple proxies and accounts when running details_retriever.py.
- Experiment with different time delays to find the optimal settings.
- Run details_retriever.py during periods of lower online activity, such as late-night hours and weekends, to catch up with the progress of search_retriever.py. This will ensure that both processes remain synchronized and up to date.
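
The first two strategies can be combined by rotating through accounts and proxies in round-robin order with a randomized delay before each request. The sketch below only plans such a schedule; the function name, its parameters, and the delay values are illustrative assumptions, not settings taken from the scripts.

```python
import itertools
import random


def rotation_schedule(accounts, proxies, n_requests, base_delay=2.0,
                      jitter=1.0, rng=random):
    """Plan which account/proxy pair handles each request and the delay before it.

    Accounts and proxies are cycled independently, so consecutive requests
    use a different account and a different proxy whenever more than one exists.
    """
    pairs = zip(itertools.cycle(accounts), itertools.cycle(proxies))
    return [
        (account, proxy, base_delay + rng.uniform(0, jitter))
        for account, proxy in itertools.islice(pairs, n_requests)
    ]
```

Passing a seeded `random.Random` instance as `rng` makes the delays reproducible, which helps when comparing different delay settings.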

Converting Database to CSV

python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>

Creates a CSV file for each database, with minimal preprocessing.
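
For readers who want to inspect the data without the script, the general idea of a database-to-CSV export can be sketched as follows. This assumes the .db file is a SQLite database, which the source does not state explicitly. It is not the code in to_csv.py, it performs no preprocessing, and `export_tables` is a hypothetical name.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables(database, folder):
    """Write every user table in a SQLite database to <folder>/<table>.csv."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(database)
    try:
        # sqlite_master lists all tables; skip SQLite's internal ones.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        written = []
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            path = out_dir / f"{table}.csv"
            with path.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow([col[0] for col in cursor.description])  # header
                writer.writerows(cursor)  # stream the remaining rows
            written.append(path)
        return written
    finally:
        conn.close()
```

Calling `export_tables("linkedin_jobs.db", "csv_out")` would write one file per table into a `csv_out` folder, where the folder name is just an example.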

Database Structure

You can find the structure of the database in DatabaseStructure.md.

