
This repository contains the files and code for scraping and cleaning Glassdoor job postings and converting them into a relational database.


Analysis of Glassdoor Job Postings

From Scraping to Visualization

Summary of the Project

In this project, I scraped over 3,000 job postings for 'Data Analyst' from the Glassdoor website. Since the Glassdoor website is dynamic, I used the Selenium library in Python to scrape the postings. After scraping, I cleaned the data in Python, converted it into a relational database format, and stored it in SQL. Lastly, I visualized the data using Tableau, showing the salary distributions by state, city, sector, and skills.

Contents

1. Scraping Job Postings from Glassdoor
2. Cleaning the Data
3. Storing the Data in SQL
4. Data Visualization using Tableau

Scraping Job Postings from Glassdoor

The first step of this project is to collect the data. Instead of using existing data from other sources, I collected the data by scraping. Since the Glassdoor website is dynamic, I used the Selenium library in Python to scrape the job postings. The scraping code is in glassdoor_scraping_selenium.ipynb. The search keyword was 'Data Analyst', and I collected up to about 90 job postings for each state. The elements scraped from the site were company name, job title, location, salary, rating, size, industry, sector, revenue, and job description.
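Glassdoor's page structure changes often, so the snippet below is only a minimal sketch of the kind of Selenium loop used for this step. The CSS selectors and the waiting logic are hypothetical placeholders, not Glassdoor's actual markup; the real implementation is in glassdoor_scraping_selenium.ipynb.

```python
# Minimal sketch of a Selenium scraping loop; the selectors below are
# illustrative placeholders (see glassdoor_scraping_selenium.ipynb for the
# actual implementation).
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.glassdoor.com/Job/jobs.htm?sc.keyword=Data%20Analyst")

postings = []
for card in driver.find_elements(By.CSS_SELECTOR, "li.job-listing"):  # hypothetical selector
    card.click()       # open the posting's detail pane
    time.sleep(2)      # crude wait for the dynamic content to load
    postings.append({
        "company":   card.find_element(By.CSS_SELECTOR, ".job-company").text,   # hypothetical
        "job_title": card.find_element(By.CSS_SELECTOR, ".job-title").text,     # hypothetical
        "location":  card.find_element(By.CSS_SELECTOR, ".job-location").text,  # hypothetical
    })

pd.DataFrame(postings).to_csv("glassdoor_data_analyst.csv", index=False)
driver.quit()
```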

Cleaning the Data

The data scraped in the previous step is in the glassdoor_data_analyst.csv file. The next step is to clean it. A Jupyter Notebook in Python is used again for this process; the cleaning code is in glassdoor_data_cleaning.ipynb.
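As one illustration of the cleaning work, salary fields on Glassdoor typically arrive as strings such as "$45K-$70K (Glassdoor est.)". The sketch below assumes a raw column named 'Salary Estimate' in that form and turns it into numeric bounds and an average; the actual column names and cleaning rules are in glassdoor_data_cleaning.ipynb.

```python
# Illustrative cleaning step; column names and salary format are assumptions.
import pandas as pd

df = pd.read_csv("glassdoor_data_analyst.csv")

# Drop rows without a salary and strip the "(Glassdoor est.)" suffix.
df = df[df["Salary Estimate"].notna()].copy()
salary = df["Salary Estimate"].str.replace(r"\(.*\)", "", regex=True)
salary = salary.str.replace("$", "", regex=False).str.replace("K", "", regex=False)

# Split "45-70" into numeric lower/upper bounds and an average (in $K).
bounds = salary.str.split("-", expand=True).astype(float)
df["min_salary"], df["max_salary"] = bounds[0], bounds[1]
df["avg_salary"] = (df["min_salary"] + df["max_salary"]) / 2
```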

In this step, the data is not only cleaned but also converted into a relational database format. Converting data in a spreadsheet into a relational database requires designing the database with an Entity Relationship Diagram (ERD). The ERD I created for this practice is Entity_Relational_Diagram.PNG. Based on that ERD, I created a table for each entity and stored the tables as CSV files, which are in the table_csv_files folder.
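The sketch below shows the general idea of splitting one flat spreadsheet into entity tables linked by a surrogate key. The entities, keys, column names, and file names here are assumptions for illustration; the actual design follows Entity_Relational_Diagram.PNG, and the generated tables are the CSV files in table_csv_files.

```python
# Hypothetical sketch of normalizing the cleaned spreadsheet into entity tables.
import pandas as pd

df = pd.read_csv("glassdoor_data_analyst_cleaned.csv")  # assumed cleaned file name

# One row per company, with a surrogate key used as a foreign key elsewhere.
company = (df[["company", "size", "industry", "sector", "revenue", "rating"]]
           .drop_duplicates(subset="company")
           .reset_index(drop=True))
company.insert(0, "company_id", company.index + 1)

# Job postings reference companies through company_id.
posting = df.merge(company[["company", "company_id"]], on="company", how="left")
posting = posting[["company_id", "job_title", "location", "min_salary", "max_salary"]]
posting.insert(0, "posting_id", posting.index + 1)

company.to_csv("table_csv_files/company.csv", index=False)
posting.to_csv("table_csv_files/job_posting.csv", index=False)
```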

Storing the Data in SQL

The relational database management system (RDBMS) used here is MySQL. The SQL statements for creating the tables are in creating_tables.sql. After creating the database and tables, I inserted the data for each table using the SQL statements in inserting_records.sql. The original job posting data is then stored in relational database form.
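For illustration, the sketch below creates one table and loads its CSV into MySQL with the mysql-connector-python package. The real DDL lives in creating_tables.sql and the real INSERT statements in inserting_records.sql; the table and column names and the connection settings here are assumptions.

```python
# Minimal sketch of creating a table and loading a CSV into MySQL;
# names and credentials are placeholders, not the repo's actual DDL.
import csv
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="password", database="glassdoor")
cur = conn.cursor()

# Example DDL in the spirit of creating_tables.sql (columns assumed).
cur.execute("""
    CREATE TABLE IF NOT EXISTS company (
        company_id INT PRIMARY KEY,
        company    VARCHAR(255),
        sector     VARCHAR(255),
        rating     DECIMAL(3, 1)
    )
""")

# Insert records from the corresponding CSV in table_csv_files/.
with open("table_csv_files/company.csv", newline="") as f:
    rows = [(r["company_id"], r["company"], r["sector"], r["rating"])
            for r in csv.DictReader(f)]
cur.executemany(
    "INSERT INTO company (company_id, company, sector, rating) "
    "VALUES (%s, %s, %s, %s)", rows)
conn.commit()
cur.close()
conn.close()
```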

In sql_queries.sql, I practiced some SQL queries on this data.
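The queries in sql_queries.sql are the reference. As one hedged example of the kind of question they can answer, the snippet below averages salaries by sector, assuming the table and column names used in the sketches above.

```python
# Example query only; the actual practice queries are in sql_queries.sql.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="password", database="glassdoor")
cur = conn.cursor()
cur.execute("""
    SELECT c.sector, AVG((p.min_salary + p.max_salary) / 2) AS avg_salary
    FROM job_posting AS p
    JOIN company AS c ON c.company_id = p.company_id
    GROUP BY c.sector
    ORDER BY avg_salary DESC
""")
for sector, avg_salary in cur.fetchall():
    print(sector, avg_salary)
conn.close()
```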

Data Visualization using Tableau

The final step of this project is to visualize the data. The goal of this analysis is to show the distributions of average salaries by state, city, sector, and skill with an interactive visualization tool, and the tool used here is Tableau. The data used for the visualization is glassdoor_all.csv, and jobposting_for_tableau.sql shows how to retrieve it from the database created in the previous step.
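The export itself is defined in jobposting_for_tableau.sql; the sketch below only illustrates the general idea of joining the entity tables back into one flat file for Tableau, with an assumed column list and connection settings.

```python
# Rough sketch of exporting a denormalized extract for Tableau;
# the authoritative query is jobposting_for_tableau.sql.
import csv
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="password", database="glassdoor")
cur = conn.cursor()
cur.execute("""
    SELECT p.job_title, p.location, p.min_salary, p.max_salary,
           c.company, c.sector
    FROM job_posting AS p
    JOIN company AS c ON c.company_id = p.company_id
""")

with open("glassdoor_all.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row from cursor metadata
    writer.writerows(cur.fetchall())
conn.close()
```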

The Tableau dashboard created for this project is Here.
