GitHub - Szafranerio/Data-Analysis-and-Machine-Learning-Modeling-on-Data-Science-Salaries: 📈Data Analysis and Machine Learning Modeling on Data Science Salaries

Exploratory Data Analysis and Machine Learning Modeling on Data Science Salaries

Description:

This project analyzes data science salaries and the factors that influence them. Using the Python libraries Pandas, Matplotlib, Seaborn, Plotly, and Scikit-learn, we conduct exploratory data analysis (EDA) to examine salary distributions, year-over-year trends, and correlations between variables.

Key Features:

Data Preprocessing: We begin by cleaning and preprocessing the dataset, handling missing values, and converting categorical variables into a suitable format for analysis.
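The preprocessing step might look roughly like the sketch below. It is not the notebook's actual code: the column names (`experience_level`, `company_size`, `salary_in_usd`) and the toy rows are assumptions for illustration. Dropping rows without a salary and one-hot encoding with `pd.get_dummies` are one possible choice among several.

```python
import pandas as pd

# Toy rows standing in for ds_salaries.csv; column names and values are assumptions.
raw = pd.DataFrame({
    "work_year": [2021, 2022, 2022, 2023],
    "experience_level": ["EN", None, "MI", "SE"],
    "company_size": ["S", "M", "L", "M"],
    "salary_in_usd": [60000, 95000, None, 150000],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows without a salary, fill missing categories, one-hot encode."""
    cat_cols = ["experience_level", "company_size"]
    out = df.dropna(subset=["salary_in_usd"]).copy()
    # Keep rows with a missing category by giving them an explicit label.
    out[cat_cols] = out[cat_cols].fillna("unknown")
    # One-hot encoding turns each category into its own indicator column.
    return pd.get_dummies(out, columns=cat_cols)


clean = preprocess(raw)
print(clean.columns.tolist())
```

Whether to drop or impute missing salaries depends on how many rows are affected. Dropping is the simpler option when the target value itself is missing.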

Exploratory Data Analysis (EDA): Through visualizations like box plots, histograms, bar plots, and heatmaps, we explore the relationships between different variables such as work year, experience level, employment type, remote ratio, company size, and salary.
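As an illustration of one of these plots, the sketch below draws a box plot of salary by experience level. It uses Matplotlib directly rather than Seaborn to keep dependencies small. The column names and the toy values are assumptions, not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; column names and values are assumptions.
df = pd.DataFrame({
    "experience_level": ["EN", "EN", "MI", "MI", "SE", "SE"],
    "salary_in_usd": [55000, 65000, 85000, 95000, 130000, 150000],
})


def salary_boxplot(frame: pd.DataFrame):
    """Draw one box per experience level and return the figure and level order."""
    levels = sorted(str(lvl) for lvl in frame["experience_level"].unique())
    groups = [
        frame.loc[frame["experience_level"] == lvl, "salary_in_usd"].to_numpy()
        for lvl in levels
    ]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.boxplot(groups)
    # Boxes are drawn at positions 1..n, so label those positions.
    ax.set_xticks(list(range(1, len(levels) + 1)))
    ax.set_xticklabels(levels)
    ax.set_xlabel("Experience level")
    ax.set_ylabel("Salary (USD)")
    return fig, levels


fig, levels = salary_boxplot(df)
fig.savefig("salary_by_experience.png")  # hypothetical output file name
```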

Statistical Analysis: We calculate descriptive statistics like mean, median, maximum, and minimum salary values for each year to understand salary trends over time.
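The per-year statistics can be computed with a single `groupby` and `agg` call. This is a minimal sketch, assuming columns named `work_year` and `salary_in_usd`. The values are made up for illustration.

```python
import pandas as pd

# Toy data; column names and values are assumptions.
df = pd.DataFrame({
    "work_year": [2021, 2021, 2022, 2022, 2022],
    "salary_in_usd": [50000, 70000, 80000, 100000, 120000],
})

# One row per year, one column per descriptive statistic.
yearly = df.groupby("work_year")["salary_in_usd"].agg(["mean", "median", "max", "min"])
print(yearly)
```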

Correlation Analysis: Utilizing correlation matrices and heatmaps, we investigate the relationships between variables to uncover any significant correlations.
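A correlation matrix over the numeric columns can be sketched as follows. The column names and the toy values are assumptions. The sketch computes only the matrix and leaves the plot out.

```python
import pandas as pd

# Toy data; column names and values are assumptions.
df = pd.DataFrame({
    "work_year": [2020, 2021, 2022, 2023],
    "remote_ratio": [100, 50, 50, 0],
    "salary_in_usd": [60000, 80000, 100000, 120000],
    "company_size": ["S", "M", "M", "L"],
})

# Pearson correlation is only defined for numeric columns, so select those first.
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```

The resulting matrix can then be passed to a heatmap function such as `seaborn.heatmap(corr, annot=True)` for display.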

Machine Learning Modeling: We employ linear regression and random forest regression models to predict salaries based on features like work year, experience level, employment type, job title, and remote ratio. Evaluation metrics such as mean squared error (MSE) and R-squared scores are used to assess model performance.
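The modeling workflow could be sketched as below. This is an illustration under stated assumptions, not the notebook's code. The data is synthetic, the column names are assumed, only a subset of the listed features is used, and the 80/20 split and hyperparameters are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; column names are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "work_year": rng.integers(2020, 2024, size=n),
    "experience_level": rng.choice(["EN", "MI", "SE"], size=n),
    "remote_ratio": rng.choice([0, 50, 100], size=n),
})
base = df["experience_level"].map({"EN": 60000, "MI": 90000, "SE": 130000})
df["salary_in_usd"] = base + 5000 * (df["work_year"] - 2020) + rng.normal(0, 5000, size=n)

# One-hot encode the categorical feature so both models receive numeric input.
X = pd.get_dummies(df.drop(columns="salary_in_usd"), columns=["experience_level"])
y = df["salary_in_usd"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Score on held-out data so the metrics reflect generalization, not fit.
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.0f}, R2={scores['r2']:.3f}")
```

A lower MSE and an R-squared closer to 1 indicate a better fit on the test set. Comparing the two models on the same split keeps the comparison fair.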

Visualization with Plotly: A parallel categories plot is generated using Plotly to visualize the relationships between multiple categorical variables simultaneously.

Conclusion:

This project gives an overview of the factors associated with data science salaries and provides a starting point for further analysis. The EDA and machine learning models are intended to support data-driven decisions about data science employment and compensation.

Repository files (4 commits):

.gitattributes
README.md
ds_salaries.csv
salaries.ipynb
