Szafranerio/Data-Analysis-and-Machine-Learning-Modeling-on-Data-Science-Salaries
Exploratory Data Analysis and Machine Learning Modeling on Data Science Salaries

Description:

This project analyzes data science salaries and the factors that influence them. Using the Python libraries Pandas, Matplotlib, Seaborn, Plotly, and Scikit-learn, we conduct exploratory data analysis (EDA) covering salary distributions, year-over-year trends, and correlations between variables, and then fit regression models to predict salary.

Key Features:

Data Preprocessing: We begin by cleaning and preprocessing the dataset, handling missing values, and converting categorical variables into a suitable format for analysis.
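A minimal sketch of what this preprocessing step could look like. The column names (`work_year`, `experience_level`, `company_size`, `salary_in_usd`), the tiny made-up sample, and the helper `preprocess` are assumptions for illustration and are not taken from the project's code; dropping incomplete rows and one-hot encoding are one possible choice among several.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the project.
raw = pd.DataFrame({
    "work_year": [2021, 2022, 2022, 2023],
    "experience_level": ["EN", "SE", None, "MI"],
    "company_size": ["S", "L", "M", "L"],
    "salary_in_usd": [60000.0, 150000.0, 110000.0, None],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop incomplete rows, then one-hot encode categoricals."""
    # Drop rows that are missing the target or a categorical feature.
    cleaned = df.dropna(subset=["salary_in_usd", "experience_level"]).copy()
    # Turn each categorical column into indicator (0/1) columns.
    return pd.get_dummies(cleaned, columns=["experience_level", "company_size"])


clean = preprocess(raw)
print(clean.columns.tolist())
```

Imputing missing values (for example with the column median) would be an alternative to dropping rows when the dataset is small.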

Exploratory Data Analysis (EDA): Through visualizations like box plots, histograms, bar plots, and heatmaps, we explore the relationships between different variables such as work year, experience level, employment type, remote ratio, company size, and salary.
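As an illustration of one such visualization, the sketch below draws a box plot of salary by experience level with Seaborn. The data is made up, the column names are assumed, and the non-interactive `Agg` backend is used only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "experience_level": ["EN", "EN", "MI", "MI", "SE", "SE"],
    "salary_in_usd": [55000, 65000, 90000, 105000, 140000, 160000],
})

fig, ax = plt.subplots(figsize=(6, 4))
# One box per experience level, showing the salary distribution in each group.
sns.boxplot(data=df, x="experience_level", y="salary_in_usd", ax=ax)
ax.set_title("Salary by experience level (illustrative data)")
fig.tight_layout()
fig.savefig("salary_by_experience.png")
```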

Statistical Analysis: We calculate descriptive statistics like mean, median, maximum, and minimum salary values for each year to understand salary trends over time.
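The per-year statistics can be sketched with a single Pandas `groupby`. The sample values and the column names `work_year` and `salary_in_usd` are assumptions for illustration.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "work_year": [2021, 2021, 2022, 2022, 2022],
    "salary_in_usd": [50000, 70000, 80000, 100000, 150000],
})

# One row per year with the mean, median, maximum, and minimum salary.
yearly = df.groupby("work_year")["salary_in_usd"].agg(["mean", "median", "max", "min"])
print(yearly)
```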

Correlation Analysis: Utilizing correlation matrices and heatmaps, we investigate the relationships between variables to uncover any significant correlations.
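A minimal sketch of the correlation step, assuming numeric columns named `work_year`, `remote_ratio`, and `salary_in_usd` and using made-up values. It computes the Pearson correlation matrix only; the resulting frame could then be passed to `seaborn.heatmap` to draw the heatmap.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "work_year": [2020, 2021, 2021, 2022, 2023],
    "remote_ratio": [0, 50, 100, 100, 0],
    "salary_in_usd": [60000, 85000, 90000, 120000, 135000],
    "job_title": ["Analyst", "Engineer", "Scientist", "Scientist", "Engineer"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```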

Machine Learning Modeling: We employ linear regression and random forest regression models to predict salaries based on features like work year, experience level, employment type, job title, and remote ratio. Evaluation metrics such as mean squared error (MSE) and R-squared scores are used to assess model performance.
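The modeling step might be sketched as follows. This is not the project's actual code: the data is synthetic and generated in the example, the column names are assumed, and the preprocessing pipeline (one-hot encoding via `ColumnTransformer`) and hyperparameters are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data for illustration only; column names are assumed.
rng = np.random.default_rng(0)
n = 200
levels = rng.choice(["EN", "MI", "SE", "EX"], size=n)
base = pd.Series(levels).map({"EN": 60000, "MI": 90000, "SE": 130000, "EX": 180000})
df = pd.DataFrame({
    "work_year": rng.choice([2020, 2021, 2022, 2023], size=n),
    "experience_level": levels,
    "remote_ratio": rng.choice([0, 50, 100], size=n),
})
df["salary_in_usd"] = base + 2000 * (df["work_year"] - 2020) + rng.normal(0, 10000, size=n)

X = df.drop(columns="salary_in_usd")
y = df["salary_in_usd"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# One-hot encode the categorical column; pass numeric columns through unchanged.
encode = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["experience_level"])],
    remainder="passthrough",
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    pipeline = Pipeline([("encode", encode), ("model", model)])
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.0f}, R^2={scores['r2']:.3f}")
```

Evaluating on a held-out test split, as here, gives an estimate of how the models generalize; MSE is in squared salary units, while R-squared is unitless and at most 1.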

Visualization with Plotly: A parallel categories plot is generated using Plotly to visualize the relationships between multiple categorical variables simultaneously.

Conclusion:

This project identifies factors associated with data science salaries and provides a starting point for further research in the field. By combining EDA techniques with machine learning models, it aims to give professionals and stakeholders data-driven input for decisions about data science employment and compensation.