Project 5. Car price prediction

Done by Alena Kurylchyk and Marina Bovkush as BK_Team

LB MAPE 11.84955%, ranking is 22.

Project objective

We need to build a model that predicts a car's price from its features.

Project features

We don't have a ready-made dataset, so we have to parse the auto.ru website to collect data for machine learning.
The metric used to evaluate model quality is MAPE (mean absolute percentage error).
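
As a minimal sketch of how the metric is computed: MAPE averages the absolute error of each prediction relative to its true value and reports it as a percentage. The function name `mape` and the sample prices are hypothetical and chosen only for illustration; they are not taken from the project notebooks.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    # Each term is the absolute error relative to the true value,
    # so true values must be non-zero (prices always are).
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical prices, for illustration only.
true_prices = [1_000_000, 500_000, 250_000]
predicted_prices = [900_000, 550_000, 250_000]

score = mape(true_prices, predicted_prices)
print(f"MAPE: {score:.2f}%")
```

Because each error is divided by the true price, a fixed absolute error costs more on a cheap car than on an expensive one.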

Requirements

The project should be done by a team.
The parsing code should be presented (in a Kaggle notebook or uploaded to GitHub).
The project code must be presented on GitHub and Kaggle.
The final model's MAPE must beat the baseline's result (that is, be lower, since MAPE is an error metric).
Predicted prices must be submitted to Kaggle, and the leaderboard result must be presented on GitHub.

What has been done

Gathering of a team.
Data enrichment.

Parsing of relevant data from auto.ru.
Unification and merging of test and parsed datasets.
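
The unification and merging step can be sketched roughly as follows, assuming pandas. All column names and values here (`brand`, `mileage`, `year`, `productionDate`, `price`, `is_train`) are hypothetical, since the README does not list the real schemas of the test and parsed datasets.

```python
import pandas as pd

# Hypothetical test set: no target column.
test = pd.DataFrame(
    {"brand": ["BMW"], "mileage": [120000], "productionDate": [2015]}
)

# Hypothetical parsed data: one column is named differently and the price is known.
parsed = pd.DataFrame(
    {
        "brand": ["AUDI", "BMW"],
        "mileage": [80000, 45000],
        "year": [2017, 2019],
        "price": [1_500_000, 2_300_000],
    }
)

# Unify the schemas: rename parsed columns to match the test set.
parsed = parsed.rename(columns={"year": "productionDate"})

# Flag the origin of each row so the sets can be split again before training.
parsed["is_train"] = 1
test["is_train"] = 0

# Stack the rows; the test rows get NaN in the missing "price" column.
combined = pd.concat([parsed, test], ignore_index=True, sort=False)
print(combined)
```

Keeping both sets in one frame means cleaning and encoding are applied identically to training and test rows.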

EDA

Quick dataset overview using profile report.
Handling of duplicates.
Handling of missing values.
Visualisation of feature distributions and their relationship with the target value.
Outlier analysis.
Dividing features into categories.
Analysis of the relationships between feature categories and with the target value.
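
The duplicate and missing-value handling listed above might look like the sketch below, assuming pandas. The columns and the fill strategy (median for numeric values, an explicit "unknown" label for categorical ones) are assumptions made for illustration; the README does not say which strategy the team used.

```python
import pandas as pd

# Hypothetical sample with one exact duplicate and some missing values.
df = pd.DataFrame(
    {
        "brand": ["BMW", "BMW", "AUDI", "AUDI"],
        "mileage": [120000.0, 120000.0, None, 80000.0],
        "owners": ["1", "1", None, "2"],
    }
)

# Drop exact duplicate rows and rebuild the index.
df = df.drop_duplicates().reset_index(drop=True)

# Share of missing values per column, useful for deciding how to treat each one.
missing_share = df.isna().mean()
print(missing_share)

# Assumed strategy: median for numeric columns, a separate label for categorical ones.
df["mileage"] = df["mileage"].fillna(df["mileage"].median())
df["owners"] = df["owners"].fillna("unknown")
print(df)
```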

Feature Engineering

Two new features have been added to the dataset.

ML

Encoding of all binary and categorical features.
Testing of 5 different models: Random Forest, CatBoost, Gradient Boosting, XGBoost, LightGBM. Bagging and stacking have also been tested.
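
A rough sketch of encoding plus stacking, assuming scikit-learn. To keep the example dependent on scikit-learn only, it stacks Gradient Boosting with Random Forest rather than XGBoost, so it is not the team's final configuration. The synthetic data, the column names, the one-hot encoding and the linear meta-model are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data with hypothetical columns; not the project's dataset.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame(
    {
        "brand": rng.choice(["BMW", "AUDI", "SKODA"], size=n),
        "age": rng.integers(0, 20, size=n),
        "mileage": rng.integers(0, 300_000, size=n),
    }
)
base = X["brand"].map({"BMW": 3_000_000, "AUDI": 2_800_000, "SKODA": 1_500_000})
noise = rng.normal(1.0, 0.05, size=n)
y = base * np.exp(-0.08 * X["age"]) * (1 - X["mileage"] / 1_000_000) * noise

# One-hot encode the categorical column, pass numeric columns through unchanged.
encoder = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["brand"])],
    remainder="passthrough",
)

# Base models produce out-of-fold predictions; the meta-model learns to combine them.
stack = StackingRegressor(
    estimators=[
        ("gb", GradientBoostingRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
model = make_pipeline(encoder, stack)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
predictions = model.predict(X_valid)

# scikit-learn returns MAPE as a fraction, so multiply by 100 for percent.
valid_mape = 100 * mean_absolute_percentage_error(y_valid, predictions)
print(f"Validation MAPE: {valid_mape:.2f}%")
```

Swapping in `xgboost.XGBRegressor` as a second base estimator should work the same way, since it follows the scikit-learn estimator interface.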

Results

Standardisation of numeric variables didn't improve quality, so it wasn't used.
The best result was achieved by stacking Gradient Boosting and XGBoost.
The best MAPE metric on the leaderboard is 11.84955%.
The leaderboard ranking is 22.

What could be improved

More extensive feature engineering.
Trying some NLP methods to extract useful data.
Better hyperparameter tuning.
Testing of other models (e.g. ExtraTrees).
Deeper analysis of the data and the results to understand what impacts MAPE the most.

Retrospective

The project is too massive to be done within a week.
The team had a lot of issues with Kaggle and parsing and spent a lot of time solving them.
Time for feature engineering had to be reduced because of the deadline.
Working under such conditions was very stressful, and the team is not satisfied.
