
In this repository, I have worked on data from the Kaggle competition "Predict Future Sales". I have used XGBRegressor, and I have also used the Fastai API for feature preprocessing to improve the model's accuracy. You can also get insights into the Fastai implementation here.


Objective and Overview

  • In this Notebook, I will work with a challenging time-series dataset consisting of daily sales data, kindly provided by one of the largest Russian software firms - 1C Company.
  • I will predict the total sales for every product and store in the next month using the datasets provided.

Fastai Library or API

  • Fast.ai is the first deep learning library to provide a single consistent interface to all the most commonly used deep learning applications for vision, text, tabular data, time series, and collaborative filtering.
  • Fast.ai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches.

Preparing the Model

  • I have used the Fastai API while preparing the model. The code can seem challenging to follow if you have never encountered the Fastai API before, so I recommend going through the Fastai documentation first. If you are using Fastai in a Jupyter Notebook, you can call doc(function_name) to pull up the documentation instantly, as in the sketch below.
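  • For example, the snippet below is a minimal sketch (assuming a fastai v2 environment, where the star import brings doc() into scope) of looking up documentation from inside a notebook:

# Minimal sketch: in a Jupyter Notebook with fastai installed, the star import
# makes doc() available; it renders the signature, docstring, and a link to the
# full documentation for the object you pass in.
from fastai.tabular.all import *

doc(TabularPandas)   # works for any fastai function or class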

Data

  • I prepared the data for this project from the Kaggle competition "Predict Future Sales"; a hypothetical loading sketch is shown below.
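  • As a hypothetical sketch (the file names follow the competition's data page; the local paths are assumptions), the data can be loaded like this:

import pandas as pd

# File names as listed on the Kaggle competition's data page; the paths assume
# a local copy of the downloaded files in the working directory.
sales_train = pd.read_csv("sales_train.csv")   # daily sales records
items       = pd.read_csv("items.csv")         # item names and category ids
shops       = pd.read_csv("shops.csv")         # shop names
test        = pd.read_csv("test.csv")          # (shop_id, item_id) pairs to predict

print(sales_train.shape, items.shape, shops.shape, test.shape)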

XGBRegressor

  • XGBoost is an open-source software library which provides a gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Windows, and macOS. From the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting Library".
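  • The snippet below is a minimal, runnable sketch of the XGBRegressor API on stand-in data; the hyperparameters and variable names are illustrative, not the ones used in the notebook (see the notebook for the actual configuration).

import numpy as np
from xgboost import XGBRegressor

# Stand-in data so the sketch runs on its own; in the notebook the features come
# from the preprocessed sales DataFrame instead.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 10)), rng.normal(size=1000)
X_valid, y_valid = rng.normal(size=(200, 10)), rng.normal(size=200)

model = XGBRegressor(
    n_estimators=1000,         # boosting rounds
    max_depth=8,               # depth of each tree
    learning_rate=0.1,         # shrinkage applied to each tree
    subsample=0.8,             # row sampling per tree
    colsample_bytree=0.8,      # column sampling per tree
    early_stopping_rounds=20,  # constructor argument in recent xgboost versions
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
predictions = model.predict(X_valid)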

Processing with Fastai API

  • When working with time-series data, adding the date features generated by the following function (adapted from the Fastai library) can considerably improve the accuracy of the model.
import re
import numpy as np
import pandas as pd

def add_datepart(df, fldnames, drop=True, time=False, errors="raise"):
    "Add date-part columns (Year, Month, Week, ...) derived from the date column(s) named in `fldnames`."
    if isinstance(fldnames, str):
        fldnames = [fldnames]
    for fldname in fldnames:
        fld = df[fldname]
        fld_dtype = fld.dtype
        if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
            fld_dtype = np.datetime64

        # Parse the column as datetime if it is not one already.
        if not np.issubdtype(fld_dtype, np.datetime64):
            df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True, errors=errors)
        # Prefix for the new columns: the field name with a trailing "date"/"Date" stripped.
        targ_pre = re.sub("[Dd]ate$", '', fldname)
        attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
                'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
        if time: attr = attr + ['Hour', 'Minute', 'Second']
        for n in attr:
            if n == 'Week':
                # Series.dt.week was removed in newer pandas; use isocalendar() instead.
                df[targ_pre + n] = fld.dt.isocalendar().week.astype(np.int64)
            else:
                df[targ_pre + n] = getattr(fld.dt, n.lower())
        # Seconds elapsed since the Unix epoch, useful as a monotone time feature.
        df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10**9
        if drop: df.drop(fldname, axis=1, inplace=True)
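
  • As a brief usage sketch (assuming the sales_train frame loaded above; the date column in sales_train.csv holds day-first strings such as 02.01.2013):

# Parse the dates explicitly as day-first before generating the date parts, so
# the ambiguous dd.mm.yyyy strings are not read month-first.
sales_train["date"] = pd.to_datetime(sales_train["date"], dayfirst=True)
add_datepart(sales_train, "date", drop=True)

# The frame now carries Year, Month, Week, Day, Dayofweek, Dayofyear, the
# Is_* flags, and Elapsed (seconds since the Unix epoch) in place of 'date'.
print(sales_train.columns.tolist())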

Snapshot of using XGBRegressor

(See the notebook for a snapshot of the XGBRegressor implementation.)
