DSSCHack2017/UCL-24hours-data-science-challenge
In this challenge, we first normalize the price data by converting it to log-returns. We then build relation-network graphs in Azure to identify groups of instruments with strong statistical predictability. Next, we apply HSIC Lasso to select the valuable features and to recognize and eliminate the noise features. For prediction, we train a gradient boosting classifier to predict whether each of the selected predictable financial instruments will rise or fall over the next 20 days. After fine-tuning the parameters, the model reaches 68.75% accuracy on the test dataset. Finally, we fit a Binomial-Levy stable model to the distribution of the expected loss of the predictive model, which we use for model evaluation and risk control.
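As a rough illustration of the first and fourth steps above, the sketch below computes log-returns and builds up/down labels for a 20-day horizon. It is only a minimal example under stated assumptions: the function names `log_returns` and `direction_labels` are hypothetical, and the exact labeling rule used in the notebook may differ.

```python
import math


def log_returns(prices):
    """Return r_t = ln(p_t / p_{t-1}) for consecutive prices.

    The output has len(prices) - 1 entries.
    """
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def direction_labels(prices, horizon=20):
    """Label each day 1 if the price `horizon` steps later is higher, else 0.

    Assumption: "increment" means strictly higher than today's price.
    The last `horizon` days have no label because their future is unknown.
    """
    return [
        1 if prices[t + horizon] > prices[t] else 0
        for t in range(len(prices) - horizon)
    ]


if __name__ == "__main__":
    toy_prices = [100.0, 110.0, 99.0, 101.0, 105.0]
    print(log_returns(toy_prices))
    print(direction_labels(toy_prices, horizon=2))
```

Log-returns are additive over time, so the sum of the daily values equals the log of the total price ratio, which is one reason they are a convenient normalization for price series.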

To sum up, our model has several notable properties. First, it reaches an accuracy of 68.75%, which is remarkably high for a time-series problem of this kind. Second, it is easy to explain and has a strong mathematical foundation: the HSIC method used in the model is designed to measure statistical independence, and its theory is well developed. By dropping some features, we not only improve the accuracy but also learn which features are strongly correlated. This property directly addresses the requirements of the question. Finally, we also made use of the Azure experiment tools and their advantages.
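To make the HSIC idea concrete, here is a minimal sketch of the biased empirical HSIC estimate between one feature and a target, using Gaussian kernels. It shows only the dependence measure, not the full HSIC Lasso selection procedure, and it is not taken from the notebook: the function names, the fixed kernel width `sigma=1.0`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix of a 1-D or 2-D sample array."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    K and L are Gram matrices of x and y, and H centers them.
    Values near zero suggest independence; larger values suggest dependence.
    """
    n = len(x)
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.normal(size=200)
    dependent = feature ** 2              # nonlinear function of the feature
    unrelated = rng.normal(size=200)      # independent noise
    print("dependent:", hsic(feature, dependent))
    print("unrelated:", hsic(feature, unrelated))
```

Because HSIC works on kernel matrices, it can pick up nonlinear dependence that a plain correlation coefficient would miss, which is what makes it useful for separating informative features from noise.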

The code and output can be found in the Jupyter notebook. Raw_data_clean is used to transform the data from long_type to wide_type.
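For readers unfamiliar with the long-to-wide transformation, the sketch below shows the idea in plain Python. It is not the contents of Raw_data_clean: the `(date, instrument, price)` record layout and the function name `long_to_wide` are assumptions chosen for illustration. In pandas, the same reshaping is typically done with `DataFrame.pivot`.

```python
from collections import defaultdict


def long_to_wide(rows):
    """Reshape a list of (date, instrument, price) records into a wide table.

    Returns (header, table), where each table row is
    [date, price_of_instrument_1, price_of_instrument_2, ...].
    Missing (date, instrument) combinations are filled with None.
    """
    instruments = sorted({inst for _, inst, _ in rows})
    by_date = defaultdict(dict)
    for date, inst, price in rows:
        by_date[date][inst] = price
    table = [
        [date] + [by_date[date].get(inst) for inst in instruments]
        for date in sorted(by_date)
    ]
    return ["date"] + instruments, table


if __name__ == "__main__":
    long_rows = [
        ("2017-01-02", "A", 1.0),
        ("2017-01-02", "B", 2.0),
        ("2017-01-03", "A", 1.5),
    ]
    header, table = long_to_wide(long_rows)
    print(header)
    for row in table:
        print(row)
```

The wide layout puts one instrument per column and one date per row, which is the shape that feature selection and classifier training usually expect.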
