
Market Analysis.

Data Aggregation, Dimensionality Reduction, and Continuous-Time Markov Chains.

By Michael Glushchenko and Charles Delapa.

Table of Contents

  1. Purpose
  2. Goals and Steps
  3. Files
  4. How to Run
  5. Sources

Purpose

The purpose of this project is to improve my data-science skills by aggregating financial data, and to improve my statistical skills by applying what I've learned in undergraduate statistics to a real-world (mostly chaotic and unsolvable) problem. Since the problem isn't exactly solvable, the ultimate goal of this project is to find models that approximate the movement of some stocks with a reasonably high degree of success.

Goals and Steps

  1. We use the data-aggregation repository to create the financial data for all steps that follow.
  2. Next, we will clean up the aggregated data, filling in the gaps left by weekends, holidays, and the parts of the day when markets are closed (a gap-filling sketch follows this list).
  3. Use the Uniform Manifold Approximation and Projection (UMAP) algorithm to reduce the dimensionality of each pattern (from 1000 data points to a 2-dimensional point on a plane). We will do this for separate groups of stocks as well as separate time-frames, one time-frame at a time. The goals for this step are to determine acceptable parameters for the UMAP algorithm, to obtain clusters of distinct patterns, to run k-means clustering on the resulting embedding, and to map the label of each cluster back to the original patterns (see the UMAP/k-means sketch after this list).
  4. Once patterns are successfully labeled and grouped, we can apply the idea of Markov chains to approximate the probability that a certain stock pattern will appear next given some sequence of patterns (see the transition-matrix sketch after this list).
  5. It would be nice to eventually provide an interface where a user can enter an instrument symbol and, as output, receive a chart of that stock with the 3-5 most likely paths for the next time-frame (the next couple of minutes). We work on a minute-by-minute basis so that the number of extraneous variables is kept to a minimum.
  6. Finally, we will write a research paper documenting our findings and any new discoveries made/skills learned.
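The gap-filling idea from step 2 can be sketched roughly as follows. This is a minimal illustration, assuming minute bars indexed by timestamp with a 'close' column; the actual cleaning code lives in dim_reduction/utils.py and may differ.

```python
# Minimal sketch of the gap-filling idea in step 2 (hypothetical column
# names; the real cleaning lives in dim_reduction/utils.py).
import pandas as pd

def fill_market_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex minute bars onto a continuous minute grid and fill the holes.

    Assumes `df` is indexed by timestamp and has a 'close' column.
    """
    df = df.sort_index()
    # Build a complete minute-by-minute index between the first and last bar.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")
    df = df.reindex(full_index)
    # Carry the last observed price across weekends, holidays, and halts.
    df["close"] = df["close"].ffill()
    return df
```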
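Step 3 can be illustrated with a minimal sketch like the one below, assuming the patterns are stacked into a (n_patterns, window_length) NumPy array. The UMAP and k-means parameter values shown are placeholders, not the values explored in dim_reduction/Umap.ipynb.

```python
# Minimal sketch of step 3: UMAP projection followed by k-means clustering.
# Requires the umap-learn and scikit-learn packages.
import numpy as np
import umap
from sklearn.cluster import KMeans

def reduce_and_cluster(patterns: np.ndarray, n_clusters: int = 10):
    # Project each high-dimensional pattern down to a 2-D point.
    reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
    embedding = reducer.fit_transform(patterns)
    # Group nearby 2-D points; each label maps back to its original pattern
    # because row order is preserved.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit_predict(embedding)
    return embedding, labels
```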
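For step 4, a discrete first-order Markov chain over the cluster labels gives the flavor of the approach; the project itself targets continuous-time Markov chains, so treat this only as an illustration, with hypothetical helper names.

```python
# Illustrative discrete Markov-chain step: estimate a transition matrix
# from the sequence of cluster labels, then read off the most likely
# next patterns given the last observed one.
import numpy as np

def transition_matrix(labels: np.ndarray, n_states: int) -> np.ndarray:
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(labels[:-1], labels[1:]):
        counts[current, nxt] += 1
    # Normalize each row into a probability distribution (rows with no
    # observations stay all-zero).
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def most_likely_next(labels: np.ndarray, n_states: int, k: int = 3) -> np.ndarray:
    P = transition_matrix(labels, n_states)
    # Top-k most probable next patterns given the last observed pattern.
    return np.argsort(P[labels[-1]])[::-1][:k]
```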

Files

While the files contain self-explanatory code, this section briefly summarizes the purpose of each file.

  1. Data Aggregation
  • The repository we use to create the financial data we work on throughout this project. More details can be found in the readme on the page linked.
  2. dim_reduction/utils.py
  • Provides functions used in the pre-processing and data-cleaning part of the project (functions that clean up the data, smooth it out, etc.).
  3. dim_reduction/pre_processing.py
  • Using functions from utils.py, this file cleans up the financial data aggregated with data-aggregation, breaks each stock's chart into patterns of a pre-specified length, and saves those patterns in preparation for the dimensionality-reduction portion of the project (a rough windowing sketch follows this list).
  4. dim_reduction/Umap.ipynb
  • Tests some UMAP parameters on a small amount of data to get a feel for how the algorithm works, whether it works at all, and to outline the approach that will be used on the full dataset.
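As a rough illustration of the windowing described for dim_reduction/pre_processing.py (the real implementation may slice and save patterns differently), a single chart can be broken into fixed-length overlapping patterns like this:

```python
# Hypothetical sketch of slicing one stock's price series into
# fixed-length patterns; window and step sizes are placeholders.
import numpy as np

def chart_to_patterns(prices: np.ndarray, window: int = 1000, step: int = 1) -> np.ndarray:
    """Return an array of shape (n_patterns, window) of overlapping slices."""
    slices = [prices[i:i + window] for i in range(0, len(prices) - window + 1, step)]
    return np.asarray(slices)
```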

How to Run

Still in development.

Sources

2022-2023 © Michael Glushchenko, Charles Delapa
