
Market Analysis.

Data Aggregation, Dimensionality Reduction, and Continuous-Time Markov Chains.

By Michael Glushchenko and Charles Delapa.

Table of Contents

  1. Purpose
  2. Goals and Steps
  3. Files
  4. How to Run
  5. Sources

Purpose

The purpose of this project is to improve my data-science skills by aggregating financial data, and to improve my statistical skills by applying what I've learned in undergraduate statistics to a real-world (mostly chaotic and unsolvable) problem. Since the problem isn't exactly solvable, the ultimate goal of this project is to find models that approximate the movement of some stocks with a reasonably high degree of success.

Goals and Steps

  1. We use the data-aggregation repository to create the financial data for all steps that follow.
  2. Next, we will clean up the aggregated data, filling in the gaps left by weekends, holidays, and the parts of the day when markets are closed (a gap-filling sketch follows this list).
  3. Use the Uniform Manifold Approximation and Projection (UMAP) algorithm to reduce the dimensionality of each pattern (from 1000 data points to a 2-dimensional point on a plane). We will do this for separate groups of stocks as well as separate time-frames, one time-frame at a time. The goals for this step are to determine acceptable parameters for the UMAP algorithm, to obtain clusters of distinct patterns, to run k-means clustering on the resulting embedding, and to map the label of each cluster back to the original patterns (see the UMAP/k-means sketch after this list).
  4. Once patterns are successfully labeled and grouped, we can apply the idea of Markov chains to approximate the probability that a certain stock pattern will appear next given some sequence of patterns (see the transition-matrix sketch after this list).
  5. It would be nice to eventually provide an interface where a user can enter an instrument symbol and, as output, receive a chart of that stock with the 3-5 most likely paths for the next time-frame (the next couple of minutes). We work on a minute-by-minute basis so that the number of extraneous variables is kept to a minimum.
  6. Finally, we will write a research paper documenting our findings and any new discoveries made/skills learned.
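The gap-filling idea from step 2 can be sketched roughly as follows. This is a minimal illustration, assuming minute bars indexed by timestamp with a 'close' column; the actual cleaning code lives in dim_reduction/utils.py and may differ.

```python
# Minimal sketch of the gap-filling idea in step 2 (hypothetical column
# names; the real cleaning lives in dim_reduction/utils.py).
import pandas as pd

def fill_market_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex minute bars onto a continuous minute grid and fill the holes.

    Assumes `df` is indexed by timestamp and has a 'close' column.
    """
    df = df.sort_index()
    # Build a complete minute-by-minute index between the first and last bar.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")
    df = df.reindex(full_index)
    # Carry the last observed price across weekends, holidays, and halts.
    df["close"] = df["close"].ffill()
    return df
```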
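Step 3 can be illustrated with a minimal sketch like the one below, assuming the patterns are stacked into a (n_patterns, window_length) NumPy array. The UMAP and k-means parameter values shown are placeholders, not the values explored in dim_reduction/Umap.ipynb.

```python
# Minimal sketch of step 3: UMAP projection followed by k-means clustering.
# Requires the umap-learn and scikit-learn packages.
import numpy as np
import umap
from sklearn.cluster import KMeans

def reduce_and_cluster(patterns: np.ndarray, n_clusters: int = 10):
    # Project each high-dimensional pattern down to a 2-D point.
    reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
    embedding = reducer.fit_transform(patterns)
    # Group nearby 2-D points; each label maps back to its original pattern
    # because row order is preserved.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit_predict(embedding)
    return embedding, labels
```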
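For step 4, a discrete first-order Markov chain over the cluster labels gives the flavor of the approach; the project itself targets continuous-time Markov chains, so treat this only as an illustration, with hypothetical helper names.

```python
# Illustrative discrete Markov-chain step: estimate a transition matrix
# from the sequence of cluster labels, then read off the most likely
# next patterns given the last observed one.
import numpy as np

def transition_matrix(labels: np.ndarray, n_states: int) -> np.ndarray:
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(labels[:-1], labels[1:]):
        counts[current, nxt] += 1
    # Normalize each row into a probability distribution (rows with no
    # observations stay all-zero).
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def most_likely_next(labels: np.ndarray, n_states: int, k: int = 3) -> np.ndarray:
    P = transition_matrix(labels, n_states)
    # Top-k most probable next patterns given the last observed pattern.
    return np.argsort(P[labels[-1]])[::-1][:k]
```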

Files

While the files contain self-explanatory code, this section briefly summarizes the purpose of each file.

  1. Data Aggregation
  • The repository we use to create the financial data we work on throughout this project. More details can be found in the readme on the page linked.
  2. dim_reduction/utils.py
  • Provides functions used in the pre-processing and data-cleaning part of the project (functions that clean up the data, smooth it out, etc.).
  3. dim_reduction/pre_processing.py
  • Using functions from utils.py, this file cleans up the financial data aggregated with data-aggregation, breaks each stock's chart into patterns of a pre-specified length, and saves those patterns in preparation for the dimensionality-reduction portion of the project (a rough windowing sketch follows this list).
  4. dim_reduction/Umap.ipynb
  • Tests some UMAP parameters on a small amount of data to get a feel for how the algorithm works, whether it works at all, and to outline the approach that will be used on the full dataset.
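As a rough illustration of the windowing described for dim_reduction/pre_processing.py (the real implementation may slice and save patterns differently), a single chart can be broken into fixed-length overlapping patterns like this:

```python
# Hypothetical sketch of slicing one stock's price series into
# fixed-length patterns; window and step sizes are placeholders.
import numpy as np

def chart_to_patterns(prices: np.ndarray, window: int = 1000, step: int = 1) -> np.ndarray:
    """Return an array of shape (n_patterns, window) of overlapping slices."""
    slices = [prices[i:i + window] for i in range(0, len(prices) - window + 1, step)]
    return np.asarray(slices)
```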

How to Run

Still in development.

Sources

2022-2023 © Michael Glushchenko, Charles Delapa
