StockMarket-Prediction

Stock market prediction is treated here as a typical binary-classification problem. The data comes from an open Kaggle dataset, which has three raw files:

RedditNews.csv: The first column is the "date" and the second column is the "news headlines". The headlines come from the Reddit WorldNews channel (/r/worldnews). They are ranked by Reddit users' votes, and only the top 25 headlines are kept for each date. In the CSV file, headlines are ordered from top to bottom by how hot they are, so there are 25 lines for each date.
DJIA_table.csv: Dow Jones Industrial Average (DJIA) data, downloaded directly from Yahoo Finance; check out the web page for more info.
Combined_News_DJIA.csv: Combines the two datasets above into 27 columns. The first column is "Date", the second is "Label", and the remaining ones are the news headlines, from "Top1" to "Top25".
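
As a rough illustration of the 27-column layout described above, the following sketch reads rows in that shape using only the standard library. The function name `load_combined` and the sample row are hypothetical and are not taken from the notebooks; the sample is built in memory so the snippet runs without the real file.

```python
import csv
import io

def load_combined(source):
    """Read Combined_News_DJIA-style rows from an open text stream.

    Returns a list of (date, label, headlines) tuples, where headlines
    holds the Top1..Top25 strings for that date.
    """
    reader = csv.DictReader(source)
    rows = []
    for record in reader:
        headlines = [record[f"Top{i}"] for i in range(1, 26)]
        rows.append((record["Date"], int(record["Label"]), headlines))
    return rows

# Tiny in-memory sample with the same column names (made-up content).
header = ["Date", "Label"] + [f"Top{i}" for i in range(1, 26)]
sample = io.StringIO()
writer = csv.writer(sample, lineterminator="\n")
writer.writerow(header)
writer.writerow(["2008-08-08", "0"] + [f"headline {i}" for i in range(1, 26)])
sample.seek(0)

rows = load_combined(sample)
print(len(rows), len(rows[0][2]))
```

With the real data, the same function could be given a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever Combined_News_DJIA.csv is stored locally (the location is an assumption, not stated here).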

The time range is from 2008-08-08 to 2016-07-01. For task evaluation, we use data from 2008-08-08 to 2014-12-31 as the training set, and the remaining data (from 2015-01-02 to 2016-07-01, about a year and a half) as the test set. This is roughly an 80%/20% split.
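
The date-based split can be sketched as follows. This is a minimal example assuming each row starts with an ISO-formatted date string; `split_by_date` and the sample rows are hypothetical names and values used only for illustration.

```python
from datetime import date

TRAIN_END = date(2014, 12, 31)  # last training day, as stated above

def split_by_date(rows):
    """Split rows whose first field is an ISO date string into train and test."""
    train, test = [], []
    for row in rows:
        day = date.fromisoformat(row[0])
        (train if day <= TRAIN_END else test).append(row)
    return train, test

# Made-up rows: (date, label)
rows = [("2014-12-30", 1), ("2014-12-31", 0), ("2015-01-02", 1)]
train, test = split_by_date(rows)
print(len(train), len(test))
```

Splitting on a fixed date rather than shuffling keeps every test day strictly later than every training day, which avoids leaking future information into training.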

Since it's a binary-classification problem, it only has two labels:

"1" when the DJIA Adj Close value rose or stayed the same.
"0" when the DJIA Adj Close value decreased.
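
The labeling rule can be expressed in a few lines. This sketch assumes the comparison is made against the previous trading day's Adj Close, which is not spelled out above; `make_labels` and the prices are hypothetical.

```python
def make_labels(adj_close):
    """Label each day after the first: 1 if Adj Close >= previous day's, else 0.

    Assumption: "rose or stayed the same" is relative to the previous
    trading day in the series.
    """
    return [1 if today >= prev else 0
            for prev, today in zip(adj_close, adj_close[1:])]

prices = [100.0, 101.5, 101.5, 99.0]  # made-up values
labels = make_labels(prices)
print(labels)  # [1, 1, 0]
```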

For reading convenience, I divide the whole project into several parts:

PartI

Phase1: Loading in the Data
Phase2: EDA
Phase3: Sentiment Analysis
Phase4: Text Encoding techniques
Phase5: Stemming

PartII

Phase1: Empirical Probability
Phase2: Sigmoid Function
Phase3: KL Divergence
Phase4: Cross Entropy Loss
Phase5: Gradient Descent
Phase6: SGD and Mini-Batch Gradient Descent
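
To give a flavor of how the PartII topics fit together, here is a minimal sketch of a one-feature logistic model trained with full-batch gradient descent on the cross-entropy loss. It is not the notebook's code; the function names, toy data, learning rate, and step count are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, b, xs, ys):
    """Mean binary cross-entropy of the model p = sigmoid(w * x + b)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def gradient_step(w, b, xs, ys, lr):
    """One full-batch gradient descent update on the cross-entropy loss."""
    n = len(xs)
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

# Toy, linearly separable data (made up).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
loss_before = cross_entropy(w, b, xs, ys)
for _ in range(200):
    w, b = gradient_step(w, b, xs, ys, lr=0.5)
loss_after = cross_entropy(w, b, xs, ys)
print(loss_before > loss_after)
```

SGD and mini-batch variants would use the same update but compute the gradient on one example or a small subset per step instead of the whole dataset.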

PartIII

Phase1: Basic Classifier
Phase2: Confusion Matrix
Phase3: Classification Threshold & F-measure
Phase4: ROC curve & AUC
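
The PartIII metrics can be sketched as below: scores are turned into predictions with a classification threshold, tallied into a confusion matrix, and summarized with the F-measure (F1). The helper names, scores, and threshold are hypothetical and only illustrate the definitions.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f_measure(tp, fp, fn):
    """F1: harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

y_true = [1, 0, 1, 1, 0]           # made-up labels
scores = [0.9, 0.6, 0.4, 0.8, 0.2]  # made-up predicted probabilities

tp, fp, tn, fn = confusion_counts(y_true, scores, threshold=0.5)
f1 = f_measure(tp, fp, fn)
print(tp, fp, tn, fn)  # 2 1 1 1
```

Sweeping `threshold` from 0 to 1 and recording the true-positive and false-positive rates at each value gives the points of the ROC curve.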

PartIV

Phase1: Naive Bayes
Phase2: Decision Tree
Phase3: Random Forest
Phase4: Gradient Boosting

Install

Since most of the work is done in Jupyter notebooks, you should install Jupyter with pip:

python3 -m pip install --upgrade pip
python3 -m pip install jupyter

Note that Python (3.3 or greater, or 2.7) is required to install the Jupyter Notebook itself; details can be found HERE.


victorchennn/StockMarket-Prediction
