Contains a neural-network-powered binary classifier for the UC Irvine Mushroom Dataset.
Table of Contents
- Dependencies
- Usage Instructions
- Data Pre Processing
- Neural Network Specifications
- Classifier Performance
- Python 2.7
- tensorflow
- tflearn
- hickle
Run source setup.sh or ./setup.sh
If you run into permission issues, run sudo chmod +x setup.sh and then run the file again.
Run generate_datasets.py
This stage involves dividing the data set into three parts:
- Training
- Validation
- Testing
By default, they are split in the ratio 8:1:1 (Training : Validation : Testing).
This ratio can be modified by changing the TRAIN, VALID and TEST constants in split_datasets.py
When ready, run split_datasets.py
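The splitting step can be sketched roughly as follows. This is a minimal illustration, not the contents of split_datasets.py: the function name split_rows, the seed parameter and the ratio values used here are assumptions chosen for the example, and only the TRAIN, VALID and TEST constant names come from the text above.

```python
import random

# Illustrative ratio constants; the names mirror TRAIN, VALID and TEST above,
# but these values are assumptions, not necessarily the script's defaults.
TRAIN, VALID, TEST = 8, 1, 1


def split_rows(rows, train=TRAIN, valid=VALID, test=TEST, seed=0):
    """Shuffle rows and split them into (training, validation, testing) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    total = float(train + valid + test)
    n_train = int(len(rows) * train / total)
    n_valid = int(len(rows) * valid / total)
    # Whatever remains after the training and validation slices is the test set.
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])


train_set, valid_set, test_set = split_rows(range(100))
```

Shuffling before slicing keeps the three parts disjoint and avoids any ordering bias in the source file.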
Run nn_model.py
The attributes of each feature are one-hot encoded. A missing value is represented by its own bit in the one-hot encoding. The encoded attributes are then concatenated to form a 126-bit feature vector.
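The encoding scheme can be illustrated with a small sketch. The two features and their category symbols below are made up for the example (the real dataset has more features, which together yield the 126 bits), and the names FEATURE_VALUES, one_hot and encode_row are hypothetical, not taken from the repository.

```python
# Hypothetical category lists, one per feature. "?" stands for a missing
# value and gets its own position, just like any other category.
FEATURE_VALUES = [
    ["b", "c", "x", "?"],
    ["n", "y", "?"],
]


def one_hot(value, categories):
    """Return a list with a 1 at the position of value and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


def encode_row(row, feature_values=FEATURE_VALUES):
    """One-hot encode each attribute and chain the results into one vector."""
    vector = []
    for value, categories in zip(row, feature_values):
        vector.extend(one_hot(value, categories))
    return vector


encoded = encode_row(["c", "?"])
# encoded == [0, 1, 0, 0, 0, 0, 1]
```

The length of the final vector is the sum of the category counts across all features, here 4 + 3 = 7.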
This binary classifier uses one hidden layer in addition to an input layer and an output layer.
The particulars of each layer are as follows:
- Input Layer (126 nodes)
- Hidden Layer (64 nodes; activation function: ReLU)
- Output Layer (2 nodes; activation function: softmax)
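To make the layer sizes concrete, here is a dependency-free sketch of a single forward pass through a 126-64-2 network. It is not the tflearn model in nn_model.py: the weights are random and untrained, and every function name is an assumption introduced for illustration.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 126, 64, 2


def make_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j] holds the inputs to output node j."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Compute the weighted sum plus bias for every node in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, hidden_layer, output_layer):
    """Input (126) -> hidden (64, ReLU) -> output (2, softmax)."""
    hidden = relu(dense(features, hidden_layer))
    return softmax(dense(hidden, output_layer))


rng = random.Random(0)
hidden_layer = make_layer(N_INPUT, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_OUTPUT, rng)
features = [rng.randint(0, 1) for _ in range(N_INPUT)]  # stand-in feature vector
probabilities = forward(features, hidden_layer, output_layer)
```

The two softmax outputs sum to 1 and can be read as the predicted probabilities of the two classes.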
When trained on 90% of the data (80% training + 10% validation), this model achieved an accuracy of 100% on unseen test data. Yay!