Repository for my implementation of the Voted Perceptron model

cvogitgud/VotedPerceptron

VotedPerceptron

run.py

Implementation of the voted perceptron as detailed by Yoav Freund and Robert E. Schapire in "Large Margin Classification Using the Perceptron Algorithm" (1999).

In summary...
The general concept is similar to a boosting algorithm. Training proceeds as in a regular perceptron, but instead of overwriting the previous weight vector on each update, the algorithm stores every intermediate weight vector in an array, repeating this for T epochs (T is a hyperparameter chosen by validation on a held-out split of the training data). Each stored weight vector carries its own weight, determined by its lifetime: vectors that last longer without a misclassification receive heavier weights, and vice versa. The final prediction for a test point is the weighted sum of each stored vector's prediction, thresholded by sign().
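The summary above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in run.py: the function names (`train_voted_perceptron`, `predict`), the list-based data layout, and the choice to map sign(0) to +1 are all assumptions made for the example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sign(x):
    """Threshold to {-1, +1}; sending 0 to +1 is an assumption of this sketch."""
    return 1 if x >= 0 else -1


def train_voted_perceptron(X, y, T):
    """Train for T epochs on points X with labels y in {-1, +1}.

    Returns a list of (weight_vector, survival_count) pairs, one per
    intermediate perceptron.
    """
    w = [0.0] * len(X[0])  # current weight vector
    c = 0                  # examples the current vector has survived
    models = []
    for _ in range(T):
        for x_i, y_i in zip(X, y):
            if y_i * dot(w, x_i) <= 0:
                # Mistake: store the old vector with its count, then update.
                models.append((w, c))
                w = [w_j + y_i * x_j for w_j, x_j in zip(w, x_i)]
                c = 1
            else:
                c += 1
    models.append((w, c))  # keep the last vector as well
    return models


def predict(models, x):
    """Weighted vote of every stored vector, thresholded by sign()."""
    return sign(sum(c * sign(dot(w, x)) for w, c in models))


# Tiny linearly separable example; the last feature acts as a bias term.
X = [[1, 2, 1], [2, 1, 1], [-1, -2, 1], [-2, -1, 1]]
y = [1, 1, -1, -1]
models = train_voted_perceptron(X, y, T=3)
print(predict(models, [3, 3, 1]), predict(models, [-3, -3, 1]))
```

Storing the pair (vector, count) at the moment of each mistake is what gives long-lived vectors more say in the final vote.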

main() trains the voted perceptron model and evaluates its performance on training sets of various sizes. The training sets are generated by test_file_creator.py, which can be modified (if you do so, the file paths in main() may also need updating).

training_files.zip

Training files Xtrain.csv and Ytrain.csv. Each row in Xtrain.csv represents a point in vector form, and each column represents a feature dimension. Each row in Ytrain.csv holds the ground truth label (either 0 or 1) for the corresponding point in Xtrain.csv.

Note: While the labels in Ytrain.csv are 0 and 1, they are converted to -1 and 1 in training to follow the implementation by Freund and Schapire. The output is converted back to 0 and 1. Future Ytrain.csv files can use -1 and 1 without changing the code (though you may want to comment out the label conversion for marginal performance gains, especially with larger datasets). Test sets are not included, but they must follow the same format as the training files.
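As a rough illustration of the label handling described in the note, the conversion could look like the sketch below. The helper names `to_signed` and `to_binary` are hypothetical and the actual conversion in run.py may be written differently; the point is only that a mapping of this shape leaves labels that are already -1 and 1 unchanged.

```python
def to_signed(labels):
    """Map 0/1 labels to -1/+1; labels that are already -1/+1 pass through."""
    return [1 if int(v) == 1 else -1 for v in labels]


def to_binary(labels):
    """Map -1/+1 predictions back to 0/1 for output."""
    return [1 if v == 1 else 0 for v in labels]


signed = to_signed([0, 1, 1, 0])
print(signed)
print(to_binary(signed))
```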

data

This is a folder holding a collection of the test data sets produced by test_file_creator.py when called in the main() function of run.py. It includes the test sets and the predictions.

test_file_creator.py

Creates training sets using 5%, 10%, 20%, 50%, and 100% of the first 90% of the original training set, and uses the last 10% of the training data as a test set for local evaluation.
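The split described above might be sketched as follows. This is an assumption-laden example rather than the contents of test_file_creator.py: the function name `make_splits` is hypothetical, and taking each subset from the start of the 90% pool (rather than sampling at random) is a guess, since the description does not say how the rows are chosen.

```python
def make_splits(rows, percents=(5, 10, 20, 50, 100), holdout_percent=10):
    """Hold out the last rows as a test set and cut training subsets
    of several sizes from the remaining rows."""
    n_test = len(rows) * holdout_percent // 100
    cut = len(rows) - n_test
    pool, test = rows[:cut], rows[cut:]
    # Assumption: each subset is a prefix of the pool.
    train_sets = {p: pool[: len(pool) * p // 100] for p in percents}
    return train_sets, test


rows = list(range(100))  # stand-in for the rows of Xtrain.csv
train_sets, test = make_splits(rows)
print({p: len(s) for p, s in train_sets.items()}, len(test))
```

Integer percentages are used here so that the subset sizes do not depend on floating-point rounding.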
