VotedPerceptron

run.py

Implementation of the voted perceptron as detailed by Yoav Freund and Robert E. Schapire in "Large Margin Classification Using the Perceptron Algorithm" (1999).

In summary...
The general concept is similar to a boosting algorithm. Training proceeds like a regular perceptron, but instead of overwriting the previous weight vector, the algorithm stores each updated weight vector in an array, repeating this for T epochs (T is a hyper-parameter chosen by validation on a held-out split of the training data). Each stored weight vector has its own voting weight, determined by its lifetime: models that last longer without a misclassification get heavier weights, and vice versa. For a final prediction, the weighted sum of each model's prediction on a test point is thresholded by sign().
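The procedure described above can be sketched as follows. This is a minimal illustration using plain Python lists, not the code in run.py. The function names (dot, sign, train_voted_perceptron, predict) are hypothetical, the bias term is omitted for brevity, and treating sign(0) as +1 is an assumption.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(v):
    """Threshold to -1 or +1 (assumption: 0 maps to +1)."""
    return 1 if v >= 0 else -1


def train_voted_perceptron(X, y, T):
    """Return a list of (weight_vector, lifetime) pairs.

    X: list of feature lists, y: labels in {-1, +1}, T: number of epochs.
    """
    w = [0.0] * len(X[0])
    c = 0  # how many points the current w has classified correctly
    models = []
    for _ in range(T):
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) <= 0:
                # Mistake: store the old vector with its lifetime, then update.
                models.append((w, c))
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                c = 1
            else:
                c += 1
    models.append((w, c))  # keep the last vector as well
    return models


def predict(models, x):
    """Lifetime-weighted vote of every stored vector, thresholded by sign()."""
    return sign(sum(c * sign(dot(w, x)) for w, c in models))


if __name__ == "__main__":
    X = [[2, 1], [1, 2], [-1, -2], [-2, -1]]
    y = [1, 1, -1, -1]
    models = train_voted_perceptron(X, y, T=2)
    print([predict(models, x) for x in X])
```

Storing the lifetime alongside each vector is what lets a long-lived vector outvote the short-lived ones produced early in training.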

main() trains the voted perceptron model on training sets of various sizes and evaluates its performance on each. The training sets are generated by test_file_creator.py, which can be modified; if you do, the file paths in main() may also need updating.

training_files.zip

Training files Xtrain.csv and Ytrain.csv. Each row in Xtrain.csv represents a point in vector form, and each column represents a feature dimension. Each row in Ytrain.csv holds the ground truth label (either 0 or 1) for the corresponding point in Xtrain.csv.

Note: While the labels in Ytrain.csv are 0 and 1, they are converted to -1 and 1 in training to follow the implementation by Freund and Schapire. The output is converted back to 0 and 1. Future Ytrain.csv files can use -1 and 1 without changing the code (though you may want to comment out the label conversion for marginal performance gains, especially with larger datasets). Test sets are not included, but they must follow the same format as the training files.
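As an illustration of the file format and label conversion described above, here is a small sketch using only the standard library. It is not taken from run.py: the function names are hypothetical, and it assumes comma-separated files with no header row.

```python
import csv
import io


def load_features(lines):
    """One point per row, one feature per column (assumes no header row)."""
    return [[float(v) for v in row] for row in csv.reader(lines) if row]


def load_labels(lines):
    """One label per row, read from the first column."""
    return [int(float(row[0])) for row in csv.reader(lines) if row]


def to_signed(labels):
    """Map 0/1 labels to -1/+1; labels already in -1/+1 pass through unchanged."""
    return [1 if v == 1 else -1 for v in labels]


def to_binary(labels):
    """Map -1/+1 predictions back to 0/1 for output."""
    return [1 if v == 1 else 0 for v in labels]


if __name__ == "__main__":
    # In-memory stand-ins for Xtrain.csv and Ytrain.csv.
    X = load_features(io.StringIO("0.5,1.5\n-2.0,3.0\n"))
    y = to_signed(load_labels(io.StringIO("0\n1\n")))
    print(X, y, to_binary(y))
```

Because to_signed maps anything other than 1 to -1, the same call handles both 0/1 and -1/1 label files.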

data

This is a folder holding a collection of the test data sets produced by test_file_creator.py when called in the main() function of run.py. It includes the test sets and the predictions.

test_file_creator.py

Creates training sets using 5%, 10%, 20%, 50%, and 100% of the first 90% of the original training set, and uses the last 10% of the training data as a test set for local evaluation.
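The split described above could look roughly like the sketch below. This is not the code in test_file_creator.py: the function name make_splits is hypothetical, and taking each subset as the first rows of the 90% pool (rather than a random sample) is an assumption.

```python
def make_splits(X, y, percents=(5, 10, 20, 50, 100)):
    """Split data into nested training subsets and a held-out test set.

    The last 10% of the rows become the test set. Each training subset is
    the first p% of the remaining 90% (assumption: prefix, not random).
    Returns ({p: (X_sub, y_sub)}, (X_test, y_test)).
    """
    n = len(X)
    cut = n * 9 // 10  # integer arithmetic avoids float rounding surprises
    train_sets = {}
    for p in percents:
        size = cut * p // 100
        train_sets[p] = (X[:size], y[:size])
    return train_sets, (X[cut:], y[cut:])


if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [i % 2 for i in range(100)]
    train_sets, (X_test, y_test) = make_splits(X, y)
    for p, (X_sub, _) in train_sets.items():
        print(p, len(X_sub))
    print("test", len(X_test))
```

Holding out the same final 10% for every training size keeps the evaluations comparable across sizes.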
