Perceptron Classifier with Sparse and Dense Vectors

This repository contains an implementation of a Perceptron for classification of sparse and dense vectors.

What is a Perceptron?

A Perceptron is a binary classifier that is used in supervised learning. It is a type of a simple neural network that can be used to classify input data into one of two categories. The algorithm learns a linear function that separates the two classes. The learning process involves adjusting the weights of the input variables until the classification error is minimized.

Implementation Details

The Perceptron in this repository is implemented in Python using the NumPy. It takes in either a dense vector or a sparse vector as input, and uses stochastic gradient descent to learn the weights of the input variables.

The sparse vector are obtained by weighting co-occurrence of two words with Positive Pointwise Mutual Information (PPMI) score, which is a measure of the association between two events (in this application: two words). PMI itself measures the log-likelihod of the joint probabiliy of two words occurring together to the product of their individual probabilities according to the following formula:

PMI $(w_1, w_2) = \log_2(\frac{P(w_1,w_2)}{P(w_1) \cdot P(w_2)})$

In this implementation only positive values of the PMI score are considered and all negative PMI values are set to 0.

The dense vector representations are word embeddings obtained by training Word2Vec with Gensim.

The model performance is evaluated with 5-fold cross-validation. The accuracy per batch will be printed out with the following command:

    python classifier.py --input-b data/B.txt --input-t data/T.txt --input-text data/input_text.txt

How to use Perceptron

Clone the repository to your local machine
Install the necessary dependencies by running pip install -r requirements.txt
Import the Perceptron Classifier from classifier.py and create an instance of the classifier.
Train the classifier on provided training data (can be found in directory data) by calling the train() method and passing training data and labels (can also be found in directory data: B.txt & T.txt).
Predict the labels of your test data by calling the predict() method and passing in your test data.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
data		data
.gitignore		.gitignore
README.md		README.md
classifier.py		classifier.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

.gitignore

.gitignore

README.md

README.md

classifier.py

classifier.py

requirements.txt

requirements.txt

Repository files navigation

Perceptron Classifier with Sparse and Dense Vectors

What is a Perceptron?

Implementation Details

How to use Perceptron

About

Releases

Packages

Languages

nneva/classification-sparse-dense-vectors

Folders and files

Latest commit

History

Repository files navigation

Perceptron Classifier with Sparse and Dense Vectors

What is a Perceptron?

Implementation Details

How to use Perceptron

About

Topics

Resources

Stars

Watchers

Forks

Languages