Read this README in Portuguese.

A multilayer perceptron (MLP) is an artificial neural network with at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation. Its multiple layers and nonlinear activation distinguish an MLP from a linear perceptron and allow it to classify data that is not linearly separable.
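As a rough illustration of the structure described above, the following sketch computes the forward pass of a network with a single hidden layer. The sigmoid activation, the `Layer` struct, and the function names are assumptions made for this example; they are not taken from mlp.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layer: weights[j][i] connects input i to neuron j; biases[j] belongs to neuron j.
struct Layer {
    std::vector<std::vector<double>> weights;
    std::vector<double> biases;
};

// Sigmoid, one common nonlinear activation (an assumption, not necessarily the one in mlp.cpp).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Output of one layer: activation(weighted sum of inputs + bias) for every neuron.
std::vector<double> feedLayer(const Layer& layer, const std::vector<double>& input) {
    assert(layer.weights.size() == layer.biases.size());
    std::vector<double> output(layer.weights.size());
    for (std::size_t j = 0; j < layer.weights.size(); ++j) {
        assert(layer.weights[j].size() == input.size());
        double sum = layer.biases[j];
        for (std::size_t i = 0; i < input.size(); ++i) sum += layer.weights[j][i] * input[i];
        output[j] = sigmoid(sum);
    }
    return output;
}

// Forward pass: input layer -> hidden layer -> output layer.
std::vector<double> forward(const Layer& hidden, const Layer& out,
                            const std::vector<double>& input) {
    return feedLayer(out, feedLayer(hidden, input));
}
```

With hand-picked weights (hidden neurons acting as OR and NAND, the output neuron as AND), this two-layer arrangement reproduces XOR, a classic function that a single linear perceptron cannot represent.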

This repository contains a parallel implementation of an MLP that recognizes characters regardless of the font they are written in.

The Dataset

The original dataset consists of images from 153 character fonts obtained from the UCI Machine Learning Repository. Some fonts were scanned with a variety of devices: hand scanners, desktop scanners, or cameras. Other fonts were computer generated.

Usage

To use the code, first clone this repository.

git clone https://github.com/viniciusvviterbo/Multilayer-Perceptron
cd ./Multilayer-Perceptron

Formatting the Dataset

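In this project, the first line of the dataset file describes the data: the number of cases, the number of inputs, and the number of outputs. It is followed by an empty line (entirely optional, included for readability) and then the data itself, one case per line. Example: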

[NUMBER OF CASES] [NUMBER OF INPUTS] [NUMBER OF OUTPUTS]

[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]

For testing the code, we included a reduced dataset (sampleNormalizedFonts.in), which also serves as a reference for the required format.
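The format above can be read with plain stream extraction. The sketch below is one possible reader, not the parsing code from mlp.cpp; the `Dataset` struct and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for a parsed pattern file.
struct Dataset {
    std::size_t cases = 0, inputs = 0, outputs = 0;
    std::vector<std::vector<double>> x;  // one row of inputs per case
    std::vector<std::vector<double>> y;  // one row of expected outputs per case
};

// Reads the header, then `cases` rows of `inputs + outputs` numbers.
// operator>> skips whitespace, so the optional empty line needs no special handling.
// Returns false if the stream ends early or contains a non-numeric token.
bool readDataset(std::istream& in, Dataset& data) {
    if (!(in >> data.cases >> data.inputs >> data.outputs)) return false;
    data.x.assign(data.cases, std::vector<double>(data.inputs));
    data.y.assign(data.cases, std::vector<double>(data.outputs));
    for (std::size_t c = 0; c < data.cases; ++c) {
        for (double& v : data.x[c]) if (!(in >> v)) return false;
        for (double& v : data.y[c]) if (!(in >> v)) return false;
    }
    return true;
}

// Convenience wrapper for text already held in memory; asserts that it is well formed.
Dataset parseDataset(const std::string& text) {
    std::istringstream in(text);
    Dataset data;
    const bool ok = readDataset(in, data);
    assert(ok && "malformed pattern text");
    (void)ok;  // silences the unused-variable warning when asserts are disabled
    return data;
}
```

Passing `std::cin` to `readDataset` would match the way the commands in this README feed the pattern file through standard input.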

Normalizing the dataset

A normalized dataset is preferred because the results produced at the end of training are then (approximately) absolute: 0 or 1. To normalize the dataset, compile and run the normalizer:

g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < PATTERN_FILE > NORMALIZED_PATTERN_FILE

Example:

g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < ./datasets/patternFonts.in > ./datasets/normalizedPatternFonts.in
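This README does not describe the method used by normalizeDataset.cpp. As an illustration only, the sketch below applies min-max scaling, a common way to bring every column into the range [0, 1]; the function name and the choice of min-max scaling are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scales every column of `rows` to [0, 1] in place using min-max normalization:
// v' = (v - min) / (max - min). A constant column is mapped to 0 to avoid dividing by zero.
void minMaxNormalize(std::vector<std::vector<double>>& rows) {
    if (rows.empty()) return;
    const std::size_t columns = rows.front().size();
    for (std::size_t c = 0; c < columns; ++c) {
        double lo = rows.front()[c], hi = rows.front()[c];
        for (const auto& row : rows) {
            assert(row.size() == columns);
            lo = std::min(lo, row[c]);
            hi = std::max(hi, row[c]);
        }
        const double range = hi - lo;
        for (auto& row : rows) row[c] = range > 0.0 ? (row[c] - lo) / range : 0.0;
    }
}
```

For example, a column holding 0, 5, and 10 becomes 0, 0.5, and 1.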

Compiling the source code

Compile the source code with OpenMP enabled:

g++ mlp.cpp -o mlp -O3 -fopenmp -std=c++14
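The -fopenmp flag enables OpenMP pragmas in the compiled code. Which loops mlp.cpp actually parallelizes is not stated here, so the sketch below only shows the general pattern on a hypothetical `weightedSums` function: the loop over the neurons of a layer is shared among threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sums of one layer, with the loop over neurons shared among threads.
// Each iteration writes only sums[j], so no two threads write the same element.
// Without -fopenmp the pragma is ignored and the loop runs sequentially.
std::vector<double> weightedSums(const std::vector<std::vector<double>>& weights,
                                 const std::vector<double>& input) {
    std::vector<double> sums(weights.size(), 0.0);
    const long neurons = static_cast<long>(weights.size());
    #pragma omp parallel for
    for (long j = 0; j < neurons; ++j) {
        assert(weights[j].size() == input.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < input.size(); ++i) sum += weights[j][i] * input[i];
        sums[j] = sum;
    }
    return sums;
}
```

OpenMP also accepts a `num_threads(n)` clause on the pragma, which is one way a program could honor a user-supplied thread count.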

Training and Result

The code splits the supplied dataset in half. The first half is used only for training and the second half only for testing, so the network sees the second half as new content and tries to obtain the correct result for it.
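A split like the one described above could look as follows. This is a sketch with a hypothetical `splitInHalf` function, and sending the extra case of an odd-sized dataset to the test half is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Splits the cases in file order: the first half for training, the rest for testing.
// With an odd number of cases, the extra case goes to the test half (a choice made for this sketch).
std::pair<Rows, Rows> splitInHalf(const Rows& cases) {
    const auto middle = cases.begin() + static_cast<std::ptrdiff_t>(cases.size() / 2);
    Rows training(cases.begin(), middle);
    Rows testing(middle, cases.end());
    assert(training.size() + testing.size() == cases.size());
    return std::make_pair(std::move(training), std::move(testing));
}
```

Because the split follows file order, the order of the cases in the pattern file determines which cases the network is trained on.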

Executing

To run the program, pass the following parameters:

./mlp HIDDEN_LAYER_LENGTH TRAINING_RATE THRESHOLD NUMBER_OF_THREADS < PATTERN_FILE

HIDDEN_LAYER_LENGTH is the number of neurons in the network's hidden layer;
TRAINING_RATE is the network's training (learning) rate, a floating-point number used during the correction phase of backpropagation;
THRESHOLD is the maximum error the network may have for its result to be accepted as correct;
NUMBER_OF_THREADS is the number of threads the network is allowed to use;
PATTERN_FILE is the normalized pattern file.
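To show how a training rate and an error threshold usually interact, the sketch below trains a single sigmoid neuron by gradient descent instead of a full MLP. It is a simplification for illustration, not the training loop from mlp.cpp; `trainNeuron`, `TrainingResult`, and the `maxEpochs` safety limit are hypothetical additions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct TrainingResult {
    std::size_t epochs;  // epochs actually run
    double error;        // mean squared error after the last epoch
};

// Trains one sigmoid neuron (weights plus bias) by gradient descent on squared error.
// trainingRate scales every correction; training stops once the mean squared error
// drops below threshold, or after maxEpochs as a safety limit.
TrainingResult trainNeuron(const std::vector<std::vector<double>>& inputs,
                           const std::vector<double>& targets,
                           double trainingRate, double threshold, std::size_t maxEpochs,
                           std::vector<double>& weights, double& bias) {
    assert(!inputs.empty() && inputs.size() == targets.size());
    assert(inputs.front().size() == weights.size());
    auto predict = [&](const std::vector<double>& x) {
        double sum = bias;
        for (std::size_t i = 0; i < x.size(); ++i) sum += weights[i] * x[i];
        return 1.0 / (1.0 + std::exp(-sum));
    };
    TrainingResult result{0, 0.0};
    while (result.epochs < maxEpochs) {
        for (std::size_t c = 0; c < inputs.size(); ++c) {
            const double out = predict(inputs[c]);
            // Error term: (target - output) times the sigmoid derivative out * (1 - out).
            const double delta = (targets[c] - out) * out * (1.0 - out);
            for (std::size_t i = 0; i < weights.size(); ++i)
                weights[i] += trainingRate * delta * inputs[c][i];
            bias += trainingRate * delta;
        }
        ++result.epochs;
        // Measure the mean squared error with the updated weights.
        result.error = 0.0;
        for (std::size_t c = 0; c < inputs.size(); ++c) {
            const double diff = targets[c] - predict(inputs[c]);
            result.error += diff * diff;
        }
        result.error /= static_cast<double>(inputs.size());
        if (result.error < threshold) break;
    }
    return result;
}
```

In a loop of this kind, a larger training rate makes bigger corrections per step, and a smaller threshold demands more epochs before training stops.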

Example:

./mlp 1024 0.1 1e-3 4 < ./datasets/normalizedPatternFonts.in
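The four positional parameters can be converted with the standard string-to-number functions. The sketch below is one possible way to do it; the `Options` struct, `parseOptions`, and the rule that every value must be positive are assumptions, not the argument handling of mlp.cpp.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the four command-line parameters described above.
struct Options {
    int hiddenLayerLength;
    double trainingRate;
    double threshold;
    int numberOfThreads;
};

// Converts the four positional arguments (without the program name) into Options.
// std::stoi and std::stod throw std::invalid_argument on non-numeric text.
Options parseOptions(const std::vector<std::string>& args) {
    if (args.size() != 4)
        throw std::invalid_argument(
            "expected: HIDDEN_LAYER_LENGTH TRAINING_RATE THRESHOLD NUMBER_OF_THREADS");
    Options o{std::stoi(args[0]), std::stod(args[1]), std::stod(args[2]), std::stoi(args[3])};
    // Rejecting non-positive values is an assumption made for this sketch.
    if (o.hiddenLayerLength <= 0 || o.numberOfThreads <= 0 ||
        o.trainingRate <= 0.0 || o.threshold <= 0.0)
        throw std::invalid_argument("all parameters must be positive");
    return o;
}
```

In `main`, the argument vector could be built with `std::vector<std::string>(argv + 1, argv + argc)`; `std::stod` accepts exponent notation such as `1e-3`.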

For convenience, this repository includes a shell script that runs the program several times, which makes it easier to test the code and to obtain an average runtime.

./script.sh

The script compiles the code as a sequential implementation and runs it 5 times, then compiles it again as a parallel implementation and runs it 5 more times. It uses the reduced dataset sampleNormalizedFonts.in, which is already normalized and formatted.

References

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Análise do Desempenho de uma Implementação Paralela da Rede Neural Perceptron Multicamadas Utilizando Variável Compartilhada - by GÓES, Luís F. W. et al, PUC Minas

Introdução a Redes Neurais Multicamadas - by Prof. Fagner Christian Paes

O que é a Multilayer Perceptron - from ML4U

Fabrício Goés YouTube Channel - by Dr. Luis Goés

Eitas Tutoriais - by Espaço de Inovação Tecnológica Aplicada e Social - PUC Minas

Koliko - by Alex Frukta & Vladimir Tomin

GNU AGPL v3.0
