viniciusvviterbo/Multilayer-Perceptron

Read this README in Portuguese.

A multilayer perceptron (MLP) is an artificial neural network with at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation. Its multiple layers and nonlinear activation distinguish an MLP from a linear perceptron and allow it to distinguish data that is not linearly separable.
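To make the layer structure concrete, here is a minimal C++ sketch of one forward pass through a single hidden layer with a sigmoid activation. The function names (`sigmoid`, `layerForward`, `forward`) and the weight layout (bias stored as the last weight of each neuron) are assumptions made for this illustration. They are not taken from mlp.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid, a common nonlinear activation function.
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Computes the activations of one layer.
// weights[j] holds the incoming weights of neuron j, with the bias
// stored as the last element (hypothetical layout for this sketch).
std::vector<double> layerForward(const std::vector<std::vector<double>>& weights,
                                 const std::vector<double>& input) {
    std::vector<double> output(weights.size());
    for (std::size_t j = 0; j < weights.size(); ++j) {
        assert(weights[j].size() == input.size() + 1);
        double sum = weights[j].back();  // bias term
        for (std::size_t i = 0; i < input.size(); ++i) {
            sum += weights[j][i] * input[i];
        }
        output[j] = sigmoid(sum);
    }
    return output;
}

// Input layer -> hidden layer -> output layer.
std::vector<double> forward(const std::vector<std::vector<double>>& hiddenWeights,
                            const std::vector<std::vector<double>>& outputWeights,
                            const std::vector<double>& input) {
    return layerForward(outputWeights, layerForward(hiddenWeights, input));
}
```

Backpropagation would then adjust these weights from the difference between the output and the expected value. That step is omitted here.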

This repository contains a parallel implementation of an MLP that recognizes characters regardless of the font they are written in.

The Dataset

The original dataset consists of images from 153 character fonts obtained from the UCI Machine Learning Repository. Some fonts were scanned with a variety of devices: hand scanners, desktop scanners, or cameras. Other fonts were computer generated.

Usage

To use the code, first clone this repository.

git clone https://github.com/viniciusvviterbo/Multilayer-Perceptron
cd ./Multilayer-Perceptron

Formatting the Dataset

In this project, the first line of the file describes the dataset's dimensions. It is followed by an empty line (entirely optional, included only for readability) and then by the data itself, one case per line. Example:

[NUMBER OF CASES] [NUMBER OF INPUTS] [NUMBER OF OUTPUTS]

[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]

For testing the code, we included a reduced dataset (sampleNormalizedFonts.in), which also serves as a reference for the required format.
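The format described above could be read along the following lines. This is a sketch only: the `Dataset` struct and the `readDataset` and `parseDataset` functions are hypothetical names introduced for this example and may differ from what mlp.cpp actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Dataset {
    std::size_t cases = 0, inputs = 0, outputs = 0;
    std::vector<std::vector<double>> x;  // one row of inputs per case
    std::vector<std::vector<double>> y;  // one row of expected outputs per case
};

// Reads the header line, then `cases` rows of inputs followed by outputs.
// operator>> skips whitespace, so the optional empty line needs no special handling.
// Returns false if the stream ends early or contains a non-numeric token.
bool readDataset(std::istream& in, Dataset& data) {
    if (!(in >> data.cases >> data.inputs >> data.outputs)) return false;
    data.x.assign(data.cases, std::vector<double>(data.inputs));
    data.y.assign(data.cases, std::vector<double>(data.outputs));
    for (std::size_t c = 0; c < data.cases; ++c) {
        for (double& v : data.x[c]) if (!(in >> v)) return false;
        for (double& v : data.y[c]) if (!(in >> v)) return false;
    }
    assert(data.x.size() == data.y.size());
    return true;
}

// Convenience wrapper for parsing from an in-memory string.
bool parseDataset(const std::string& text, Dataset& data) {
    std::istringstream in(text);
    return readDataset(in, data);
}
```

Since the program receives the pattern file on standard input, a caller would typically pass `std::cin` to `readDataset`.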

Normalizing the dataset

A normalized dataset is preferred because the results obtained at the end of training are then (roughly) absolute: 0 or 1. To normalize the dataset, compile the normalizer and run the resulting binary:

g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < PATTERN_FILE > NORMALIZED_PATTERN_FILE

Example:

g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < ./datasets/patternFonts.in > ./datasets/normalizedPatternFonts.in
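This README does not describe how normalizeDataset.cpp works internally. As an illustration of one common approach, the sketch below applies min-max scaling so that every value ends up in the range [0, 1]. Both the technique and the function name `minMaxNormalize` are assumptions for this example and may not match the repository's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rescales every value into [0, 1] using global min-max scaling.
// If all values are equal, everything is mapped to 0 to avoid dividing by zero.
void minMaxNormalize(std::vector<double>& values) {
    if (values.empty()) return;
    const auto range = std::minmax_element(values.begin(), values.end());
    // Copy the extremes before modifying the vector they point into.
    const double lo = *range.first;
    const double hi = *range.second;
    assert(lo <= hi);
    for (double& v : values) {
        v = (hi > lo) ? (v - lo) / (hi - lo) : 0.0;
    }
}
```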

Compiling the source code

Compile the source code with OpenMP enabled:

g++ mlp.cpp -o mlp -O3 -fopenmp -std=c++14
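For context on what the -fopenmp flag enables, here is a small sketch of an OpenMP-parallel loop over the neurons of a layer. Which loops mlp.cpp actually parallelizes is not stated in this README, so the function `hiddenActivations` and the choice of loop are assumptions for illustration. The `_OPENMP` guard keeps the same source compiling as a sequential program when the flag is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Computes sigmoid activations for every neuron of a layer.
// Each iteration writes only to hidden[j], so iterations are independent
// and can safely be distributed across threads.
std::vector<double> hiddenActivations(const std::vector<std::vector<double>>& weights,
                                      const std::vector<double>& input) {
    const int neurons = static_cast<int>(weights.size());
    std::vector<double> hidden(weights.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int j = 0; j < neurons; ++j) {
        assert(weights[j].size() == input.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < input.size(); ++i) {
            sum += weights[j][i] * input[i];
        }
        hidden[j] = 1.0 / (1.0 + std::exp(-sum));  // sigmoid activation
    }
    return hidden;
}
```

A signed `int` loop counter is used because older OpenMP versions require a signed loop variable in `parallel for`.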

Training and Result

The code splits the supplied dataset in half. The first half is used only for training and the second half only for testing, so the network sees the second half as new content and tries to produce the correct result for it.
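The half-and-half split can be sketched as follows. The `Row` alias and the `splitInHalf` function are hypothetical names for this example, and the handling of an odd number of cases is an arbitrary choice made here, not something this README specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<double>;

// Returns {training, testing}: the first half of the rows and the remainder.
// With an odd number of rows, the extra row goes to the testing half
// (an arbitrary choice for this sketch).
std::pair<std::vector<Row>, std::vector<Row>> splitInHalf(const std::vector<Row>& rows) {
    const auto half = static_cast<std::ptrdiff_t>(rows.size() / 2);
    std::vector<Row> training(rows.begin(), rows.begin() + half);
    std::vector<Row> testing(rows.begin() + half, rows.end());
    assert(training.size() + testing.size() == rows.size());
    return {training, testing};
}
```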

Executing

To run the program, pass the following parameters:

./mlp HIDDEN_LAYER_LENGTH TRAINING_RATE THRESHOLD NUMBER_OF_THREADS < PATTERN_FILE
  • HIDDEN_LAYER_LENGTH is the number of neurons in the network's hidden layer;
  • TRAINING_RATE is the network's rate of training, a floating-point number used during the correction phase of backpropagation;
  • THRESHOLD is the maximum error the network admits in order to consider a result acceptably correct;
  • NUMBER_OF_THREADS is the number of threads the network is allowed to use;
  • PATTERN_FILE is the normalized pattern file.

Example:

./mlp 1024 0.1 1e-3 4 < ./datasets/normalizedPatternFonts.in
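A program taking these four positional parameters could validate them roughly as shown below. The `Options` struct and the `parseOptions` function are hypothetical and only illustrate the idea. The actual argument handling in mlp.cpp may differ.

```cpp
#include <cassert>
#include <exception>
#include <string>

struct Options {
    int hiddenLayerLength = 0;
    double trainingRate = 0.0;
    double threshold = 0.0;
    int numberOfThreads = 1;
};

// Parses HIDDEN_LAYER_LENGTH TRAINING_RATE THRESHOLD NUMBER_OF_THREADS.
// Returns false when the argument count is wrong, a value is not numeric,
// or a value is not strictly positive.
bool parseOptions(int argc, const char* const argv[], Options& opts) {
    assert(argv != nullptr);
    if (argc != 5) return false;  // program name + 4 parameters
    try {
        opts.hiddenLayerLength = std::stoi(argv[1]);
        opts.trainingRate = std::stod(argv[2]);
        opts.threshold = std::stod(argv[3]);  // accepts forms such as 1e-3
        opts.numberOfThreads = std::stoi(argv[4]);
    } catch (const std::exception&) {
        return false;  // std::invalid_argument or std::out_of_range
    }
    return opts.hiddenLayerLength > 0 && opts.trainingRate > 0.0 &&
           opts.threshold > 0.0 && opts.numberOfThreads > 0;
}
```

The pattern file itself is not an argument: it arrives on standard input through the `<` redirection.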

For convenience, this repository includes a shell script that runs the program multiple times, which makes it easier to test and to obtain an average runtime.

./script.sh

The script compiles the code as a sequential implementation and runs it 5 times, then compiles it again as a parallel implementation and runs it 5 more times. It uses the reduced dataset sampleNormalizedFonts.in, which is already normalized and formatted.

References

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Análise do Desempenho de uma Implementação Paralela da Rede Neural Perceptron Multicamadas Utilizando Variável Compartilhada - by GÓES, Luís F. W. et al, PUC Minas

Introdução a Redes Neurais Multicamadas - by Prof. Fagner Christian Paes

O que é a Multilayer Perceptron - from ML4U

Fabrício Goés Youtube Channel - by Dr. Luis Goés

Eitas Tutoriais - by Espaço de Inovação Tecnológica Aplicada e Social - PUC Minas

Koliko - by Alex Frukta & Vladimir Tomin

GNU AGPL v3.0
