[Bachelor's 4th year of Computing Science Honours degree: dissertation over the course of a research internship at Edinburgh Napier University]
In machine learning, neural networks have demonstrated flexibility and robustness. Neural nets can be used to solve a wide variety of problems, provided that the topology is chosen appropriately. There are two main schools of thought when it comes to training neural networks: gradient-based methods using the back propagation algorithm, and evolutionary algorithms. This research project investigates automating the design of the most suitable architecture and weights for solving various supervised learning problems.
This CLI tool is composed of 3 benchmarks and some additional directories:
- The BP_experiment directory contains the BP benchmark (using the FANN library)
- The NEAT_experiment directory contains the NEAT benchmark (using the NEAT library)
- The evolutionary_nets directory contains the evolutionary nets benchmark (PSO, DE & AIS)
- The formatting_scripts directory contains C++ scripts that perform CSV to FANN and FANN to CSV data set conversion (see the section on adding more data sets)
- The data directory contains the data sets to be used for the experiment. Results are also written to this directory.
The benchmarks can be run as a whole using run_all_benchmarks.sh. It is also possible to run each benchmark independently using the run_experiment.sh script in each experiment's directory. See below for more information on the libraries and main acronyms used in this project.
Algorithms:
- Differential Evolution (referred to as: DE)
- Particle Swarm Optimization (referred to as: PSO)
- Artificial Immune System: Clonal Selection (referred to as: AIS)
- Neuro Evolution of Augmenting Topologies (referred to as: NEAT)
- Gradient Descent with Back Propagation (referred to as: BP)
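As an illustration of how one of these algorithms proposes new candidate weight vectors, here is a minimal sketch of the textbook DE/rand/1/bin step. It is not taken from the evolutionary_nets code: the function names de_mutant and de_crossover, and the parameter names F (scale factor) and CR (crossover rate), are assumptions made for this example, and the variant used in the benchmark may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Mutation step of DE/rand/1: v = a + F * (b - c), applied element-wise.
// a, b and c are three distinct candidate weight vectors of equal length.
std::vector<double> de_mutant(const std::vector<double>& a,
                              const std::vector<double>& b,
                              const std::vector<double>& c,
                              double F) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<double> v(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        v[i] = a[i] + F * (b[i] - c[i]);
    return v;
}

// Binomial crossover: each gene comes from the mutant with probability CR,
// and one randomly chosen gene is always taken from the mutant.
std::vector<double> de_crossover(const std::vector<double>& target,
                                 const std::vector<double>& mutant,
                                 double CR, std::mt19937& rng) {
    assert(!target.empty() && target.size() == mutant.size());
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, target.size() - 1);
    const std::size_t forced = pick(rng);
    std::vector<double> trial(target);
    for (std::size_t i = 0; i < target.size(); ++i)
        if (i == forced || unit(rng) < CR)
            trial[i] = mutant[i];
    return trial;
}
```

In a full DE loop, the trial vector would replace the target only if it yields a lower error on the training data.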
This work also contains implementations of the following techniques:
- Vectorized Feedforward Neural Network of any topology (using Linear Algebra)
- Segmentation of data set into Training, Validation and Test data subsets
- F1 score, MSE, %accuracy
- Neural Network Ensemble (commented out for later study)
- Basic statistics (mean, variance, etc.); an implementation of the k-fold cross-validation method is also included but isn't currently used for the experiment
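For readers unfamiliar with the three metrics listed above, the following is a minimal sketch of how they can be computed. It uses plain std::vector rather than Armadillo so that it stands alone, and the function names (mse, accuracy_percent, f1_score) are hypothetical rather than the ones used in this repository. The F1 score here is computed for a single class treated as positive; how the benchmark aggregates it across several classes is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean squared error between predicted and target values.
double mse(const std::vector<double>& predicted,
           const std::vector<double>& target) {
    assert(predicted.size() == target.size() && !target.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        const double d = predicted[i] - target[i];
        sum += d * d;
    }
    return sum / static_cast<double>(target.size());
}

// Percentage of labels predicted correctly.
double accuracy_percent(const std::vector<int>& predicted,
                        const std::vector<int>& actual) {
    assert(predicted.size() == actual.size() && !actual.empty());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < actual.size(); ++i)
        if (predicted[i] == actual[i]) ++correct;
    return 100.0 * static_cast<double>(correct)
                 / static_cast<double>(actual.size());
}

// F1 score of one class treated as positive:
// F1 = 2*TP / (2*TP + FP + FN). Returns 0 when the denominator is 0.
double f1_score(const std::vector<int>& predicted,
                const std::vector<int>& actual, int positive) {
    assert(predicted.size() == actual.size());
    double tp = 0.0, fp = 0.0, fn = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const bool p = predicted[i] == positive;
        const bool a = actual[i] == positive;
        if (p && a) tp += 1.0;
        else if (p && !a) fp += 1.0;
        else if (!p && a) fn += 1.0;
    }
    const double denom = 2.0 * tp + fp + fn;
    return denom > 0.0 ? 2.0 * tp / denom : 0.0;
}
```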
The application leverages the following libraries:
- Armadillo C++ Linear Algebra library
- FANN C++ library (implements the Gradient Descent/Back Propagation algorithm)
- NEAT C++ (Neuro Evolution of Augmenting Topologies => adapted to solve classification problems)
Simply run:
$ sudo apt-get install libarmadillo-dev libfann-dev octave
Octave is optional but allows you to generate plots by running pre-written scripts such as ./plot_all_results.sh, which plots metrics such as the MSE, F1 score and %accuracy against the number of calls made to the error function (with error bars).
$ # Download the repository:
$ git clone https://github.com/HichameMoriceau/Evolutionary-neural-networks.git
$ cd Evolutionary-neural-networks/
$ # give execution permissions
$ chmod +x run_all_benchmarks.sh
$ # Execute benchmark (all 5 algorithms on all data sets)
$ ./run_all_benchmarks.sh
Once you're all set, you might be interested in modifying the hard-coded parameters (number of replicates, population size, etc.) in the run_all_benchmarks.sh script.
To uninstall, delete the repository and remove the installed packages:
$ rm -rf Evolutionary-neural-networks/
$ sudo apt-get remove libarmadillo-dev libfann-dev octave
Please make sure that the data set only contains numerical values (you might want to do some pre-processing using a tool such as OpenRefine). The target attribute must be the last column of the data set. After this transformation, I typically name the file after the data set, followed by "-transformed.csv".
Feature scaling is then applied automatically when the benchmark loads the data set. The benchmark supports classification problems with any number of attributes and any number of prediction classes (2 or more).
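The scaling method is not specified here, so the following is only a sketch of one common choice, min-max scaling of each attribute to [0, 1], under the assumption that the target sits in the last column and must be left untouched. The Matrix alias and the function name min_max_scale_features are made up for this example; the benchmark itself relies on Armadillo and may scale differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one row per sample

// Rescale every attribute column to [0, 1]; the last column (the target)
// is left untouched. Constant columns are mapped to 0 to avoid dividing by 0.
void min_max_scale_features(Matrix& data) {
    assert(!data.empty() && data[0].size() >= 2);
    const std::size_t n_features = data[0].size() - 1;
    for (std::size_t col = 0; col < n_features; ++col) {
        double lo = data[0][col];
        double hi = data[0][col];
        for (const auto& row : data) {
            lo = std::min(lo, row[col]);
            hi = std::max(hi, row[col]);
        }
        const double range = hi - lo;
        for (auto& row : data)
            row[col] = (range > 0.0) ? (row[col] - lo) / range : 0.0;
    }
}
```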
- Add your data set to the data directory.
- In data, create a directory named after your data set following my convention ('-' must be replaced by "_", and the directory name must end with "_results").
- Make sure that BP_experiment/data/ contains the data set in the FANN format, with a .data extension.
- Create the genome file required for NEAT to run (same convention, except the filename must end with "startgenes"). It defines the initial topology to be evolved; see the deprecated but insightful NEATDOC.ps documentation for how to write one.
- Add the data set's path as a CLI argument within the run_all_benchmarks.sh script (always using a .csv extension).
In the formatting_scripts directory you'll find C++ scripts to help you convert your .CSV data set into the .DATA format that the FANN library used in BP_experiment can read.
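To give an idea of what such a conversion involves, here is a rough sketch that is independent of the scripts in formatting_scripts. A FANN training file starts with a header line giving the number of samples, inputs and outputs, followed by alternating lines of input values and output values. The function name csv_to_fann is hypothetical, and the sketch assumes a headerless CSV whose last column is the target, written as a single output value; the actual scripts may encode the classes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Convert headerless, purely numeric CSV text into FANN training-data text.
// Assumption: the last column is the target and becomes a single output.
std::string csv_to_fann(std::istream& csv) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(csv, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::vector<std::string> cells;
        std::stringstream fields(line);
        std::string cell;
        while (std::getline(fields, cell, ',')) cells.push_back(cell);
        rows.push_back(cells);
    }
    assert(!rows.empty() && rows[0].size() >= 2);
    const std::size_t n_inputs = rows[0].size() - 1;

    std::ostringstream out;
    // Header: number of samples, inputs per sample, outputs per sample.
    out << rows.size() << ' ' << n_inputs << " 1\n";
    for (const auto& cells : rows) {
        assert(cells.size() == n_inputs + 1);
        for (std::size_t i = 0; i < n_inputs; ++i)
            out << (i ? " " : "") << cells[i];     // inputs on one line
        out << '\n' << cells[n_inputs] << '\n';    // target on the next
    }
    return out.str();
}
```

Working on streams rather than file names keeps the sketch easy to try out: pass it a std::ifstream for a real file or a std::istringstream for a quick check.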
If you wish to make changes to a benchmark, or simply to run any of the C++ code here manually, feel free to look at the run_experiment.sh scripts to see how each experiment is run. You can also find the compilation and execution commands by running the following commands:
cat main.cpp | grep "Compile"
cat main.cpp | grep "Run"
Run make within the NEAT_experiment directory. The code used here is the original NEAT C++ benchmark application and comes with a Makefile.
evolutionary_nets is a Qt Creator project. Either build it from the IDE or follow these instructions to build it from the CLI.
For improved performance, each replicate of the experiment is run concurrently in its own OpenMP thread.
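The general pattern can be sketched as follows. This is not the code from evolutionary_nets: run_replicate is a hypothetical stand-in for one full replicate, and the way the project actually structures its parallel region may differ. The key point is that each replicate writes only to its own slot, so no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one full replicate of an experiment;
// here it only derives a number from the replicate index.
double run_replicate(int replicate_id) {
    return 0.5 * replicate_id;
}

// Run all replicates, one loop iteration each. When compiled with -fopenmp
// the iterations are distributed across threads; without it the pragma is
// ignored and the loop runs sequentially with the same results.
std::vector<double> run_all_replicates(int n_replicates) {
    assert(n_replicates >= 0);
    std::vector<double> results(static_cast<std::size_t>(n_replicates));
    #pragma omp parallel for
    for (int r = 0; r < n_replicates; ++r)
        results[static_cast<std::size_t>(r)] = run_replicate(r);  // own slot
    return results;
}
```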
My bachelor's dissertation is accessible at /dissertation/memoir.pdf if you want to find out more about the theory and background of neural networks and evolutionary algorithms, and to see the results of the initial experiment (a paper is currently being written).