Skip to content

Data self organisation experiement 4, using genetic algorithms and cellular automa.

Notifications You must be signed in to change notification settings

phill-holland/data-self-organisation-iv

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Self Organisation-iv (Experimental)

This application demonstrates celluar automata, genetic algorithms and paralllel programming using Sycl and Intel's OneApi toolkit.

This is a work in progress to investigate how a data storage solution may self organise into efficient data structures, for storage and retrieval of data, with the aim of identifying data patterns for arbitrary sentence structures (i.e. sequential data).

The underlying principal is to test if a fundamentally chaotic system can self-organise into a stable data storage and retrieval system.

This version differs from https://github.com/phill-holland/data-self-organisation-ii (it's predecessor) in several ways;

  • Sentence input, separate by words can be inserted parallel into the system, from several different spatial starting positions.

  • If two moving data cells collide, the decision output is now non-commutative, i.e.;

"daisy" colliding with "give"

Is different to;

"give" colliding with "daisy"

The results will be different given this circumstance, in previous versions this would not the case given a different input. this ensures the output shall not be the same given a slightly different input, but the same words used in the same sentence position.

  • It is hypothesized word inputs need to directly interact with one another, and collide, before determine final output.

Method

Data is given a 3D dimension cube in order to move around in, one grid cell at a time. You can insert sentences into the system, using varyingly different randomly generated patterns, and pre-load the 3D cube with static data before the main loop starts.

Data is configured to behave in different ways when they collide with one another, potentially changing it's path of direction.

Output is determined when one data cell collides with another static data cell.

For this test program, we are using the song Daisy Daisy as a small data input test sample.

daisy daisy give me your answer do .
I'm half crazy for the love of you .
it won't be a stylish marriage .
I can't afford a carriage .
but you'll look sweet upon the seat .
of a bicycle built for two .

These input words are tokenised into integers and stored within an internal dictionary.

Demonstrates

  • Genetic Algorithms (semi pareto dominate calcuation)

  • Cellular Automata

  • Sycl/OneApi parallel GPU code

  • Triple buffer for general population, to maximise concurrency and reduce data locking

  • Relative sentence scoring

Scoring

This system implements a unique method for comparing sentences, based on their relative position to each other, rather than just by their fixed index within a goal sentence.

For instance if we take the goal sentence of;

daisy daisy give me your answer do

and score it against the example output of;

your answer do bicycle carriage

The word chain of 'your answer do' is a partial correct answer, but however, it's positioning within the desired sentence is wrong. The system ranks more highly the fact that 'your' is in front of 'answer' and 'do' follows it.

This method ensures it encourages partial answers, but with differing word positioning errors, for each subsequent iteration and generation, it should gradually correct those positioning errors. This method seems to be better suited to sentence matching given this unique application of genetic algorithms.

Todo

  • Implement different collision methods, by "age" of data in system

  • Limit inputted word age in the system by some mechanism, to enable length of output to be limited in some way (words must die to stop infinite output!)

  • Extend non-commutative collision calculate to include impacts with stationary data objects stored in the cache class (parallel/program.cpp line 524)

  • Either implement the above method, support by the cross-product of two vectors, or add further choices to the collision class genetic algorithm generation system;

int length = max_words * max_words * max_collisions;
int index = (word_token_a * max_words) + word_token_b;
  • Retest long sentence chains as input as well as output

  • Cache, add stationary data cells, that do not output data when collided with (a pseudowall)

  • Collision chain is currently limited to a single "rebound" movement before returning to default movement behaviour, extend collision movements to be > 1

  • Fix collision detection of native CPU code (native code has somewhat been ignored!) will not fix

  • Experiment with different scoring methods, one that treats the system less like a black box, and uses internal information to score output data-self-organisation-iii, abandoned

Problems

  • It get's slower and longer to find solutions the more "sentences" and epochs you run and configure it to look for

  • It get's slower when you're looking for long output sentences (more than 5 words)

  • It only needs one input word, to find a longer output sentence (encouraging input words to interact with each other whilst in the system, for example by encouraging more word collisions makes the genetic algorithm less performant)

  • May need running several times with the same parameters to find the correct answer (a common drawback of genetic algorithms)

Potential Alternative Applications

It's conceivable that you can use this method for the encryption of messages, to generate a decryption key, the method is as followed;

  • Generate a random sentence, with those words as the input dictionary of the system

  • Define the message to be encrpyted as the desired output of the system

  • Run the system until the output matches the random sentence generated in step one

This has several advantages, the input and output messages have no relationship with one another beyond this system that generated the "key", no information about the encrypted method is stored within the output.

The current weakness of the system is that you could potentially put any input words into the system to get an output message, this can be mitigated by encouraging more "input word" interactions with one another within the system, this can be handled by increasing this setting value, shown below;

parameters.scores.max_collisions = 0;

Increasing this number should encourage a more chaotic unpredictable behaviour of the system, the higher the number, the better it's ability to encrypt unique messages.

There are still drawbacks to this usage process though, the key in it's current form is not reusable, for each message you encrypt, you need to generate a new key, which then causes big flaws in the system, such as needing to exchange a new key each time you wish to encrypt a message, rendering the idea mostly a toy rather than having any current practical benefit.

There's an outside chance given more work this drawback can be overcome, but it may currently be technically improbable (?)

Building & Running (see requirements below)

  • Project comes with VSCode pre-configured launch and build tasks, however to manually build;
source /opt/intel/oneapi/setvars.sh --include-intel-llvm --force
mkdir build
cd build
cmake -DCMAKE_CXX_FLAGS="-O0 -g" -DBUILD_FOR_CUDA=true ../
cmake --build .

Assuming Intel OneApi is pre-installed and you have a newish Nvidia GPU (it will probably compile and run on GTX 10xx series and above)

Running the tests can be accomplished by;

cd build
ctest

The build process should pop an executable file in the build directory to run the application, it is advised to run the system from within VSCode, to tweak the input parameters!

Requirements

  • Originally written on a Linux based operating system.

  • This project is written in C++ and Sycl and requires that the Intel OneApi SDK is installed.

  • This project will compile using Cmake and requires support for version 3.20 or above.

  • Nvidia's Cuda must also be installed, in order for you to target Nvidia GPUs.

Releases

No releases published

Packages

No packages published