Do the algorithms used by default work on complex problems? #56

Open
ell1e opened this issue Mar 22, 2023 · 6 comments

@ell1e

ell1e commented Mar 22, 2023

Forgive me for this very beginner question, but I noticed when reading about neural networks there are a lot of different training approaches and a lot of different signal activation types used, with apparently ReLU used a lot. Not that I would know, I know C but not that much about neural networks.

Anyway, I naively tried to use genann on a word classification problem. Three sets of English words: 1000 words in category A, 1000 words in category B, and 900ish words in category C. I truncated all of them to 10 characters and used 10 inputs, each mapped to the ASCII value of the respective letter (so a range of roughly 32.0 to 127.0), with shorter words having the remaining inputs set to 0.0. I used two outputs, with the three categories encoded as (1.0, 1.0), (0.0, 1.0), and (0.5, 1.0).

No matter how I did this, though, I couldn't get even the words in the original training set to return an approximately correct categorization. It just doesn't work at all. I tried 300 and more training repetitions, I tried 10 layers and 50 neurons (which I'm guessing could be too few to map it all, but shouldn't it do better than return nonsense on almost everything in the training set?), nothing. Is there some approximate rule for how many neurons or layers I would need to map this? Is that the culprit?

Or is this a more fundamental issue beyond just the parameter choice? Like, is a simple training loop no longer doing this justice? Is there some limitation of this library, like the sigmoid activation function that genann defaults to not being capable of this? Should I be using an entirely different way of mapping text to inputs? Sorry again for this being such a beginner question.

@ell1e
Author

ell1e commented Mar 22, 2023

Okay, poking at the code more, it seems like it expects the input values to be between -15 and 15 for optimal results (with a smaller spread possibly collapsing due to the 4096 step size, and a larger spread being completely ruined by the hard cutoff). That should probably be in some prominent README section, along with the fact that the outputs are expected to be between 0 and 1, which not all beginners might guess. Nevertheless, adjusting the inputs like this still gives me nonsensical results, often just the same activation outputs no matter what word I hand in.

@codeplea
Owner

> that I mapped to the ascii value (so range 32.0 ish to 127.0 ish with the exact value being the respective letter)

> it seems like it expects the input values to be between -15 and 15 for optimal results

Yeah, that's the first problem. You need to encode your input better. This isn't unique to genann; other neural networks, and most machine learning algorithms in general, expect inputs to be in a certain range. You could simply scale the ASCII input down, but even that would be far from optimal.

It clips at +/-15 because genann uses the sigmoid function by default. You can read about it here: https://en.wikipedia.org/wiki/Sigmoid_function If you look at the graph there, you've already lost most of the function's slope by +/-4, let alone +/-15.

And to answer your title question, yes, it will work on complex problems if you throw enough compute time at it. That said, your problem sounds more like one of memorization, and while a neural network can do that, it seems like a database lookup would work better for your exact use-case. Are you expecting your model to generalize? How would it?

If you really want to go this route, do a web search for encoding words or letters for a neural network.

@ell1e
Author

ell1e commented Mar 22, 2023

I didn't intend to suggest that the input range limit is bad, but that it would be nice if the README had some basic guidance on what ranges work. Right now the README doesn't even seem to mention that the library defaults to a sigmoid, let alone good ranges for that. The examples sadly have no comments on this either. I think some of the value of a simple, get-started-in-5-minutes library is lost if people are expected to check the source code before they can get an idea of what input ranges work.

As for generalization, I hope it might generalize on word endings. But I was just fooling around anyway, and a bit stumped why it wouldn't even work on training set input.

Edit: maybe the "Hints" section would be a useful place for it? As for my test not working, I guess I just need way more neurons. Trying 255 * letters now, and letting it sit for a while...

@ell1e
Author

ell1e commented Mar 22, 2023

Hm, I let it sit for a while with way more neurons, trained a few hundred times, but even then no training set value gives an even vaguely correct result. An example with a more complex problem and more complex input would really be nice, to get an idea of what sort of dimensions work for this.

@codeplea
Owner

Maybe you should start with something simple and work your way up? There are any number of things that could be going wrong. E.g., you could take example4.c and make sure you understand every part of it, then change the dataset, test it, change the settings, and see how different numbers of hidden neurons affect the accuracy. Then make some toy dataset using words, if that's your end-goal, but maybe only 3 words to start with, and see if you can classify those three. Keep adding to it and working gradually, and you'll either get it working or know exactly where it went wrong.

You also need to think about how many parameters you're trying to learn with that many neurons, and the learning rate, and things like that. It's helpful to have it display progress as it goes.

> get-started-in-5-minutes library

This really isn't intended to be that. If you want that, you should use Python and scikit-learn. It'll work and be easy (although you'll probably still struggle with your word-input problem until you get a better encoding and ensure your data actually has learnable patterns). Genann is for when you want to really dive in and understand exactly what the code is doing, because the code is small and straightforward, but that doesn't necessarily make it easy to get going.

@ell1e
Author

ell1e commented Mar 24, 2023

Easy code doesn't really conflict with a README having basic info on the input parameter range in my opinion (or an example on less trivial data, for that matter). Anyway, I am repeating myself.
