Do the algorithms used by default work on complex problems? #56

ell1e · 2023-03-22T01:59:14Z

Forgive me for this very beginner question, but when reading about neural networks I noticed there are a lot of different training approaches and a lot of different activation functions, with ReLU apparently used a lot. Not that I would know: I know C, but not that much about neural networks.

Anyway, I naively tried to use genann on a word classification problem: three sets of English words, 1000 words in category A, 1000 words in category B, and 900-ish words in category C. I truncated all words to 10 characters and used 10 inputs, each mapped to the ASCII value of the respective letter (so a range of roughly 32.0 to 127.0), with the remaining inputs of shorter words set to 0.0. I used two outputs, with the three categories encoded as 1.0 1.0, 0.0 1.0, and 0.5 1.0.

No matter how I did this, I couldn't get even the words in the original training set to return an approximately correct categorization. It just doesn't work at all. I tried 300 or more training repetitions, and I tried 10 layers with 50 neurons (which I'm guessing could be too few to map it all, but shouldn't that still be better than returning nonsense for almost everything in the training set?). Nothing. Is there some approximate rule for how many neurons or layers I would need to map this? Is that the culprit?

Or is this a more fundamental issue beyond the parameter choice? Is a simple training loop no longer enough to do this justice? Is there some limitation of this library, e.g. that the sigmoid activation function genann defaults to may not be capable of this? Should I be mapping the text to inputs in an entirely different way? Sorry again for this being such a beginner question.


ell1e · 2023-03-22T02:18:23Z

Okay, poking at the code more, it seems to expect input values between -15 and 15 for optimal results (with a smaller spread possibly collapsing due to the 4096-step lookup table, and a larger spread being completely ruined by the hard cutoff). That should probably be in a prominent README section, along with the fact that the outputs are expected to be between 0 and 1, which not every beginner might guess. Nevertheless, adjusting the inputs like this still gives me nonsensical results, often the same outputs no matter which word I pass in.

codeplea · 2023-03-22T02:29:55Z

> that I mapped to the ascii value (so range 32.0 ish to 127.0 ish with the exact value being the respective letter)

> it seems like it expects the input values to be between -15 and 15 for optimal results

Yeah, that's the first problem. You need to encode your input better. This isn't unique to genann, or even to neural networks: most machine learning algorithms expect their inputs to be in a certain range. You could simply scale the ASCII input down, but even that would be far from optimal.

The reason it clips at +/-15 is that genann uses the Sigmoid function by default. You can read about it here: https://en.wikipedia.org/wiki/Sigmoid_function If you look at the graph there, you've already lost most of the function's slope by +/-4, let alone +/-15.

And to answer your title question: yes, it will work on complex problems if you throw enough compute time at it. That said, your problem sounds more like one of memorization, and while a neural network can do that, a database lookup seems like it would work better for your exact use case. Are you expecting your model to generalize? How would it?

If you really want to go this route, do a web search for encoding words or letters for a neural network.

ell1e · 2023-03-22T02:36:14Z

I didn't mean to suggest that the input range limit is bad, just that it would be nice if the README had some basic guidance on which ranges work. Right now the README doesn't even mention that the library defaults to a sigmoid, let alone which ranges are good for it. Sadly, the examples have no comments on this either. Some of the value of a simple, get-started-in-5-minutes library is lost if people are expected to read the source code before they can get an idea of which input ranges work.

As for generalization, I was hoping it might generalize on word endings. But I was just fooling around anyway, and I'm a bit stumped as to why it wouldn't work even on the training set input.

Edit: maybe the "Hints" section would be a good place for this? As for my test not working, I guess I just need way more neurons. Trying 255 * letters now and letting it sit for a while...

ell1e · 2023-03-22T10:35:58Z

Hm, I let it sit for a while with way more neurons and a few hundred training passes, but even then no word from the training set gives a remotely correct output. An example with a more complex problem and more complex input would really be nice, to get an idea of what sort of dimensions work for this.

codeplea · 2023-03-22T13:56:50Z

Maybe you should start with something simple and work your way up? There are any number of things that could be going wrong. E.g., you could take example4.c and make sure you understand every part of it, then change the dataset, test it, change the settings, and see how different numbers of hidden neurons affect the accuracy. Then make a toy dataset using words, if that's your end goal, but maybe only 3 words to start with, and see if you can classify those three. Keep adding to it and working gradually, and you'll either get it working or know exactly where it went wrong.

You also need to think about how many parameters you're trying to learn with that many neurons, and the learning rate, and things like that. It's helpful to have it display progress as it goes.

> get-started-in-5-minutes library

This really isn't intended to be that. If you want that, you should use Python and scikit-learn. It'll work and be easy (although you'll probably still struggle with your word-input problem until you get a better encoding and ensure your data actually has learnable patterns). Genann is for when you want to really dive in and understand exactly what the code is doing, because the code is small and straightforward, but that doesn't necessarily make it easy to get going.

ell1e · 2023-03-24T07:13:44Z

Easy code doesn't really conflict with a README having basic info on the input parameter range in my opinion (or an example on less trivial data, for that matter). Anyway, I am repeating myself.
