This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Output is NaN #77

Open
alexnix opened this issue Oct 25, 2016 · 18 comments

Comments

@alexnix

alexnix commented Oct 25, 2016

I trained my network on a data set of cars (year of manufacture, mileage, type, and model as inputs; price as output). When I try to predict the price of another car, the output is NaN. NaN is not even among the values in the training set, so this seems like an issue with the brain module.

My code is in a GitHub Gist, here: https://gist.github.com/alexnix/146fea914501d283c80635087dd87036

@nickpoorman
Contributor

@alexnix - Your input type and model are non-numeric values. You must normalize your data first. I suggest one-hot encoding them: https://github.com/nickpoorman/one-hot

@alexnix
Author

alexnix commented Oct 25, 2016

One-hot encoding for type and model?

I just learned from a YouTube video that input values must be in [-1, 1]. I am still wondering how to represent model and type. I could just label them with numbers, like Audi 0.1, Opel 0.2, and so on, but that would make the network treat Audi and Opel as similar because their labels are close to each other. Is there a good way to represent inputs with discrete, uncorrelated values (such as car model, in my example)?

Disclaimer: noob in AI/ML/NN here.

@nickpoorman
Contributor

nickpoorman commented Oct 25, 2016

@alexnix - Yes, one-hot encoding solves the problem of features with "categorical" values. In your case, for example, type will be expanded from a single vertical column in the matrix, holding "Mazda", "Ford", "Volkswagen", "Renault", "Kia", "Hyundai", etc., into several horizontal columns, one per type, each with a flag of 1 if the row is that type and 0 otherwise.

For example: Your first input is: { year: 2009, mileage: 311000, type: "Mazda", model: "CX-7" }

So it will become something like:

[2009, 311000, 1, 0, 0, 0, 0, 0, ...]

Where the header for the columns might be:

[year, mileage, type_mazda, type_ford, type_volkswagen, type_renault, type_kia, type_hyundai, ...]

You might also want to normalize your values between 0 and 1 for this library. Get the min and max value for each column and then scale them between 0 and 1. https://github.com/nickpoorman/scale-number-range
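The two steps described above (one-hot encoding the categorical column, then min-max scaling the numeric columns) could look roughly like the sketch below. It does not use the linked one-hot or scale-number-range packages; `oneHot`, `minMax`, `columnRange`, and the sample rows are hypothetical names and data made up for illustration.

```javascript
// Hypothetical sample rows; the real data set lives in the gist linked earlier.
const cars = [
  { year: 2009, mileage: 311000, type: 'Mazda' },
  { year: 2012, mileage: 150000, type: 'Ford' },
  { year: 2015, mileage: 60000, type: 'Kia' }
];

// Collect the distinct categories once so every row gets the same column order.
const types = [...new Set(cars.map(car => car.type))];

// One-hot: 1 in the column matching the value, 0 everywhere else.
function oneHot(value, categories) {
  return categories.map(category => (category === value ? 1 : 0));
}

// Min-max scaling to [0, 1]; a constant column maps to 0 to avoid dividing by zero.
function minMax(value, min, max) {
  return max === min ? 0 : (value - min) / (max - min);
}

// Smallest and largest value of one numeric column.
function columnRange(rows, key) {
  const values = rows.map(row => row[key]);
  return { min: Math.min(...values), max: Math.max(...values) };
}

const yearRange = columnRange(cars, 'year');
const mileageRange = columnRange(cars, 'mileage');

// Each row becomes: [scaled year, scaled mileage, ...type flags]
const inputs = cars.map(car => [
  minMax(car.year, yearRange.min, yearRange.max),
  minMax(car.mileage, mileageRange.min, mileageRange.max),
  ...oneHot(car.type, types)
]);

console.log(inputs);
```

The same category list and the same min/max values would have to be reused when encoding a new car for prediction, otherwise the columns no longer line up with what the network was trained on.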

@alexnix
Author

alexnix commented Oct 25, 2016

Thank you for your advice, it was very useful indeed.

@robertleeplummerjr

Is this issue resolved?

@Dok11

Dok11 commented Dec 21, 2016

@nickpoorman, can I ask you something?
You wrote:

Where the header for the columns might be:
[year, mileage, type_mazda, type_ford, type_volkswagen, type_renault, type_kia, type_hyundai, ...]

Does this mean the original poster must also encode "model" the same way as in your example? Like this:
{model__mazda_cx_7: 1, model__bmw_x5: 0, model__nissan_xtrail: 0,...}

That is a lot of columns. Is that normal?

@nickpoorman
Contributor

@Dok11, brain.js only allows for numeric values as inputs. One-Hot encoding allows you to transform Y-axis values into X-axis inputs with "ON or OFF" values.

This will naturally increase the dimensionality of the inputs, so yes, you will always end up with more columns. To reduce the number of columns, run your data set through dimensionality reduction via PCA, Lasso, or some other means.

This brain library is probably not what you want for highly dimensional data. Try using a library that can do matrix transforms quickly via BLAS or some other more efficient means.

@robertleeplummerjr

robertleeplummerjr commented Dec 21, 2016

@nickpoorman & @Dok11, the new repository does do matrix transforms via the recurrent neural net...
Example from: BrainJS/brain.js@338cf70#diff-04c6e90faac2675aa89e2176d2eec7d8R25

//create a simple recurrent neural network
var net = new brain.recurrent.RNN();

net.train([{input: [0, 0], output: [0]},
           {input: [0, 1], output: [1]},
           {input: [1, 0], output: [1]},
           {input: [1, 1], output: [0]}]);
	
var output = net.run([0, 0]);  // [0]
output = net.run([0, 1]);  // [1]
output = net.run([1, 0]);  // [1]
output = net.run([1, 1]);  // [0]

@Dok11

Dok11 commented Dec 21, 2016

@robertleeplummerjr, that's cool, but not for this task, right?
P.S. Yes, I'm building a very similar NN, so this question is very interesting to me ;)

@Dok11

Dok11 commented Dec 21, 2016

P.P.S. Where can I see more examples?
https://github.com/harthur-org/brain.js/wiki is empty...

@Dok11

Dok11 commented Dec 21, 2016

@nickpoorman: This will naturally increase the dimensionality of the inputs, so yes you will always end up with more columns. To reduce the number of columns, run your data set through dimensionality reduction via PCA, Lasso, or some other means.

What about defining a dictionary such as:
['cx-7', 'x5', 'x-trail', ...]
and using its indexes as values in the input, like:
{... type: 0, ...}

Will this work correctly?

@cawa-93

cawa-93 commented Mar 6, 2017

I have the same problem, but all of the input signals are already normalized:

const neural = require('../NeuralNetwork').toFunction() // This directory holds the trained neural network and the training array
neural({
  albums: 0.011111111111111112,
  videos: 0.016523867809057527,
  audios: 0,
  notes: 0,
  photos: 0.00035337249878528203,
  friends: 0.009302790837251175,
  mutual_friends: 0,
  followers: 0.007113002799187086,
  subscriptions: 0,
  pages: 0.0063083522583901085,
  wall: 0.0005448000778285826
}) // { '0': NaN }

An example training sample:

{
  "input":{
    "albums":0,
    "videos":0.002345981232150143,
    "audios":0,
    "notes":0,
    "photos":0.019921374619020275,
    "friends":0.06461938581574472,
    "mutual_friends":0,
    "followers":0.004280263813796541,
    "subscriptions":0,
    "pages":0.0010093363613424174,
    "wall":0.22041054577293512
  },
  "output":[0]
}

When I train with only one element of the training sample, the network returns a numeric result. But if I use two or more elements, the result is not a number.

Full train array in .json
All data were normalized using scale-number-range

@nickpoorman
Contributor

@cawa-93 - I would have to take a look at the rest of your code - the setup of the network and how you are training the model. Another thing you should try is not using category mode. Simply supply your input vector as an array. Instead of this:

{
  "input":{
    "albums":0,
    "videos":0.002345981232150143,
    "audios":0,
    "notes":0,
    "photos":0.019921374619020275,
    "friends":0.06461938581574472,
    "mutual_friends":0,
    "followers":0.004280263813796541,
    "subscriptions":0,
    "pages":0.0010093363613424174,
    "wall":0.22041054577293512
  },
  "output":[0]
}

do this:

{
  "input":[
    0,
    0.002345981232150143,
    0,
    0,
    0.019921374619020275,
    0.06461938581574472,
    0,
    0.004280263813796541,
    0,
    0.0010093363613424174,
    0.22041054577293512
  ],
  "output":[0]
}

I've been using this in production for three years (training millions of models and making billions of predictions monthly). I assure you there is nothing wrong with the library.
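The object-to-array conversion suggested above can be sketched as follows. The key list is copied from the sample in this thread; `toVector` is a hypothetical helper, not part of brain.js, and the one thing it assumes is that every sample (training and prediction alike) is converted with the same key order.

```javascript
// Fixed key order, taken from the sample in this thread.
const keys = [
  'albums', 'videos', 'audios', 'notes', 'photos', 'friends',
  'mutual_friends', 'followers', 'subscriptions', 'pages', 'wall'
];

// Hypothetical helper: turn a keyed input object into a plain array,
// failing loudly if a key is missing or the value is not a finite number.
function toVector(inputObject, orderedKeys) {
  return orderedKeys.map(key => {
    const value = inputObject[key];
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new Error(`Missing or non-numeric value for "${key}"`);
    }
    return value;
  });
}

const sample = {
  input: {
    albums: 0,
    videos: 0.002345981232150143,
    audios: 0,
    notes: 0,
    photos: 0.019921374619020275,
    friends: 0.06461938581574472,
    mutual_friends: 0,
    followers: 0.004280263813796541,
    subscriptions: 0,
    pages: 0.0010093363613424174,
    wall: 0.22041054577293512
  },
  output: [0]
};

const converted = { input: toVector(sample.input, keys), output: sample.output };
console.log(converted.input.length); // 11
```

Throwing on a missing key is a design choice for this sketch: it surfaces malformed samples before training instead of letting `undefined` flow into the arithmetic.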

@cawa-93

cawa-93 commented Mar 6, 2017

@nickpoorman I created a simple repository for you: cawa-93/user-scaner

I noticed that if I train the network with objects, numeric data is stored in net.json; however, if I train it with arrays, all values are null.

@robertleeplummerjr

You're letting negative values be returned from scaleNumberRange, which I assume is your means of normalizing values.

/**
 * simple module to scale a number from one range to another
 */
var debug = require('debug')('scale-number-range');

module.exports = function scaleNumberRange(number, oldMin, oldMax, newMin, newMax) {
  if (process.env.SCALE_THROW_OOB_ERRORS) {
    if (number < oldMin) {
      debug('ERROR OOB - scale(%d, %d, %d, %d, %d)', number, oldMin, oldMax, newMin, newMax);
      throw new Error('number is less than oldMin');
    }
    if (number > oldMax) {
      debug('ERROR OOB - scale(%d, %d, %d, %d, %d)', number, oldMin, oldMax, newMin, newMax);
      throw new Error('number is greater than oldMax');
    }
  }
  const result = (((newMax - newMin) * (number - oldMin)) / (oldMax - oldMin)) + newMin;
  console.log(result);
  return result;
}

Outputs:

$ babel-node --presets es2015-node ./test
-1
-0.9953080375356997
-1
-1
-0.9601572507619595
-0.8707612283685106
-1
-0.9914394723724069
-1
-0.9979813272773151
-0.5591789084541298
{ '0': NaN }
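The negative numbers in that output appear consistent with the already-normalized sample values being rescaled from [0, 1] to a target range of [-1, 1]. The sketch below reuses the formula from scaleNumberRange without its debug and logging calls; the `scale` name and the assumed arguments (0, 1, -1, 1) are illustrative, since the actual call site is in the linked repository and not shown here.

```javascript
// Same arithmetic as scaleNumberRange above, minus debug output and logging.
function scale(number, oldMin, oldMax, newMin, newMax) {
  return ((newMax - newMin) * (number - oldMin)) / (oldMax - oldMin) + newMin;
}

// "videos" value from the training sample posted earlier in this thread.
const videos = 0.002345981232150143;

// Assumed call with target range [-1, 1]: small inputs land just above -1.
const toSigned = scale(videos, 0, 1, -1, 1);

// Target range [0, 1]: the value passes through unchanged.
const toUnit = scale(videos, 0, 1, 0, 1);

console.log(toSigned < 0);      // true
console.log(toUnit === videos); // true
```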

@robertleeplummerjr

If I do result *= -1, I still get NaN, so I'm still investigating.

@nickpoorman
Contributor

@cawa-93 - A few issues with your code. First, you should filter out any user data that doesn't have the same shape. The following is going to cause issues:

{
  "id": 305576398,
  "counters": {
    "unknown": true
  }
}

To do this, use a filter:

const learnArray = users
.filter(user => {
  for (let key in maxRages) {
    if (typeof user.counters[key] === 'undefined') {
      return false
    }
  }
  return true
})
.map(user => {
  let result = {
    input: {},
    output: []
  }

  for (let c in user.counters) {
    if (c !== 'messages' && c !== 'online_friends') {
      result.input[c] = scale(user.counters[c], 0, maxRages[c], 0, 1)
    }
  }

  result.output.push(user.counters.messages > 3 ? 1 : 0)

  return result
})

Also, you should scale to [0, 1] instead of [-1, 1].

Lastly, instead of using toFunction(), you should just use run() to solve your NaN problem.

I've updated some of the code in this gist: https://gist.github.com/nickpoorman/cd9465edca726df8dc06dbdd2937d153
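As a complement to the filter above, a quick sanity check on the finished training array can point at the samples that would poison training. This is only a sketch: `findInvalidSamples` is a hypothetical helper, not part of brain.js, and the two samples are made up to mimic a user whose counter was missing.

```javascript
// Returns the indexes of samples whose input or output contains a value
// that is not a finite number (undefined, NaN, Infinity, strings, ...).
// Works for both object-style and array-style inputs.
function findInvalidSamples(samples) {
  const isBad = value => typeof value !== 'number' || !Number.isFinite(value);
  return samples.reduce((bad, sample, index) => {
    const values = [
      ...Object.values(sample.input),
      ...Object.values(sample.output)
    ];
    if (values.some(isBad)) bad.push(index);
    return bad;
  }, []);
}

// Hypothetical samples: in the second one a missing counter went through
// the scaling arithmetic, and undefined / 10 evaluates to NaN.
const samples = [
  { input: { friends: 0.06, wall: 0.22 }, output: [0] },
  { input: { friends: undefined / 10, wall: 0.1 }, output: [1] }
];

const invalid = findInvalidSamples(samples);
console.log(invalid); // index 1 is reported
```

Running a check like this before `train` makes it easier to tell a data problem apart from a library problem.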

@robertleeplummerjr

lol, beat me to it! In all fairness, I was getting a haircut.
