Saving and loading a model repeatedly causes it to break #3820

melon3r · 2018-03-21T08:36:54Z

Hi!

I'm feeding data to a model in small batches, saving the model to disk at the end of each batch, and loading it again for the next one. After a few batches, the model stops working and throws the following error when calling model.run(input):

Traceback (most recent call last):
  File "./anomalies.py", line 63, in <module>
    result = model.run(input)
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/frameworks/opf/htm_prediction_model.py", line 448, in run
    inferences = self._anomalyCompute()
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/frameworks/opf/htm_prediction_model.py", line 696, in _anomalyCompute
    self._getAnomalyClassifier().compute()
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/engine/__init__.py", line 433, in compute
    return self._region.compute()
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/bindings/engine_internal.py", line 1499, in compute
    return _engine_internal.Region_compute(self)
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/bindings/regions/PyRegion.py", line 184, in guardedCompute
    return self.compute(inputs, DictReadOnlyWrapper(outputs))
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/regions/knn_anomaly_classifier_region.py", line 326, in compute
    self._classifyState(record)
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/regions/knn_anomaly_classifier_region.py", line 405, in _classifyState
    self._addRecordToKNN(state)
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/regions/knn_anomaly_classifier_region.py", line 490, in _addRecordToKNN
    knn.learn(pattern, category, rowID=rowID)
  File "/home/dani/.local/lib/python2.7/site-packages/nupic/algorithms/knn_classifier.py", line 537, in learn
    inputPattern = numpy.dot(self._vt, inputPattern - self._mean)
ValueError: operands could not be broadcast together with shapes (65536,) (0,)
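The last frame is a plain NumPy broadcasting failure: the incoming pattern has 65536 elements while the stored mean (`self._mean`) has none. The minimal sketch below assumes the mean simply comes back empty after a save/load cycle, and it reproduces the same ValueError outside NuPIC. The variable and function names are hypothetical, not NuPIC's:

```python
import numpy

# Hypothetical stand-ins for the classifier's input and its stored mean vector.
input_pattern = numpy.zeros(65536)
restored_mean = numpy.zeros(0)        # shape (0,): state lost on reload (assumption)
expected_mean = numpy.zeros(65536)    # shape (65536,): what a healthy model would hold


def center(pattern, mean):
    """Same subtraction as in the failing expression: pattern - mean."""
    return pattern - mean


print(center(input_pattern, expected_mean).shape)  # (65536,)

try:
    center(input_pattern, restored_mean)
    error_message = None
except ValueError as err:
    # Same message as the traceback: shapes (65536,) (0,) cannot broadcast.
    error_message = str(err)

print(error_message)
```

A zero-length array is what makes the broadcast fail here. NumPy only stretches axes of length 1, so a length-0 axis can never match a length-65536 one.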

Here's the code used to load and store the model:

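from nupic.frameworks.opf.htm_prediction_model import HTMPredictionModel

# Load the model saved after the previous batch (binary mode for serialized data)
with open(model_file, 'rb') as f:
    model = HTMPredictionModel.readFromFile(f)

# ... feed the batch with model.run(input) ...

with open(model_file, 'wb') as f:
    model.writeToFile(f)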
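The batch workflow described above (load, feed one batch, save, repeat) can be sketched without NuPIC. This is only a minimal sketch under stated assumptions. `ToyModel` is a hypothetical stand-in, and `pickle` replaces `readFromFile`/`writeToFile`:

```python
import os
import pickle
import tempfile


class ToyModel(object):
    """Hypothetical stand-in for an OPF model; not part of NuPIC."""

    def __init__(self):
        self.records_seen = 0
        self.total = 0.0

    def run(self, value):
        # Accumulate some state so we can check that it survives save/load.
        self.records_seen += 1
        self.total += value
        return self.total / self.records_seen   # running mean


def save(model, path):
    with open(path, "wb") as f:      # serialized models are binary data
        pickle.dump(model, f)


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def process_in_batches(records, batch_size, path):
    """Load the model, feed one batch, save it, and repeat for every batch."""
    save(ToyModel(), path)
    for start in range(0, len(records), batch_size):
        model = load(path)
        for value in records[start:start + batch_size]:
            model.run(value)
        save(model, path)
    return load(path)


model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
final_model = process_in_batches(list(range(10)), 3, model_path)
print(final_model.records_seen)   # 10
```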

To find out whether the data was somehow producing a bad model, I tried starting from a model saved after an earlier batch and skipping some batches of data. However, after the same number of batches, regardless of their contents, I end up with a broken model again. So I suspect a bug is being triggered in readFromFile or writeToFile (or maybe I'm just doing it wrong).

This is with Python 2.7.9 and nupic 1.0.3 from PyPI.


rhyolight · 2018-03-21T22:18:38Z

Hey @lscheinkman and @scottpurdy, this might be another report similar to #3783.

@melon3r Can you perhaps attach some code we can run to replicate this?

ghost · 2018-03-22T00:13:49Z

@melon3r Can you try this? It's working fine for our project. We also found that using the packed format here shrinks the binary data quite a bit.

from nupic.frameworks.opf.htm_prediction_model import HTMPredictionModel


def serialize_htm(htm_model):
    """Return the model as packed Cap'n Proto bytes."""
    proto = HTMPredictionModel.getSchema()
    builder = proto.new_message()
    htm_model.write(builder)
    return builder.to_bytes_packed()


def deserialize_htm(htm_buffer):
    """Rebuild a model from bytes produced by serialize_htm()."""
    proto = HTMPredictionModel.getSchema()
    reader = proto.from_bytes_packed(htm_buffer)
    return HTMPredictionModel.read(reader)

Also, there is currently a minor bug in NuPIC (#3805): if you serialize and deserialize a model without processing any samples in between, it errors out.

melon3r · 2018-03-22T12:12:17Z

Hey @kyle-sorensen, thank you for the tip, but it didn't work out for me. The model breaks at the exact same point.

> @melon3r Can you perhaps attach some code we can run to replicate this?

@rhyolight I'll try to build a small script to reproduce it and share it ;)

rhyolight · 2018-03-22T16:48:07Z

Thanks @melon3r. Numenta engineer @lscheinkman is updating our regression test suite so that it serializes models partway through the NAB data set and then continues running them after deserialization. We hope to see this test fail so we can fix the issue and update the source code. Your script might still be helpful, so please continue with it if you can.
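The regression test described above reduces to a simple check. Run one copy of a model straight through, run another copy that is saved and reloaded partway, and require identical outputs. Below is a minimal sketch of that idea. It uses a hypothetical `ToyModel` and `pickle` in place of NuPIC's serialization, and it is not Numenta's actual test:

```python
import pickle


class ToyModel(object):
    """Hypothetical stand-in for an OPF model; not part of NuPIC."""

    def __init__(self):
        self.history = []

    def run(self, value):
        # Output depends on earlier inputs, so lost state changes the result.
        self.history.append(value)
        return sum(self.history[-3:])


def outputs_match_after_reload(records, split):
    """Compare an uninterrupted run with one serialized after `split` records."""
    reference = ToyModel()
    expected = [reference.run(v) for v in records]

    model = ToyModel()
    actual = [model.run(v) for v in records[:split]]
    model = pickle.loads(pickle.dumps(model))   # save/load in the middle
    actual += [model.run(v) for v in records[split:]]
    return actual == expected


print(outputs_match_after_reload(list(range(20)), split=7))  # True
```

Checking several split points is worthwhile, because a serialization bug may only surface after the model has processed a certain number of records.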

melon3r · 2018-03-26T10:53:46Z

I found the "issue". 🤦‍♂️

While trying to replicate it, I found that it always failed at the same record, the 2184th, and the model parameters contain this setting: 'autoDetectWaitRecords': 2184

I just copied it from the HotGym example, so I don't even understand it... Can you help?

rhyolight · 2018-03-26T18:47:05Z

@melon3r Can you try either removing it from the configuration or (if that doesn't work) making it extremely large? Then try again? If it works at least we know what to fix.

melon3r · 2018-03-28T09:27:27Z

Hi @rhyolight,

Removing it from the configuration gave it a default value of 4000. I could set it very high, but I don't think that's how it's supposed to be run in production. Aren't models supposed to run indefinitely?

What is this setting actually doing? While debugging the error, I found that once this number of records has been processed, the control flow changes and the model starts using a KNN anomaly classifier region, which it didn't do before. What's the difference between the processing before and after this threshold is reached?
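The switch described above can be illustrated with a toy. The class `ToyAnomalyClassifier` is hypothetical and is only loosely modeled on the traceback. It is not NuPIC's region. Records processed before the threshold never touch the stored state, so a save/load bug that corrupts that state stays hidden until the threshold is reached, whatever the data. This sketch assumes that the loss is an emptied mean vector, as the (0,) shape in the traceback suggests. It also assumes that the stored state is first used on exactly the `wait_records`-th record:

```python
import numpy


class ToyAnomalyClassifier(object):
    """Hypothetical stand-in; not NuPIC's knn_anomaly_classifier_region."""

    def __init__(self, wait_records, width):
        self.wait_records = wait_records
        self.records_seen = 0
        self.mean = numpy.zeros(width)

    def compute(self, pattern):
        self.records_seen += 1
        if self.records_seen < self.wait_records:
            return None                      # still waiting: stored state unused
        return pattern - self.mean           # now the stored state must be valid

    def lossy_copy(self):
        """Simulated buggy save/load (assumption): the mean vector is dropped."""
        clone = ToyAnomalyClassifier(self.wait_records, 0)
        clone.records_seen = self.records_seen
        return clone


def first_failing_record(wait_records, total_records, width=4):
    """Return the 1-based record that first fails, or None if none fails."""
    classifier = ToyAnomalyClassifier(wait_records, width)
    for record in range(1, total_records + 1):
        classifier = classifier.lossy_copy()     # save/load before every record
        try:
            classifier.compute(numpy.ones(width))
        except ValueError:
            return record
    return None


print(first_failing_record(wait_records=5, total_records=10))  # 5
```

In this toy, a very large `wait_records` simply means that the failing path is never reached during the run. This is consistent with raising `autoDetectWaitRecords` acting as a workaround rather than a fix.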

rhyolight · 2018-03-28T15:33:45Z

It has to do with something unrelated to HTM. It is a legacy setting that is just causing trouble, and we should remove it. It does not affect how the HTM runs; it just exposes a bug. Set it to 999999999.

melon3r · 2018-04-04T06:58:24Z

Alright, thanks. At one record per minute, 999999999 records make for about 1900 years, so I guess it'll be good :)
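The estimate checks out. Here is a quick calculation, assuming one record per minute and an average year of 365.25 days:

```python
# One record per minute; 365.25 days per year to account for leap years.
MINUTES_PER_YEAR = 60 * 24 * 365.25
WAIT_RECORDS = 999999999

years = WAIT_RECORDS / MINUTES_PER_YEAR
print(round(years))  # 1901
```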

rhyolight · 2018-04-04T15:58:47Z

@lscheinkman found that this was still happening when he started writing more tests for #3808.

rhyolight added type:bug priority:3 type:serialization labels Mar 21, 2018

rhyolight self-assigned this Mar 21, 2018

rhyolight closed this as completed Mar 26, 2018

rhyolight reopened this Mar 26, 2018

melon3r closed this as completed Apr 4, 2018

rhyolight reopened this Apr 4, 2018

lscheinkman mentioned this issue Apr 10, 2018

NUP-2506: Add test to all Serializable subclasses and fix related issues #3826

Merged

rhyolight closed this as completed in #3826 Apr 11, 2018
