large dimension of the vector representation #161

stochasticer · 2014-02-01T17:11:15Z

Hi,
thanks for your help.
I tried to save a trained model whose 'feature' vectors have dimension 2000. The model trains fine, but I am unable to save it. (I am using a Linux terminal on Windows.)
Here is the error:

In [9]: model.save('model_wiki_2000')

SystemError Traceback (most recent call last)
in ()
----> 1 model.save('model_wiki_2000')

/home/usr/.local/lib/python2.7/site-packages/gensim-0.8.9-py2.7.egg/gensim/utils.pyc in save(self, fname)
178 """
179 logger.info("saving %s object to %s" % (self.__class__.__name__, fname))
--> 180 pickle(self, fname)
181 #endclass SaveLoad
182

/home/usr/.local/lib/python2.7/site-packages/gensim-0.8.9-py2.7.egg/gensim/utils.pyc in pickle(obj, fname, protocol)
528 """Pickle object obj to file fname."""
529 with smart_open(fname, 'wb') as fout: # 'b' for binary, needed on Windows
--> 530 cPickle.dump(obj, fout, protocol=protocol)
531
532

SystemError: error return without exception set


piskvorky · 2014-02-01T18:42:01Z

That's a bug in Python's pickle module: numpy/numpy#2396. Not much I can do about it.

A "fix" is to override the save/load methods so that they serialize the internal NumPy arrays model.syn0 and model.syn1 separately, into different files (and don't store syn0norm at all).

This is what I did in the LsiModel for example: https://github.com/piskvorky/gensim/blob/develop/gensim/models/lsimodel.py#L534

Let me know if you want to write such a patch for the Word2Vec class too, it's not difficult.
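
To make the suggestion above concrete, here is a minimal sketch of the "store the big arrays separately" idea. It is not gensim's code: `TinyModel`, `roundtrip_demo`, the `.syn0.npy`/`.syn1.npy` file suffixes and the `min_count` attribute are all hypothetical names chosen for illustration. Only `np.save`, `np.load` and `pickle` are real library calls.

```python
import os
import pickle
import tempfile

import numpy as np


class TinyModel(object):
    """Hypothetical stand-in for a model that holds large weight matrices."""

    def __init__(self, vocab_size, dim):
        self.syn0 = np.random.rand(vocab_size, dim).astype(np.float32)
        self.syn1 = np.zeros((vocab_size, dim), dtype=np.float32)
        self.min_count = 5  # small attribute, fine to pickle

    def save(self, fname):
        # Write each large array to its own .npy file next to the pickle.
        np.save(fname + '.syn0.npy', self.syn0)
        np.save(fname + '.syn1.npy', self.syn1)
        # Pickle a copy of the remaining state, without the big arrays.
        state = dict(self.__dict__)
        del state['syn0'], state['syn1']
        with open(fname, 'wb') as fout:
            pickle.dump(state, fout, protocol=2)

    @classmethod
    def load(cls, fname):
        # Rebuild the object from the small pickle, then reattach the arrays.
        obj = cls.__new__(cls)
        with open(fname, 'rb') as fin:
            obj.__dict__.update(pickle.load(fin))
        obj.syn0 = np.load(fname + '.syn0.npy')
        obj.syn1 = np.load(fname + '.syn1.npy')
        return obj


def roundtrip_demo():
    """Save and reload a small model; return (original, restored)."""
    model = TinyModel(vocab_size=10, dim=4)
    path = os.path.join(tempfile.mkdtemp(), 'model')
    model.save(path)
    return model, TinyModel.load(path)
```

With this layout the pickle only ever sees the small attributes, so the large matrices never pass through `cPickle.dump`.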

stochasticer · 2014-02-01T18:50:27Z

Thanks! Yes, I will try overriding those methods.
Btw, would this work as an alternative:
train the model with the C package to generate the .bin file, then use Word2Vec.load_word2vec_format() to get the trained model? (Hopefully this won't take too much time, since it is loading rather than training.)

piskvorky · 2014-02-01T19:17:11Z

Ok, great! Let me know when the patch's ready for review.

Loading from C word2vec will work if you only want to use the model (and not continue training etc.). The C word2vec formats don't store all the necessary information.
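
As a rough illustration of why the C formats are only enough for using the model: to my understanding, the plain-text variant of the format is a header line with the vocabulary size and vector dimension, followed by one line per word holding that word's single vector, and nothing else. The reader below is a sketch written for this thread, not gensim's loader; `read_word2vec_text` and the sample data are made up.

```python
import io


def read_word2vec_text(stream):
    """Parse the plain-text word2vec format into a {word: vector} dict."""
    # Header line: "<vocab_size> <dim>"
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for _ in range(vocab_size):
        # Each following line: the word, then dim space-separated floats.
        parts = stream.readline().split()
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError('expected %d values for %r' % (dim, word))
        vectors[word] = values
    return vectors


# Made-up two-word sample in the text format.
sample = io.StringIO(u"2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")
vectors = read_word2vec_text(sample)
```

Only one vector per word comes back, with no second weight matrix and no training state, which fits the point above that such a file is fine for similarity queries but not for resuming training.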

stochasticer · 2014-02-01T22:25:13Z

Thanks a lot! I will post if I have any success. (Since I am currently testing my similarity measures, C training plus gensim loading may also give me some quick results. I will try it.)

piskvorky · 2014-02-08T20:39:30Z

@stochasticer I just pushed a series of commits that allow you to save large word2vec models directly from gensim.

You can now store with model.save('/some/file', ignore=['syn0norm', 'syn1']).

Let me know if that solved your problem.
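
For readers wondering what an `ignore` list does conceptually, here is a small stdlib-only sketch. It is an assumption about the general idea, not the code from those commits: `dumps_ignoring` and `Toy` are hypothetical, and plain lists stand in for the NumPy arrays.

```python
import pickle


def dumps_ignoring(obj, ignore=()):
    """Pickle obj's attributes, leaving out the names listed in ignore."""
    state = {k: v for k, v in obj.__dict__.items() if k not in ignore}
    return pickle.dumps(state, protocol=2)


class Toy(object):
    """Hypothetical object with the same attribute names as in this thread."""

    def __init__(self):
        self.syn0 = [[1.0, 0.0], [0.6, 0.8]]
        self.syn1 = [[0.0, 0.0], [0.0, 0.0]]
        self.syn0norm = None  # cache that can be recomputed from syn0


# Only syn0 survives the round trip; the ignored attributes are simply absent.
restored_state = pickle.loads(
    dumps_ignoring(Toy(), ignore=['syn0norm', 'syn1']))
```

Anything listed in `ignore` has to be recomputed or done without after loading, so it is worth skipping only attributes you can rebuild or do not need.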

stochasticer closed this as completed Feb 1, 2014

ppotash mentioned this issue Jul 16, 2015

Saving Doc2Vec Model -- SystemError: error return without exception set #403

Closed

piskvorky mentioned this issue Aug 24, 2015

SystemError: error return without exception set #437

Closed
