
Training Issue #10

Open
visualizeMath opened this issue Apr 25, 2016 · 3 comments

@visualizeMath

Hi. First of all, thank you very much for your help; you have saved my life at least several times :) I have experienced some problems while training word2vec with a large data corpus. The data I'd like to use for training is almost 4 GB, and I wonder whether that's possible. I also tried training word2vec with 2 GB of data and it didn't work either. Should I increase the heap size or something like that?

@eabdullin
Owner

Can you share your training data? I'll try to train vectors :)

@CaCTuCaTu4ECKuu
Copy link

CaCTuCaTu4ECKuu commented Jun 30, 2016

I found out where this issue (and #1) comes from.
I used about 100 MB of internet data and was surprised to get an exception. Then I realized that when I call StreamReader.ReadLine(), it reads the whole file, because the file is stored with only spaces and no line breaks, and that is what causes the exception. I'm not sure what to do to keep the same performance, because the training uses multiple threads that seek into the file, and you can't seek within a single line.
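
For reference, a minimal sketch of the failure mode described above: StreamReader.ReadLine() reads until the next newline, so on a corpus stored as one multi-gigabyte line it tries to materialize the entire file as a single string and can throw OutOfMemoryException. The file name here is hypothetical.

```csharp
using System;
using System.IO;

class ReadLineRepro
{
    static void Main()
    {
        // Hypothetical path; the corpus is assumed to be one huge line of
        // space-separated words with no line breaks.
        const string corpusPath = "corpus.txt";

        using var reader = new StreamReader(corpusPath);

        // ReadLine() reads until the next '\n' (or end of file). With no
        // newlines in the file, this builds the whole corpus as one string,
        // which can fail with OutOfMemoryException on a multi-GB file.
        string line = reader.ReadLine();
        Console.WriteLine($"Read one 'line' of {line?.Length ?? 0} characters.");
    }
}
```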

@CaCTuCaTu4ECKuu

I solved this by preprocessing the training file and splitting it so that each line contains a limited number of words, because a single solid line causes problems even when opening the file with Notepad++, whereas the processed files open instantly.
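
A minimal sketch of that preprocessing step, assuming space-separated text; the file names and the choice of 1000 words per line are hypothetical, not values from this thread.

```csharp
using System;
using System.IO;
using System.Text;

class CorpusPreprocessor
{
    // Rewrites a single-line, space-separated corpus into lines of
    // `wordsPerLine` words so that StreamReader.ReadLine() never has to
    // materialize the whole file as one string.
    static void SplitIntoLines(string inputPath, string outputPath, int wordsPerLine = 1000)
    {
        using var reader = new StreamReader(inputPath);
        using var writer = new StreamWriter(outputPath);

        var buffer = new char[1 << 16];
        var word = new StringBuilder();
        int wordsOnLine = 0;
        int read;

        // Read the corpus in fixed-size character chunks instead of lines.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (char.IsWhiteSpace(c))
                {
                    if (word.Length > 0)
                    {
                        writer.Write(word);
                        word.Clear();
                        wordsOnLine++;
                        // Start a new line after a fixed number of words.
                        writer.Write(wordsOnLine >= wordsPerLine ? '\n' : ' ');
                        if (wordsOnLine >= wordsPerLine) wordsOnLine = 0;
                    }
                }
                else
                {
                    word.Append(c);
                }
            }
        }

        // Flush the last word, if any, and end the final line.
        if (word.Length > 0) writer.Write(word);
        writer.WriteLine();
    }

    static void Main()
    {
        SplitIntoLines("corpus_single_line.txt", "corpus_lines.txt");
    }
}
```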
