LightGBM run error when number of dimensions is more than 90 #93

Closed

defaultRobot opened this issue Nov 24, 2016 · 38 comments

@defaultRobot commented Nov 24, 2016

Environment: Windows; number of rows: ~63,000 (6.3w).
I have found one problem: LightGBM fails to train when the number of dimensions is more than 90. However, when the number of dimensions is less than 70, LightGBM can train and predict without problems.
I can provide the data if needed.

@guolinke (Collaborator)

This is very strange. Can you provide the data?
Thanks.

@chivee (Collaborator) commented Nov 24, 2016

Hi @anddelu, could this be because memory was completely used up? Could you please paste the log here?

@defaultRobot (Author)

Hi, thanks for responding.
Environment: 8 GB of memory.
I tried reducing the data to less than 5 MB and found that it works, so I thought LightGBM fails because it cannot handle too much data.
The following data causes LightGBM to stop.
Attached is the data (the number of dimensions is more than 90):
multiclass.txt

The picture:
[screenshot]

@guolinke (Collaborator)

@chivee I think a dataset of 63000 × 90 is very small; it cannot be out of memory.
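
(A rough check, assuming dense float64 storage: 63,000 rows × 90 features × 8 bytes ≈ 45 MB, far below the 8 GB of memory reported above.)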

@guolinke (Collaborator) commented Nov 24, 2016

@anddelu I tried to run:
lightgbm.exe data=multiclass.txt valid=multiclass.txt objective=multiclass num_class=5
and it finished successfully.
Can you also provide your parameters?

@defaultRobot (Author)

Here are my parameters, based on the examples, in the train.conf file:
data=multiclass.train valid_data=multiclass.test objective=multiclass num_class=5
metric=multi_logloss metric_freq=1 early_stopping=10 num_trees=100
learning_rate=0.05 num_leaves=31
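
For readability, the same settings written one key per line, as LightGBM config files conventionally allow (a sketch of the configuration above, not the verbatim contents of the attached train_conf.txt):

# sketch of the reported train.conf
objective = multiclass
num_class = 5
data = multiclass.train
valid_data = multiclass.test
metric = multi_logloss
metric_freq = 1
early_stopping = 10
num_trees = 100
learning_rate = 0.05
num_leaves = 31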

I used your method and found it works; however, when I use more data to train the model, it still shows an error:
[screenshot: err01]

Attached files: training data & train.conf
multiclass.txt
train_conf.txt

@guolinke (Collaborator)

[screenshot: successful run]

I can still run successfully with your new data and config.
BTW, the data file named in your config does not exist, so I changed it to multiclass.txt for both the training and validation data.

@defaultRobot (Author) commented Nov 24, 2016

Thanks for your response.
BTW: the name of the config file is correct; I just renamed it when uploading because of the supported file formats.
Unfortunately it doesn't work for me. I thought maybe my lightgbm.exe was the reason, so I downloaded the new LightGBM and rebuilt it (VS 2013, x64 Release). It still doesn't work.
I am very confused, so I am uploading it to see if you can reproduce the problem.

I reduced the number of dimensions to less than 70 and found it works again:
D:\multiclass_classification>lightgbm.exe config=train_conf.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Warning] Ignoring feature Column_38, only has one value
[LightGBM] [Info] Finished loading data in 0.260338 seconds
[LightGBM] [Info] Number of data: 62941, number of features: 69
......
[LightGBM] [Info] Early stopping at iteration 35, the best iteration round is 25
[LightGBM] [Info] 2.871995 seconds elapsed, finished iteration 35
[LightGBM] [Info] Finished training

lightgbm.exe.txt

@guolinke (Collaborator)

@anddelu I can still run with your exe...
BTW, did you use the training data as the validation data? If not, can you provide your validation data as well?

@guolinke (Collaborator)

You can try my exe as well:
lightgbm.exe.txt

@defaultRobot (Author) commented Nov 24, 2016

The validation data is not the training data.
Environment: number of dimensions more than 90; number of classes = 5.
I used your lightgbm; it still doesn't work with the first dataset and config file:
multiclass_train.txt
multiclass_test.txt
train_conf.txt

However, when I use the second dataset (larger than the first) and its config file, both your lightgbm and mine work well. I am really confused. Is the data the cause, given that the first dataset is part of the second?
multiclass_train1.txt
multiclass_test1.txt
train_conf1.txt

@guolinke (Collaborator)

There actually was a bug. It has been fixed in 9235165.

@defaultRobot (Author)

Thanks very much!
I rebuilt it after downloading the fixed code.
Now LightGBM works well. Thank you!

@msafi04 commented Jun 21, 2018

I have this issue: LightGBM freezes. The number of features in my dataset is more than 4000. Please help.

@guolinke (Collaborator)

@msafi04 are you on the latest LightGBM?
Can you also provide more information, like the data, hardware environment, and so on?

@msafi04 commented Jun 21, 2018

Thanks for the response.
My data has shape (4459, 4735), and I am using a MacBook Pro.
Is it because I am not using GPUs? Do you want the code?
Below is the code to train/predict on my dataset.

import lightgbm as lgbm
from sklearn.model_selection import train_test_split
# rmsle() is a helper defined elsewhere in the poster's script

def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "bagging_seed": 2018,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    #lgbm.plot_importance(clf)
    print('RMLSE: ', rmsle(yvalid, pred))
    return None

@msafi04 commented Jun 22, 2018

@guolinke I reduced the dimensionality to a bit over 2000, but I am still facing the issue. Please help.

@guolinke (Collaborator)

It seems unlikely that this code would crash.
Did you build the package from the latest code?

@msafi04 commented Jun 22, 2018

I am not sure which is the latest code. Could you point me to it? Thanks.

@guolinke (Collaborator)

@msafi04 commented Jun 22, 2018

Thanks. Can I install it via Anaconda?

@msafi04 commented Jun 22, 2018

I updated to 2.1.1, but it still crashes.

@guolinke (Collaborator)

I just updated it to 2.1.2; can you try that?
It would be better if you could provide reproduction code with randomly generated data.
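
A minimal reproduction along the requested lines might look like the sketch below. It assumes the same 2.1.x Python API used elsewhere in this thread, and the array shape simply mirrors the dataset reported above:

import numpy as np
import lightgbm as lgbm

# Randomly generated data matching the reported shape (4459 rows, 4735 features)
rng = np.random.RandomState(2018)
X = rng.rand(4459, 4735)
y = rng.rand(4459)

# Simple holdout split; train with early stopping on the validation set
ltrain = lgbm.Dataset(X[:3500], y[:3500])
lvalid = lgbm.Dataset(X[3500:], y[3500:])
params = {"objective": "regression", "metric": "rmse",
          "num_leaves": 30, "learning_rate": 0.01}
clf = lgbm.train(params, ltrain, num_boost_round=100,
                 valid_sets=[ltrain, lvalid], early_stopping_rounds=10)
print("best iteration:", clf.best_iteration)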

@msafi04 commented Jun 25, 2018

@guolinke Please check my code below. Thanks.

import pandas as pd
import numpy as np
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold

def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "boosting_type": "gbdt",
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    lgbm.plot_importance(clf)
    return None

def main():
    input_file = r'train.csv.zip'
    df = pd.read_csv(input_file)
    df.drop('ID', axis=1, inplace=True)
    print(df.shape)
    target = df['target'].copy()
    df.drop('target', axis=1, inplace=True)
    scl = StandardScaler()
    df = scl.fit_transform(df)
    print(df.shape, target.shape)
    print('Scaling done..')
    varThres = VarianceThreshold(threshold=0.5)
    df = varThres.fit_transform(df)
    print('Variance Thershold done..')
    print(df.shape, target.shape)
    pred_lgbm(df, target)

if __name__ == '__main__':
    main()

@StrikerRUS (Collaborator)

@msafi04 The code and data seem OK - I just ran your snippet:

(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Thershold done..
(4459, 4735) (4459,)
Lgbm training..
Training until validation scores don't improve for 100 rounds.
[10]	training's rmse: 8.05324e+06	valid_1's rmse: 7.85907e+06
[20]	training's rmse: 7.83593e+06	valid_1's rmse: 7.75118e+06
[30]	training's rmse: 7.6388e+06	valid_1's rmse: 7.65281e+06
[40]	training's rmse: 7.46258e+06	valid_1's rmse: 7.57028e+06
[50]	training's rmse: 7.29981e+06	valid_1's rmse: 7.49248e+06
[60]	training's rmse: 7.15265e+06	valid_1's rmse: 7.4294e+06
[70]	training's rmse: 7.01553e+06	valid_1's rmse: 7.37953e+06
[80]	training's rmse: 6.88825e+06	valid_1's rmse: 7.33082e+06
[90]	training's rmse: 6.77233e+06	valid_1's rmse: 7.28675e+06
[100]	training's rmse: 6.66424e+06	valid_1's rmse: 7.25186e+06
[110]	training's rmse: 6.56176e+06	valid_1's rmse: 7.21713e+06
[120]	training's rmse: 6.46828e+06	valid_1's rmse: 7.18686e+06
[130]	training's rmse: 6.37945e+06	valid_1's rmse: 7.1649e+06
[140]	training's rmse: 6.29867e+06	valid_1's rmse: 7.14595e+06
[150]	training's rmse: 6.22141e+06	valid_1's rmse: 7.12732e+06
[160]	training's rmse: 6.14847e+06	valid_1's rmse: 7.11351e+06
[170]	training's rmse: 6.08012e+06	valid_1's rmse: 7.10631e+06
[180]	training's rmse: 6.01555e+06	valid_1's rmse: 7.09486e+06
[190]	training's rmse: 5.95376e+06	valid_1's rmse: 7.08501e+06
[200]	training's rmse: 5.89536e+06	valid_1's rmse: 7.08337e+06
[210]	training's rmse: 5.83995e+06	valid_1's rmse: 7.07864e+06
[220]	training's rmse: 5.7867e+06	valid_1's rmse: 7.07283e+06
[230]	training's rmse: 5.73427e+06	valid_1's rmse: 7.06722e+06
[240]	training's rmse: 5.68461e+06	valid_1's rmse: 7.06331e+06
[250]	training's rmse: 5.63637e+06	valid_1's rmse: 7.05935e+06
[260]	training's rmse: 5.59054e+06	valid_1's rmse: 7.05584e+06
[270]	training's rmse: 5.54617e+06	valid_1's rmse: 7.04874e+06
[280]	training's rmse: 5.50349e+06	valid_1's rmse: 7.04536e+06
[290]	training's rmse: 5.46137e+06	valid_1's rmse: 7.0422e+06
[300]	training's rmse: 5.41947e+06	valid_1's rmse: 7.03769e+06
[310]	training's rmse: 5.3805e+06	valid_1's rmse: 7.03732e+06
[320]	training's rmse: 5.34281e+06	valid_1's rmse: 7.03467e+06
[330]	training's rmse: 5.30545e+06	valid_1's rmse: 7.0324e+06
[340]	training's rmse: 5.268e+06	valid_1's rmse: 7.0315e+06
[350]	training's rmse: 5.23303e+06	valid_1's rmse: 7.03043e+06
[360]	training's rmse: 5.19829e+06	valid_1's rmse: 7.03139e+06
[370]	training's rmse: 5.1656e+06	valid_1's rmse: 7.03016e+06
[380]	training's rmse: 5.13263e+06	valid_1's rmse: 7.02977e+06
[390]	training's rmse: 5.10139e+06	valid_1's rmse: 7.02994e+06
[400]	training's rmse: 5.0704e+06	valid_1's rmse: 7.02894e+06
[410]	training's rmse: 5.0401e+06	valid_1's rmse: 7.02555e+06
[420]	training's rmse: 5.01039e+06	valid_1's rmse: 7.0228e+06
[430]	training's rmse: 4.98113e+06	valid_1's rmse: 7.02337e+06
[440]	training's rmse: 4.95388e+06	valid_1's rmse: 7.02124e+06
[450]	training's rmse: 4.92627e+06	valid_1's rmse: 7.02215e+06
[460]	training's rmse: 4.89821e+06	valid_1's rmse: 7.0211e+06
[470]	training's rmse: 4.87228e+06	valid_1's rmse: 7.02058e+06
[480]	training's rmse: 4.8454e+06	valid_1's rmse: 7.0215e+06
[490]	training's rmse: 4.82091e+06	valid_1's rmse: 7.02276e+06
[500]	training's rmse: 4.79609e+06	valid_1's rmse: 7.02203e+06
[510]	training's rmse: 4.77164e+06	valid_1's rmse: 7.02348e+06
[520]	training's rmse: 4.74809e+06	valid_1's rmse: 7.02513e+06
[530]	training's rmse: 4.72451e+06	valid_1's rmse: 7.02571e+06
[540]	training's rmse: 4.7003e+06	valid_1's rmse: 7.02951e+06
[550]	training's rmse: 4.67647e+06	valid_1's rmse: 7.03081e+06
[560]	training's rmse: 4.65358e+06	valid_1's rmse: 7.03147e+06
Early stopping, best iteration is:
[469]	training's rmse: 4.87491e+06	valid_1's rmse: 7.02009e+06

Not related to the issue, but there is no bagging_frequency parameter in LightGBM, only bagging_freq.
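
For clarity, the corrected entry in the params dict would be:

"bagging_freq": 5,  # "bagging_frequency" is not a recognized LightGBM parameter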

@msafi04 commented Jun 26, 2018

@StrikerRUS Thanks for the response. I ran the code but am facing the same issue: my kernel dies and restarts. Please check the screenshot:
[screenshot: screen shot 2018-06-26 at 11 44 00 am]

@guolinke (Collaborator)

Can you try it without Jupyter?

@msafi04 commented Jun 26, 2018

@guolinke I got the error below.
MacBook-Pro:~ msafi04$ python first.py
(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Thershold done..
(4459, 4735) (4459,)
Lgbm training..
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
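
The error message itself names an escape hatch: the unsafe, unsupported KMP_DUPLICATE_LIB_OK override. Purely as a diagnostic (not a fix), it could be set before any library is imported, e.g.:

import os
# Unsafe workaround quoted in the OMP error above: tolerate duplicate
# OpenMP runtimes. Diagnostic use only; may crash or give wrong results.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import lightgbm as lgbm  # import only after the variable is set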

@guolinke (Collaborator)

It seems your environment has some issues with OpenMP.
You can reinstall gcc 8 and try again.

@msafi04 commented Jun 26, 2018

I installed gcc@8, but I get the same OMP error as above.

@guolinke (Collaborator)

@msafi04 refer to dmlc/xgboost#1715

@guolinke (Collaborator)

@msafi04
can you try:

brew uninstall libiomp clang-omp gcc
brew install gcc@8

If you have other gcc packages, please uninstall them as well.

@msafi04 commented Jun 27, 2018

@guolinke I get this error now after your suggestion.

ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
Reason: image not found

@msafi04 commented Jun 27, 2018

Traceback (most recent call last):
  File "first.py", line 4, in <module>
    from sklearn.model_selection import train_test_split
  File "/anaconda3/lib/python3.6/site-packages/sklearn/__init__.py", line 134, in <module>
    from .base import clone
  File "/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 13, in <module>
    from .utils.fixes import signature
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 11, in <module>
    from .validation import (as_float_array,
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 18, in <module>
    from ..utils.fixes import signature
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/fixes.py", line 144, in <module>
    from scipy.sparse.linalg import lsqr as sparse_lsqr  # noqa
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/__init__.py", line 117, in <module>
    from .eigen import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/__init__.py", line 11, in <module>
    from .arpack import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/__init__.py", line 22, in <module>
    from .arpack import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 45, in <module>
    from . import _arpack
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
  Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
  Reason: image not found

@guolinke (Collaborator)

It seems you need to reinstall scipy, numpy, and sklearn.

@msafi04 commented Jun 27, 2018

Reinstalling didn't help; I am still facing the same error. Please help me.

@guolinke (Collaborator)

You can try uninstalling Anaconda and then reinstalling it.

@msafi04 commented Jun 27, 2018

@guolinke It worked!! Thanks for your patience and help.

lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020