LightGBM run error when number of dimensions is more than 90 #93

Closed

defaultRobot opened this issue Nov 24, 2016 · 38 comments

@defaultRobot commented Nov 24, 2016

Environment: Windows; number of rows: ~63,000 (6.3w).
I have found one problem: LightGBM fails to train when the number of dimensions is more than 90. However, when the number of dimensions is less than 70, LightGBM can train and predict without problems.
I can provide the data if needed.

@guolinke (Collaborator)

This is very strange. Can you provide the data?
Thanks.

@chivee (Collaborator) commented Nov 24, 2016

Hi @anddelu, could this be because memory was completely used up? Could you please paste the log here?

@defaultRobot (Author)

Hi, thanks for responding.
Environment: 8 GB of memory.
I tried reducing the data to less than 5 MB and found that it works, so I thought LightGBM fails because it cannot handle too much data.
The following data causes LightGBM to stop.
Attached is the data (the number of dimensions is more than 90):
multiclass.txt

The picture:
[screenshot]

@guolinke (Collaborator)

@chivee I think a dataset of 63000 × 90 is very small; it cannot be out of memory.
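
(A rough check, assuming dense float64 storage: 63,000 rows × 90 features × 8 bytes ≈ 45 MB, far below the 8 GB of memory reported above.)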

@guolinke (Collaborator) commented Nov 24, 2016

@anddelu I tried to run:
lightgbm.exe data=multiclass.txt valid=multiclass.txt objective=multiclass num_class=5
and it finished successfully.
Can you also provide your parameters?

@defaultRobot (Author)

Here are my parameters, based on the examples, in the train.conf file:
data=multiclass.train valid_data=multiclass.test objective=multiclass num_class=5
metric=multi_logloss metric_freq=1 early_stopping=10 num_trees=100
learning_rate=0.05 num_leaves=31
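
For readability, the same settings written one key per line, as LightGBM config files conventionally allow (a sketch of the configuration above, not the verbatim contents of the attached train_conf.txt):

# sketch of the reported train.conf
objective = multiclass
num_class = 5
data = multiclass.train
valid_data = multiclass.test
metric = multi_logloss
metric_freq = 1
early_stopping = 10
num_trees = 100
learning_rate = 0.05
num_leaves = 31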

I used your method and found it works; however, when I use more data to train the model, it still shows an error:
[screenshot: err01]

Attached files: training data & train.conf
multiclass.txt
train_conf.txt

@guolinke (Collaborator)

[screenshot: successful run]

I can still run successfully with your new data and config.
BTW, the data file named in your config does not exist, so I changed it to multiclass.txt for both the training and validation data.

@defaultRobot (Author) commented Nov 24, 2016

Thanks for your response.
BTW: the name of the config file is correct; I just renamed it when uploading because of the supported file formats.
Unfortunately it doesn't work for me. I thought maybe my lightgbm.exe was the reason, so I downloaded the new LightGBM and rebuilt it (VS 2013, x64 Release). It still doesn't work.
I am very confused, so I am uploading it to see if you can reproduce the problem.

I reduced the number of dimensions to less than 70 and found it works again:
D:\multiclass_classification>lightgbm.exe config=train_conf.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Warning] Ignoring feature Column_38, only has one value
[LightGBM] [Info] Finished loading data in 0.260338 seconds
[LightGBM] [Info] Number of data: 62941, number of features: 69
......
[LightGBM] [Info] Early stopping at iteration 35, the best iteration round is 25
[LightGBM] [Info] 2.871995 seconds elapsed, finished iteration 35
[LightGBM] [Info] Finished training

lightgbm.exe.txt

@guolinke (Collaborator)

@anddelu I can still run with your exe...
BTW, did you use the training data as the validation data? If not, can you provide your validation data as well?

@guolinke (Collaborator)

You can try my exe as well:
lightgbm.exe.txt

@defaultRobot (Author) commented Nov 24, 2016

The validation data is not the training data.
Environment: number of dimensions more than 90; number of classes = 5.
I used your lightgbm; it still doesn't work with the first dataset and config file:
multiclass_train.txt
multiclass_test.txt
train_conf.txt

However, when I use the second dataset (larger than the first) and its config file, both your lightgbm and mine work well. I am really confused. Is the data the cause, given that the first dataset is part of the second?
multiclass_train1.txt
multiclass_test1.txt
train_conf1.txt

@guolinke (Collaborator)

There actually was a bug. It has been fixed in 9235165.

@defaultRobot (Author)

Thanks very much!
I rebuilt it after downloading the fixed code.
Now LightGBM works well. Thank you!

@msafi04 commented Jun 21, 2018

I have this issue: LightGBM freezes. The number of features in my dataset is more than 4000. Please help.

@guolinke (Collaborator)

@msafi04 are you on the latest LightGBM?
Can you also provide more information, like the data, hardware environment, and so on?

@msafi04 commented Jun 21, 2018

Thanks for the response.
My data has shape (4459, 4735), and I am using a MacBook Pro.
Is it because I am not using GPUs? Do you want the code?
Below is the code to train/predict on my dataset.

import lightgbm as lgbm
from sklearn.model_selection import train_test_split
# rmsle() is a helper defined elsewhere in the poster's script

def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "bagging_seed": 2018,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    #lgbm.plot_importance(clf)
    print('RMLSE: ', rmsle(yvalid, pred))
    return None

@msafi04 commented Jun 22, 2018

@guolinke I reduced the dimensionality to a bit over 2000, but I am still facing the issue. Please help.

@guolinke (Collaborator)

It seems unlikely that this code would crash.
Did you build the package from the latest code?

@msafi04 commented Jun 22, 2018

I am not sure which is the latest code. Could you point me to it? Thanks.

@guolinke (Collaborator)

@msafi04 commented Jun 22, 2018

Thanks. Can I install it via Anaconda?

@msafi04 commented Jun 22, 2018

I updated to 2.1.1, but it still crashes.

@guolinke (Collaborator)

I just updated it to 2.1.2; can you try that?
It would be better if you could provide reproduction code with randomly generated data.
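
A minimal reproduction along the requested lines might look like the sketch below. It assumes the same 2.1.x Python API used elsewhere in this thread, and the array shape simply mirrors the dataset reported above:

import numpy as np
import lightgbm as lgbm

# Randomly generated data matching the reported shape (4459 rows, 4735 features)
rng = np.random.RandomState(2018)
X = rng.rand(4459, 4735)
y = rng.rand(4459)

# Simple holdout split; train with early stopping on the validation set
ltrain = lgbm.Dataset(X[:3500], y[:3500])
lvalid = lgbm.Dataset(X[3500:], y[3500:])
params = {"objective": "regression", "metric": "rmse",
          "num_leaves": 30, "learning_rate": 0.01}
clf = lgbm.train(params, ltrain, num_boost_round=100,
                 valid_sets=[ltrain, lvalid], early_stopping_rounds=10)
print("best iteration:", clf.best_iteration)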

@msafi04 commented Jun 25, 2018

@guolinke Please check my code below. Thanks.

import pandas as pd
import numpy as np
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold

def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "boosting_type": "gbdt",
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    lgbm.plot_importance(clf)
    return None

def main():
    input_file = r'train.csv.zip'
    df = pd.read_csv(input_file)
    df.drop('ID', axis=1, inplace=True)
    print(df.shape)
    target = df['target'].copy()
    df.drop('target', axis=1, inplace=True)
    scl = StandardScaler()
    df = scl.fit_transform(df)
    print(df.shape, target.shape)
    print('Scaling done..')
    varThres = VarianceThreshold(threshold=0.5)
    df = varThres.fit_transform(df)
    print('Variance Thershold done..')
    print(df.shape, target.shape)
    pred_lgbm(df, target)

if __name__ == '__main__':
    main()

@StrikerRUS (Collaborator)

@msafi04 The code and data seem OK - I just ran your snippet:

(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Thershold done..
(4459, 4735) (4459,)
Lgbm training..
Training until validation scores don't improve for 100 rounds.
[10]	training's rmse: 8.05324e+06	valid_1's rmse: 7.85907e+06
[20]	training's rmse: 7.83593e+06	valid_1's rmse: 7.75118e+06
[30]	training's rmse: 7.6388e+06	valid_1's rmse: 7.65281e+06
[40]	training's rmse: 7.46258e+06	valid_1's rmse: 7.57028e+06
[50]	training's rmse: 7.29981e+06	valid_1's rmse: 7.49248e+06
[60]	training's rmse: 7.15265e+06	valid_1's rmse: 7.4294e+06
[70]	training's rmse: 7.01553e+06	valid_1's rmse: 7.37953e+06
[80]	training's rmse: 6.88825e+06	valid_1's rmse: 7.33082e+06
[90]	training's rmse: 6.77233e+06	valid_1's rmse: 7.28675e+06
[100]	training's rmse: 6.66424e+06	valid_1's rmse: 7.25186e+06
[110]	training's rmse: 6.56176e+06	valid_1's rmse: 7.21713e+06
[120]	training's rmse: 6.46828e+06	valid_1's rmse: 7.18686e+06
[130]	training's rmse: 6.37945e+06	valid_1's rmse: 7.1649e+06
[140]	training's rmse: 6.29867e+06	valid_1's rmse: 7.14595e+06
[150]	training's rmse: 6.22141e+06	valid_1's rmse: 7.12732e+06
[160]	training's rmse: 6.14847e+06	valid_1's rmse: 7.11351e+06
[170]	training's rmse: 6.08012e+06	valid_1's rmse: 7.10631e+06
[180]	training's rmse: 6.01555e+06	valid_1's rmse: 7.09486e+06
[190]	training's rmse: 5.95376e+06	valid_1's rmse: 7.08501e+06
[200]	training's rmse: 5.89536e+06	valid_1's rmse: 7.08337e+06
[210]	training's rmse: 5.83995e+06	valid_1's rmse: 7.07864e+06
[220]	training's rmse: 5.7867e+06	valid_1's rmse: 7.07283e+06
[230]	training's rmse: 5.73427e+06	valid_1's rmse: 7.06722e+06
[240]	training's rmse: 5.68461e+06	valid_1's rmse: 7.06331e+06
[250]	training's rmse: 5.63637e+06	valid_1's rmse: 7.05935e+06
[260]	training's rmse: 5.59054e+06	valid_1's rmse: 7.05584e+06
[270]	training's rmse: 5.54617e+06	valid_1's rmse: 7.04874e+06
[280]	training's rmse: 5.50349e+06	valid_1's rmse: 7.04536e+06
[290]	training's rmse: 5.46137e+06	valid_1's rmse: 7.0422e+06
[300]	training's rmse: 5.41947e+06	valid_1's rmse: 7.03769e+06
[310]	training's rmse: 5.3805e+06	valid_1's rmse: 7.03732e+06
[320]	training's rmse: 5.34281e+06	valid_1's rmse: 7.03467e+06
[330]	training's rmse: 5.30545e+06	valid_1's rmse: 7.0324e+06
[340]	training's rmse: 5.268e+06	valid_1's rmse: 7.0315e+06
[350]	training's rmse: 5.23303e+06	valid_1's rmse: 7.03043e+06
[360]	training's rmse: 5.19829e+06	valid_1's rmse: 7.03139e+06
[370]	training's rmse: 5.1656e+06	valid_1's rmse: 7.03016e+06
[380]	training's rmse: 5.13263e+06	valid_1's rmse: 7.02977e+06
[390]	training's rmse: 5.10139e+06	valid_1's rmse: 7.02994e+06
[400]	training's rmse: 5.0704e+06	valid_1's rmse: 7.02894e+06
[410]	training's rmse: 5.0401e+06	valid_1's rmse: 7.02555e+06
[420]	training's rmse: 5.01039e+06	valid_1's rmse: 7.0228e+06
[430]	training's rmse: 4.98113e+06	valid_1's rmse: 7.02337e+06
[440]	training's rmse: 4.95388e+06	valid_1's rmse: 7.02124e+06
[450]	training's rmse: 4.92627e+06	valid_1's rmse: 7.02215e+06
[460]	training's rmse: 4.89821e+06	valid_1's rmse: 7.0211e+06
[470]	training's rmse: 4.87228e+06	valid_1's rmse: 7.02058e+06
[480]	training's rmse: 4.8454e+06	valid_1's rmse: 7.0215e+06
[490]	training's rmse: 4.82091e+06	valid_1's rmse: 7.02276e+06
[500]	training's rmse: 4.79609e+06	valid_1's rmse: 7.02203e+06
[510]	training's rmse: 4.77164e+06	valid_1's rmse: 7.02348e+06
[520]	training's rmse: 4.74809e+06	valid_1's rmse: 7.02513e+06
[530]	training's rmse: 4.72451e+06	valid_1's rmse: 7.02571e+06
[540]	training's rmse: 4.7003e+06	valid_1's rmse: 7.02951e+06
[550]	training's rmse: 4.67647e+06	valid_1's rmse: 7.03081e+06
[560]	training's rmse: 4.65358e+06	valid_1's rmse: 7.03147e+06
Early stopping, best iteration is:
[469]	training's rmse: 4.87491e+06	valid_1's rmse: 7.02009e+06

Not related to the issue, but there is no bagging_frequency parameter in LightGBM, only bagging_freq.
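
For clarity, the corrected entry in the params dict would be:

"bagging_freq": 5,  # "bagging_frequency" is not a recognized LightGBM parameter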

@msafi04 commented Jun 26, 2018

@StrikerRUS Thanks for the response. I ran the code but am facing the same issue: my kernel dies and restarts. Please check the screenshot:
[screenshot: screen shot 2018-06-26 at 11 44 00 am]

@guolinke (Collaborator)

Can you try it without Jupyter?

@msafi04 commented Jun 26, 2018

@guolinke I got the error below.
MacBook-Pro:~ msafi04$ python first.py
(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Thershold done..
(4459, 4735) (4459,)
Lgbm training..
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
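
The error message itself names an escape hatch: the unsafe, unsupported KMP_DUPLICATE_LIB_OK override. Purely as a diagnostic (not a fix), it could be set before any library is imported, e.g.:

import os
# Unsafe workaround quoted in the OMP error above: tolerate duplicate
# OpenMP runtimes. Diagnostic use only; may crash or give wrong results.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import lightgbm as lgbm  # import only after the variable is set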

@guolinke (Collaborator)

It seems your environment has some issues with OpenMP.
You can reinstall gcc 8 and try again.

@msafi04 commented Jun 26, 2018

I installed gcc@8, but I get the same OMP error as above.

@guolinke (Collaborator)

@msafi04 refer to dmlc/xgboost#1715

@guolinke (Collaborator)

@msafi04
can you try:

brew uninstall libiomp clang-omp gcc
brew install gcc@8

If you have other gcc packages, please uninstall them as well.

@msafi04 commented Jun 27, 2018

@guolinke I get this error now after your suggestion.

ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
Reason: image not found

@msafi04 commented Jun 27, 2018

Traceback (most recent call last):
  File "first.py", line 4, in <module>
    from sklearn.model_selection import train_test_split
  File "/anaconda3/lib/python3.6/site-packages/sklearn/__init__.py", line 134, in <module>
    from .base import clone
  File "/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 13, in <module>
    from .utils.fixes import signature
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 11, in <module>
    from .validation import (as_float_array,
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 18, in <module>
    from ..utils.fixes import signature
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/fixes.py", line 144, in <module>
    from scipy.sparse.linalg import lsqr as sparse_lsqr  # noqa
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/__init__.py", line 117, in <module>
    from .eigen import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/__init__.py", line 11, in <module>
    from .arpack import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/__init__.py", line 22, in <module>
    from .arpack import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 45, in <module>
    from . import _arpack
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
  Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
  Reason: image not found

@guolinke (Collaborator)

It seems you need to reinstall scipy, numpy, and sklearn.

@msafi04 commented Jun 27, 2018

Reinstalling didn't help; I am still facing the same error. Please help me.

@guolinke (Collaborator)

You can try uninstalling Anaconda and then reinstalling it.

@msafi04 commented Jun 27, 2018

@guolinke It worked!! Thanks for your patience and help.

lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020