
Difference in prediction results between xgboost-predictor and Python #32

Open
fenxouxiaoquan opened this issue Apr 20, 2018 · 4 comments

@fenxouxiaoquan

Hi,
I trained an xgboost model with Python and saved it as 001.model. For testing, I used this 001.model to predict one sample in Python and got 0.223, but got 0.0604 using xgboost-predictor. Same model, same sample, but different prediction results. I tested another sample and still got different results; the only consistent thing is that predicting in Python gives the larger score. I have no idea how to deal with this problem. Note that my input is a HashMap, like this: {33: 1.0, 34: 1.0, 125: 0.04261, 185: 0.01504}
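
For reference, this is roughly how I call the predictor from Java (a sketch from memory, using the FVec.Transformer.fromMap helper as I understand it from the xgboost-predictor README; the model path is just an example):

```java
import biz.k11i.xgboost.Predictor;
import biz.k11i.xgboost.util.FVec;

import java.io.FileInputStream;
import java.util.HashMap;
import java.util.Map;

public class PredictExample {
    public static void main(String[] args) throws Exception {
        // Sparse input: feature index -> value, zeros omitted.
        Map<Integer, Double> features = new HashMap<>();
        features.put(33, 1.0);
        features.put(34, 1.0);
        features.put(125, 0.04261);
        features.put(185, 0.01504);

        try (FileInputStream in = new FileInputStream("001.model")) {
            Predictor predictor = new Predictor(in);
            FVec fvec = FVec.Transformer.fromMap(features);
            double[] prediction = predictor.predict(fvec);
            System.out.println(prediction[0]); // prints 0.0604 here, vs. 0.223 in Python
        }
    }
}
```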

@fenxouxiaoquan
Author

I tried once more, and the problem seems to be gone; I just changed double to float in the input HashMap.
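
I think the reason is that XGBoost stores split thresholds as 32-bit floats, so the same nominal value can land on different sides of a split depending on whether it arrives as a double or a float. A small check (the 0.2 threshold is made up, not taken from the actual model):

```java
public class FloatVsDouble {
    public static void main(String[] args) {
        float split = 0.2f;            // as stored in the model; widened to double: 0.20000000298023224

        double asDouble = 0.2;         // 0.20000000000000001...
        double asFloat = (float) 0.2;  // 0.20000000298023224

        // The same nominal feature value takes different branches:
        System.out.println(asDouble < split); // true  -> left child
        System.out.println(asFloat < split);  // false -> right child
    }
}
```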

@hlbkin

hlbkin commented Apr 30, 2018

@fenxouxiaoquan
Hello, I've actually seen the same issue with standard xgboost4j.

However, when I tried this library, I got practically the same result as in Python (I passed in a float[] array).

In xgboost4j I was passing in DMatrix(float[] mydata) as well.

How can this be possible?

@fenxouxiaoquan
Author

fenxouxiaoquan commented Jul 18, 2018

@hlbkin
Hello, I didn't get it, in fact. Do you mean that you got the same prediction result when you tested it in xgboost-predictor and in Python? I got different prediction results again today (9.189394587749626E-5 from xgboost-predictor and 0.605 from Python), just like what I experienced about three months ago. This time my input is still a HashMap with nonzero values. Do you have any idea how to deal with this?

@cpfarrell

I would recommend always using features as floats. XGBoost is explicit that it treats values as 32-bit floats due to performance optimizations (one example: dmlc/xgboost#1410). If a model has been trained with XGBoost, its split values will be stored as floats, so giving it doubles may cause inaccurate predictions if you hit just the right values.
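
For example, something along these lines (an untested sketch; it narrows the map values to float before building the xgboost-predictor feature vector):

```java
import biz.k11i.xgboost.util.FVec;

import java.util.HashMap;
import java.util.Map;

public class NarrowToFloat {
    // Narrow every feature value to a 32-bit float before building the FVec,
    // so the Java side sees the same representation the split thresholds use.
    static FVec toFloatFVec(Map<Integer, Double> features) {
        Map<Integer, Float> narrowed = new HashMap<>();
        for (Map.Entry<Integer, Double> e : features.entrySet()) {
            narrowed.put(e.getKey(), e.getValue().floatValue());
        }
        return FVec.Transformer.fromMap(narrowed);
    }
}
```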
