xgboost output not matching #6

Open · kumarsameer opened this issue Jan 31, 2023 · 10 comments

@kumarsameer

import pandas as pd
import xgboost as xgb
import treelite
from treelite_runtime import Predictor, DMatrix

# `today`, `createTrainDf`, `train_cols` and the trained `model` come from my
# training pipeline (not shown here).

# 1) Margins from the in-memory XGBoost model
filename = "ind/test/" + 'bnifty' + '_' + str(today) + ".ind.ade"
adf = pd.read_csv(filename, index_col=False)
tdf = createTrainDf(adf)
dtrain = xgb.DMatrix(data=tdf[train_cols])
y_pred = model.predict(dtrain, output_margin=True)
print("live :", y_pred)

# 2) Margins after a save/load round trip through model.bin
model.save_model("model.bin")
bst = xgb.Booster()
bst.load_model('model.bin')
print("bin :", bst.predict(dtrain, output_margin=True))

# 3) Margins from the treelite-compiled shared library
model_path = "./tlite2"
bst_lite = treelite.Model.from_xgboost(model)
bst_lite.export_lib(toolchain='gcc', params={}, libpath=model_path, verbose=False)
predictor = Predictor(model_path, verbose=False)
dmat = DMatrix(tdf[train_cols])
print("treelite :", predictor.predict(dmat, pred_margin=True))

output:

live : [-1.0892507  -0.2221069  -0.28209284  0.01806552  0.16996847 -0.9235258 ]
bin : [-1.0892507  -0.2221069  -0.28209284  0.01806552  0.16996847 -0.9235258 ]
treelite : [-0.08823351 -0.2221069  -0.3661577   0.01806552  0.30945933 -0.9235258 ]

Every alternate margin output matches.
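
To narrow down which rows disagree, here is a minimal sketch (assuming the tdf, train_cols, model, dtrain, dmat, and predictor objects from the snippet above) that compares the two margin vectors element-wise and prints the offending rows:

import numpy as np

xgb_margin = model.predict(dtrain, output_margin=True)   # native XGBoost margins
tl_margin = predictor.predict(dmat, pred_margin=True)    # treelite margins

# Rows where the two disagree beyond float tolerance
mismatch = np.where(~np.isclose(xgb_margin, tl_margin, atol=1e-5))[0]
print("mismatching rows:", mismatch)

# Inspect the raw feature values of the first mismatching row,
# e.g. to see whether zeros or NaNs are involved
if len(mismatch) > 0:
    print(tdf[train_cols].iloc[mismatch[0]])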

@kumarsameer
Author

I also compiled against the XGBoost C API, and its margin output matches correctly.
I have verified the column names in the XGBoost dump against treelite's main.c file, and the column names match correctly.

Any idea what the issue could be is greatly appreciated.

@kumarsameer
Author

@hcho3 I have looked at a similar issue you commented on and tried everything suggested there.
Could you shed some light here?

I am using the following package versions:

treelite==3.1.0.dev0
treelite-runtime==3.1.0
xgboost==1.7.3

@hcho3
Collaborator

hcho3 commented Feb 8, 2023

@kumarsameer Can you post your XGBoost model here? I'll try to debug the issue.

@kumarsameer
Author

Please find the replication example and model file (zip) attached.

import xgboost as xgb
import treelite
import treelite_runtime
import numpy as np

test_data = [-1.62e+02,  3.63e+01,  1.00e+01,  6.00e+00,  1.10e+01,  7.00e+00,
         1.00e+00,  3.00e+00,  0.00e+00,  2.00e+00, -1.00e+00,  3.30e+01,
         7.70e+01,  3.90e+01,  1.11e+01,  1.30e+01, -9.00e+00,  0.00e+00,
        -6.80e+01,  5.40e+01,  1.09e+02, -7.00e-01, -7.00e-01, -3.00e+00,
         9.10e+01, -9.00e+00, -2.00e+00,  1.10e+01,  9.00e+00,  1.50e+01,
        -1.20e+01, -1.80e+01, -6.20e+01,  6.55e+01,  4.50e+01,  5.90e+01,
         1.00e-01, -2.00e+00,  3.00e+00,  1.80e+01, -6.00e+00, -1.80e+01,
        -1.00e+00,  1.00e-01, -2.40e+00,  0.00e+00,  0.00e+00,  0.00e+00,
         0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,
         0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,
         0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00,  0.00e+00]


modelfile = 'model9.bin'

# treelite: load the XGBoost model, compile it, and predict the margin
model = treelite.Model.load(modelfile, model_format='xgboost')
model.export_lib('gcc', 'compiled.dylib', params={'parallel_comp': model.num_tree}, verbose=False)
predictor = treelite_runtime.Predictor(libpath='./compiled.dylib')
dmat = treelite_runtime.DMatrix(test_data)
print(f'Treelite: {predictor.predict(dmat, pred_margin=True)}')

# native XGBoost: load the same model file and predict the margin
bst = xgb.Booster()
bst.load_model('model9.bin')
dtrain = xgb.DMatrix(data=np.expand_dims(test_data, 0))
print("bin :", bst.predict(dtrain, output_margin=True))

This is the output I get on my machine:

Treelite: -0.3277367353439331
bin : [0.5290272]

model.zip

@kumarsameer
Author

The output matches if test_data = list(np.ones(65)).
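
A quick (hypothetical) follow-up check, reusing the objects from the replication example above, is to perturb only the exact zeros in test_data and see whether the two outputs converge; if they do, zero-vs-missing handling is a likely suspect:

probe = np.array(test_data, dtype=np.float64)
probe[probe == 0.0] = 1e-6   # replace exact zeros with a tiny non-zero value

dmat_probe = treelite_runtime.DMatrix(np.expand_dims(probe, 0))
dtrain_probe = xgb.DMatrix(data=np.expand_dims(probe, 0))
print("treelite:", predictor.predict(dmat_probe, pred_margin=True))
print("bin     :", bst.predict(dtrain_probe, output_margin=True))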

@kumarsameer
Author

@hcho3 let me know if you need any help replicating the issue.

@kumarsameer
Author

Not sure if it helps, but one of the things I noticed is that the XGBoost C++ API also matched only after I set the missing value to std::numeric_limits<double>::quiet_NaN().
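
On the Python side, the corresponding experiment (a sketch reusing bst and test_data from the replication example above; the choice of 0.0 as an alternative missing value is only an illustration) is to build the XGBoost DMatrix with an explicit missing argument and compare:

row = np.expand_dims(np.array(test_data, dtype=np.float64), 0)

dmat_nan = xgb.DMatrix(row, missing=np.nan)   # default: NaN marks missing values
dmat_zero = xgb.DMatrix(row, missing=0.0)     # hypothetical: treat 0.0 as missing

print("missing=NaN :", bst.predict(dmat_nan, output_margin=True))
print("missing=0.0 :", bst.predict(dmat_zero, output_margin=True))

If the two lines differ, the discrepancy is likely in how missing values are encoded rather than in the trees themselves.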

@mchonofsky

mchonofsky commented Feb 20, 2023

Hi y'all,

I'm seeing the same issue here, and I'm attaching a CSV of data and labels. The predictions for the first twenty rows are below - XGB at left, Treelite at right. I'm running Treelite version 3.1.0. I attached the XGBoost JSON and the generated C code in the zip file.

models.zip
train_labels.csv
train_data.csv

0.0 0.15592413
0.0 0.7200409
0.0 0.7200409
0.0 0.9868025
1.0 0.9695979
0.0 0.9695979
0.0 0.90683216
1.0 0.90683216
1.0 0.25082284
1.0 0.6561131
1.0 0.6561131
1.0 0.33845404
1.0 0.33845404
1.0 0.33845404
1.0 0.7200409
1.0 0.0045186426

@mchonofsky

The model was built in XGBoost with

param = {'max_depth': 6,
 'eta': 0.3,
 'tree_method': 'hist',
 'objective': 'binary:hinge',
 'eval_metric': ['logloss', 'error']}

@mchonofsky

And here's the full replication code:

import pandas as pd, numpy as np, treelite, treelite_runtime, xgboost as xgb
from importlib import reload
reload(treelite_runtime)
reload(treelite)
X = pd.read_csv('train_data.csv').to_numpy()[:,1:]
y = pd.read_csv('train_labels.csv').to_numpy()[:,1]
dtrain = xgb.DMatrix(X, label=y)
param = {'max_depth': 6, 'eta': 0.3, 'tree_method': 'hist', 'objective': 'binary:hinge', 'eval_metric':['logloss', 'error']}
bst = xgb.train(param, dtrain, 10, [(dtrain, 'train')])
model = treelite.Model.from_xgboost(bst)
model.export_lib(toolchain='gcc', libpath='./mymodel.so', verbose=True)
preds = bst.predict(xgb.DMatrix(X[0:20,:]))
predictor = treelite_runtime.Predictor('./mymodel.so')
# these should match
for i in range(20): print(preds[i], predictor.predict(treelite_runtime.DMatrix(X[i:i+1,:])))
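
Since binary:hinge post-processes the margin into a hard 0/1 label, it may also help to compare raw margins on both sides (a sketch reusing bst, predictor, and X from the code above); if the margins agree but the final predictions don't, the difference is in the output transform rather than in tree traversal:

xgb_margin = bst.predict(xgb.DMatrix(X[0:20, :]), output_margin=True)
tl_margin = predictor.predict(treelite_runtime.DMatrix(X[0:20, :]), pred_margin=True)
print(np.column_stack([xgb_margin, tl_margin]))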

hcho3 transferred this issue from dmlc/treelite on May 9, 2023