Describe the bug
I trained a binary classification model with SageMaker's XGBoost container, version 1.7-1.
I had previously developed an XGBoost model for the same dataset locally.
The positive rate for the dataset is generally below 4%, so positives are rare.
When I compared the predicted probabilities from the SageMaker built-in model with those from my local model, the results were opposite.
Given the low positive rate, I believe the SageMaker model's outputs are incorrect.
See images.
I checked the training inputs and they are identical, except that on my local machine I fed the model a CSV file, whereas SageMaker XGBoost required the data in libsvm format. After double-checking, the training data were the same. I also used the same hyperparameters.
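One way to make the "training data were the same" check concrete is to compare the labels in the two files directly. The following is a minimal sketch, not the check I originally ran: the helper names, the column name `label`, and the tiny in-memory files are all hypothetical stand-ins, and it assumes the libsvm label is the first token on each line.

```python
import csv
import io


def libsvm_labels(lines):
    """Return the label (first token) of each non-empty libsvm line as a float."""
    labels = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments, if any
        if line:
            labels.append(float(line.split()[0]))
    return labels


def csv_labels(lines, label_column):
    """Return the label column of a CSV that has a header row, as floats."""
    reader = csv.reader(lines)
    header = next(reader)
    idx = header.index(label_column)
    return [float(row[idx]) for row in reader if row]


def positive_rate(labels):
    """Fraction of rows whose label equals 1."""
    return sum(1 for y in labels if y == 1.0) / len(labels)


# Hypothetical stand-ins for the real training files.
csv_text = "f1,f2,label\n0.5,1.0,0\n0.1,0.0,0\n0.9,2.0,1\n0.3,0.0,0\n"
libsvm_text = "0 1:0.5 2:1.0\n0 1:0.1\n1 1:0.9 2:2.0\n0 1:0.3\n"

local = csv_labels(io.StringIO(csv_text), "label")
remote = libsvm_labels(io.StringIO(libsvm_text))

print("CSV positive rate:   ", positive_rate(local))
print("libsvm positive rate:", positive_rate(remote))
print("labels match row by row:", local == remote)
```

If the two positive rates differ, or one is close to one minus the other, the conversion step would be the first thing to look at before suspecting the container.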
To reproduce
For sagemaker:
import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.xgboost.estimator import XGBoost

# Version 1: script mode with the XGBoost framework estimator.
xgb_script_mode_estimator = XGBoost(
    entry_point=script_path,
    framework_version="1.7-1",
    # hyperparameters=hyperparameters,
    role=role,
    instance_count=2,
    instance_type=instance_type,
    output_path=output_path,
    code_location=output_path,
)
# calling fit

# Version 2: built-in algorithm container with the generic Estimator.
container = get_image_uri(boto3.Session().region_name, "xgboost", "1.7-1")
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    # The third format argument ("no-show-xgb") is unused: the template has only two placeholders.
    output_path="s3://{}/{}/output".format(s3_bucket, key, "no-show-xgb"),
    sagemaker_session=sess,
)
# calling fit
Is there any way to debug this issue?
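As a starting point, the two sets of predictions can be compared numerically rather than by eye. This is a rough sketch under assumptions: the function names, the tolerance, and the example probabilities are made up for illustration, and both lists are assumed to score the same rows in the same order. A strongly negative correlation, or SageMaker scores close to one minus the local scores, would point toward inverted labels or a swapped class rather than a modelling difference.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def diagnose(local_probs, sagemaker_probs, base_rate, tol=0.05):
    """Summarise how two sets of predicted probabilities relate to each other."""
    flipped = [1.0 - p for p in sagemaker_probs]
    return {
        "local_mean": mean(local_probs),
        "sagemaker_mean": mean(sagemaker_probs),
        "base_rate": base_rate,  # for a calibrated model the mean should be near this
        "corr": pearson(local_probs, sagemaker_probs),
        # True when every SageMaker score is within tol of (1 - local score).
        "looks_inverted": all(
            abs(a - b) <= tol for a, b in zip(local_probs, flipped)
        ),
    }


# Made-up scores for four rows, only to show the shape of the check.
local_probs = [0.02, 0.05, 0.01, 0.60]
sagemaker_probs = [0.97, 0.96, 0.98, 0.41]

report = diagnose(local_probs, sagemaker_probs, base_rate=0.04)
for name, value in report.items():
    print(name, value)
```

The `tol` threshold is arbitrary; the point is only to distinguish "roughly mirrored" outputs from two models that simply disagree.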