"model_monitor_compute_histogram_buckets" Crashes When Getting Columns in Common #1983
Comments
Hi @sraza-onshape, you are seeing this issue because we expect both production data and reference data. Let us loop in our PM to decide whether we will support the case where production data is null.
@sraza-onshape, just to clarify: to compute data drift, we need two datasets, a baseline dataset and a target dataset. Data drift is measured by comparing the distribution of the training data (referred to as the baseline or reference data) with the distribution of the target data (the production data). So in your case, you have to provide production data.
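To make the two-dataset requirement concrete, here is a minimal, self-contained sketch of comparing one numeric column between a baseline and a target dataset: both are binned on shared histogram edges, and the two distributions are compared with the Jensen-Shannon divergence. This is not Azure Machine Learning's implementation; the function names, the equal-width binning, and the choice of metric are assumptions for illustration. It shows why there is nothing to compute when the target (production) data is missing.

```python
import bisect
import math
from typing import Optional, Sequence


def shared_edges(baseline: Sequence[float], target: Sequence[float], n_bins: int = 10) -> list[float]:
    """Equal-width bin edges spanning both datasets (hypothetical binning scheme)."""
    lo = min(min(baseline), min(target))
    hi = max(max(baseline), max(target))
    if hi == lo:  # constant column: fall back to a unit-width range
        hi = lo + 1.0
    width = (hi - lo) / n_bins
    # Set the last edge to hi exactly so float rounding cannot drop the maximum value.
    return [lo + i * width for i in range(n_bins)] + [hi]


def histogram(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Fraction of values per bin; bins are [lo, hi) except the last, which includes hi."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = min(bisect.bisect_right(edges, v) - 1, n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 because y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def data_drift(baseline: Sequence[float], target: Optional[Sequence[float]], n_bins: int = 10) -> float:
    # Drift compares two distributions, so both datasets must be present and non-empty.
    if not baseline or not target:
        raise ValueError("both baseline (reference) and target (production) data are required")
    edges = shared_edges(baseline, target, n_bins)
    return js_divergence(histogram(baseline, edges), histogram(target, edges))
```

For example, `data_drift([1, 2, 3, 4], [1, 2, 3, 4])` returns `0.0`, while `data_drift([1, 2], None)` raises `ValueError` because there is no target distribution to compare against.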
Hi @VivienneTang and team, thanks for the reply. I redefined the monitoring pipeline, and it will run again soon. In the meantime, this is the code we used to initialize the production data:
```python
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import MonitorDatasetContext
from azure.ai.ml.entities import ProductionData

ml_client = MLClient(...)

# Look up the workspace's default blob datastore and read its connection details.
production_data_metadata = ml_client.datastores.get(name="workspaceblobstore")
production_data_metadata_dict = production_data_metadata._to_dict()

# Build the container URI: <protocol>://<account>.blob.<endpoint>/<container>
storage_uri = (
    f"{production_data_metadata_dict['protocol']}://"
    f"{production_data_metadata_dict['account_name']}.blob."
    f"{production_data_metadata_dict['endpoint']}/"
    f"{production_data_metadata_dict['container_name']}"
)

# Point the monitor's production data at that container (model inputs).
production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path=storage_uri,
    ),
    data_context=MonitorDatasetContext.MODEL_INPUTS,
)
```
Steps to reproduce
… `ReferenceData`. Don't provide any argument for the `ProductionData`.
Expected behavior
When the pipeline runs, it should succeed.
Actual behavior
In the pipeline, we get an error in the step where the `DataDriftSignal` is computed. Within that sub-pipeline, the error itself occurs in the node that runs `compute_histogram_buckets`, and the details are in that node's `stderrorlogs.txt`.
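The issue title points at the step that gets the columns the two datasets have in common. As a hypothetical illustration (not the actual `compute_histogram_buckets` component code), the sketch below shows how such a step can crash when no production data is supplied, and a guarded variant that fails with a clearer message instead. The table representation (a mapping from column name to values) and both function names are assumptions.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical table representation: column name -> column values.
Table = Mapping[str, Sequence[float]]


def naive_columns_in_common(reference: Table, production: Optional[Table]) -> list[str]:
    # When production is None, set(None) raises TypeError ('NoneType' object is not iterable).
    return sorted(set(reference) & set(production))


def columns_in_common(reference: Table, production: Optional[Table]) -> list[str]:
    """Columns present in both datasets, with explicit errors for the failure cases."""
    if production is None:
        raise ValueError("production data is required to find columns in common")
    common = sorted(set(reference) & set(production))
    if not common:
        raise ValueError("reference and production data have no columns in common")
    return common
```

For example, `columns_in_common({"age": [31.0], "income": [52.0]}, {"income": [58.0], "region": [1.0]})` returns `["income"]`, while passing `None` as the production table raises a `ValueError` that names the missing input rather than a bare `TypeError`.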
Additional information
Please let us know if this bug is due to an error on our end, such as a misunderstanding of how to use Azure Machine Learning. For context, here are the tutorials we've used so far (for Steps 1-4) to try to learn the tool: