fix: cascade experiment id fk deletion in datasets table #11966

Open: wants to merge 3 commits into master

chilir (Contributor) commented May 10, 2024

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11966/merge

Checkout with GitHub CLI

gh pr checkout 11966

Related Issues/PRs

Resolve #10699

What changes are proposed in this pull request?

Apply cascading deletion to the experiment_id foreign key in the datasets table.
This is a resubmission of #11695.
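As a rough illustration of what cascading deletion on this foreign key means at the schema level, here is a simplified SQLAlchemy Core sketch. The table layout is a hypothetical reduction of the `experiments` and `datasets` tables, not the actual models changed in this PR; an existing database would also need a migration (MLflow manages schema migrations with Alembic) to recreate the constraint, which is not shown here.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
from sqlalchemy.dialects import sqlite
from sqlalchemy.schema import CreateTable

metadata = MetaData()

# Hypothetical, heavily simplified parent table.
experiments = Table(
    "experiments",
    metadata,
    Column("experiment_id", Integer, primary_key=True),
    Column("name", String(256), nullable=False),
)

# Hypothetical, simplified child table. ondelete="CASCADE" asks the database
# to delete these rows automatically when the referenced experiment is deleted.
datasets = Table(
    "datasets",
    metadata,
    Column("dataset_uuid", String(36), primary_key=True),
    Column(
        "experiment_id",
        Integer,
        ForeignKey("experiments.experiment_id", ondelete="CASCADE"),
    ),
)

# Render the DDL to check that the ON DELETE CASCADE clause is emitted.
ddl = str(CreateTable(datasets).compile(dialect=sqlite.dialect()))
print(ddl)
```

The cascade lives in the database constraint rather than in application code, so any `DELETE` of an experiment row, including the one issued by `mlflow gc`, removes the dependent dataset rows as well.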

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Fixes a bug where `mlflow gc` fails when attempting to permanently delete an experiment that still has datasets associated with it.
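To show why the deletion fails without the cascade, here is a minimal standard-library sketch using an in-memory SQLite database (an assumption chosen for portability; other backends that enforce foreign keys behave the same way). The two-table schema is hypothetical and heavily simplified. With a plain foreign key, deleting an experiment that still has dataset rows is rejected; with `ON DELETE CASCADE`, the dataset rows are removed along with it.

```python
import sqlite3


def make_db(cascade: bool) -> sqlite3.Connection:
    """Build a toy experiments/datasets schema with one experiment and one dataset."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    on_delete = " ON DELETE CASCADE" if cascade else ""
    conn.execute("CREATE TABLE experiments (experiment_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE datasets ("
        " dataset_uuid TEXT PRIMARY KEY,"
        f" experiment_id INTEGER REFERENCES experiments (experiment_id){on_delete})"
    )
    conn.execute("INSERT INTO experiments VALUES (1, 'exp')")
    conn.execute("INSERT INTO datasets VALUES ('ds-1', 1)")
    # Commit so a failed delete later only rolls back the delete itself.
    conn.commit()
    return conn


def try_permanent_delete(conn: sqlite3.Connection, experiment_id: int) -> bool:
    """Hypothetical stand-in for a permanent delete; returns False on FK violation."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "DELETE FROM experiments WHERE experiment_id = ?", (experiment_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


without_cascade = make_db(cascade=False)
print(try_permanent_delete(without_cascade, 1), count(without_cascade, "datasets"))

with_cascade = make_db(cascade=True)
print(try_permanent_delete(with_cascade, 1), count(with_cascade, "datasets"))
```

The first call is rejected with a foreign key error and leaves both rows in place; the second succeeds and leaves no orphaned dataset rows.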

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

Signed-off-by: Michael Li <minghao_li@outlook.com>

github-actions bot added the labels area/server-infra (MLflow Tracking server backend), area/sqlalchemy (Use of SQL alchemy in tracking service or model registry), rn/bug-fix (Mention under Bug Fixes in Changelogs.), and patch-2.12.3 on May 10, 2024

github-actions bot commented May 10, 2024

Documentation preview for 53561b5 will be available when this CircleCI job
completes successfully.

Diff under review (MySQL schema):
Before: CONSTRAINT trace_info_pk PRIMARY KEY (request_id),
After:  PRIMARY KEY (request_id),

chilir (Contributor, Author) commented May 10, 2024

@harupy can you double-check whether this is okay? Running tests/db/update_schemas.sh automatically removes the primary key constraint name from the MySQL schema for the trace tables, even though this PR doesn't touch the trace models at all. It looks like MySQL doesn't allow primary key constraints to be named?

EDIT: On second look, every primary key constraint that is named in the models already appears unnamed in the MySQL schema, so this is expected.

chilir (Contributor, Author) commented May 10, 2024

@BenWilson2 here is the resubmission of #11695; if you could take a look, that would be great.

chilir added 2 commits May 9, 2024 21:45
Signed-off-by: Michael Li <minghao_li@outlook.com>
Signed-off-by: Michael Li <minghao_li@outlook.com>

chilir (Contributor, Author) commented May 10, 2024

tests/models/test_cli.py::test_build_docker[False] failed because the Docker image build failed with:

#7 263.8 E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/t/tcl8.6/tcl8.6-dev_8.6.10+dfsg-1_amd64.deb  Undetermined Error [IP: 185.125.190.36 80]

https://github.com/mlflow/mlflow/actions/runs/9027441291/job/24806333059?pr=11966#step:10:1325

I'm not sure how to address this; the error appears unrelated to the PR.

chilir (Contributor, Author) commented May 20, 2024

Hi @BenWilson2, any chance you could take a look at this PR this week? I'm hoping to avoid a repeat of last time, when another DB migration was merged before this one. Thanks in advance!

chilir (Contributor, Author) commented May 22, 2024

Looks like another migration was merged (#12102).

Please hold off on merging this PR; I'll try to address this over the weekend.

Development

Successfully merging this pull request may close these issues.

[BUG] mlflow gc do not remove deleted experiments when runs were tracked with datasets
1 participant