Issues: kubeflow/training-operator
Issues list
TfJob creation failed due to webhook validation failure
kind/bug
lifecycle/needs-triage
#2143
opened Jun 11, 2024 by
nagar-ajay
spatial dataset training functions
kind/feature
lifecycle/needs-triage
#2141
opened Jun 7, 2024 by
Jo316
The actual default RestartPolicy of PyTorch is inconsistent with its description in the CRD
#2127
opened May 27, 2024 by
Eslody
mpijob will stuck if LastReconcileTime is updated in 1 second
#2118
opened May 17, 2024 by
shadowdsp
Export Fine-Tuned LLM after Trainer is Complete
kind/discussion
#2101
opened May 6, 2024 by
andreyvelich
fix(compatability): match-case syntax only compatible with Python3.10
release/1.8
#2096
opened May 2, 2024 by
PantherHawk
chore(style): provide type for STORAGE_INITIALIZER_VOLUME constant
#2093
opened May 2, 2024 by
PantherHawk
Add DeepSpeed Example with MPI Operator
area/example
good first issue
help wanted
#2091
opened Apr 29, 2024 by
andreyvelich
Flaky Test: [It] should create desired Pods and Services: Distributed TFJob (4 workers, 2 PS) is succeeded
#2086
opened Apr 27, 2024 by
tenzen-y
Not getting Kubeflow Training SDK v1.7 when installing kubeflow-training
#2082
opened Apr 24, 2024 by
JamesKunstle
Update pytorch launcher component in Kubeflow Pipelines repository
good first issue
help wanted
kind/feature
#2068
opened Apr 17, 2024 by
anishasthana
Support CertManager for the Webhook cert generation
kind/feature
#2049
opened Apr 10, 2024 by
tenzen-y
PytorchJob restartPolicy: ExitCode does not honor backoffLimit for retryable errors
kind/feature
#2045
opened Apr 5, 2024 by
kellyaa
Add more AI/ML Training Examples
area/example
good first issue
help wanted
#2040
opened Mar 29, 2024 by
andreyvelich
3 of 7 tasks
[SDK] Use HuggingFace Data Collator for more Transformers in LLM Trainer
area/sdk
#2032
opened Mar 15, 2024 by
andreyvelich