Polyaxon Python API - RunClient watch_logs() alternate or parameter to stop its execution and return string #1524

Open
QaisarRajput opened this issue Dec 30, 2022 · 1 comment

@QaisarRajput

Hi,
Context:
I have been running some experiments on EKS. It's working great, but my logs disappear after the run finishes. Also, while a run is executing, the pod disconnects after an arbitrary amount of time and the previous logs are lost. EKS/Polyaxon/MPI recovers the job and the launcher pod resumes training from where the disconnect happened.

Issue:
I want to retain the logs of my runs. Persistent volumes could be a solution, but I am not able to use them yet. Instead I am trying to use the Polyaxon Python API, specifically RunClient and its get_logs() and watch_logs() methods.
get_logs() is not returning anything, and I think it is not intended for this. watch_logs() does show the logs, but it does not technically "return" anything. It seems to behave like a streaming function that writes to stdout on the console (Jupyter, shell). I cannot get the logs into my code this way, because it keeps printing without stopping.
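
To illustrate what "returning the logs as a string" would look like for a print-only function, here is a minimal sketch using the standard library's contextlib.redirect_stdout. The function fake_watch_logs is a hypothetical stand-in, not the Polyaxon API, and whether watch_logs() actually writes through sys.stdout is an assumption that has not been verified here. The sketch also only works for a call that eventually returns; it does not stop a stream that runs forever.

```python
import contextlib
import io


def capture_stdout(fn, *args, **kwargs):
    """Run fn and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        fn(*args, **kwargs)
    return buffer.getvalue()


def fake_watch_logs():
    # Hypothetical stand-in for a function that only prints log lines.
    # It is NOT the Polyaxon RunClient API.
    for step in range(3):
        print(f"step {step}: loss=0.{9 - step}")


# The printed lines end up in a string instead of on the console.
logs = capture_stdout(fake_watch_logs)
```

The captured string can then be written to a file as a snapshot.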

Question/Enhancement
Is there another way to get the logs through the Python API? Or could there be an alternative to watch_logs() that simply returns the logs and then finishes? I intend to keep saving snapshots of the logs so that, even if a disconnection happens, I can join the log files later. Open to any suggestions. FYI, I have tried the CLI too: polyaxon ops logs -f gives me encoding issues.
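
The "join the log files later" step could be sketched as follows, assuming each snapshot is a list of log lines and consecutive snapshots may overlap at their boundary. The helper name merge_snapshots is hypothetical. It also assumes log lines are distinguishable (for example, they carry timestamps); if a new snapshot genuinely starts with lines identical to the end of the previous one, they would be treated as overlap and dropped.

```python
def merge_snapshots(snapshots):
    """Concatenate log snapshots, dropping lines repeated at the boundaries.

    For each snapshot, find the longest run of lines that is both the tail of
    what has been merged so far and the head of the snapshot, then append only
    the lines after that run.
    """
    merged = []
    for snap in snapshots:
        overlap = 0
        # Try the largest possible overlap first and shrink until one matches.
        for k in range(min(len(merged), len(snap)), 0, -1):
            if merged[-k:] == snap[:k]:
                overlap = k
                break
        merged.extend(snap[overlap:])
    return merged


# Three snapshots: the second overlaps the first, the third does not overlap.
first = ["t1 start", "t2 epoch 1", "t3 epoch 2"]
second = ["t2 epoch 1", "t3 epoch 2", "t4 epoch 3"]
third = ["t9 resumed"]
joined = merge_snapshots([first, second, third])
```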

@polyaxon-team
Contributor

Logs are supposed to be persisted by default as soon as the job/service reaches a final status (succeeded, failed, stopped).
The issue must be in your deployment configuration.
