Polyaxon Python API - RunClient watch_logs()
alternate or parameter to stop its execution and return string
#1524
Hi,
Context:
I have been running some experiments on EKS. It's working great, but my logs disappear after the run finishes. Also, while a run is executing, the pod disconnects after an arbitrary amount of time and the previous logs are lost. EKS/Polyaxon/MPI recovers the job's execution, and the Launcher pod resumes training from where the disconnect happened.
Issue:
The issue is that I want to retain the logs of my runs. I am not able to use persistent volumes yet, which could be a solution. What I am trying to use is the Polyaxon Python API. More specifically, I am using RunClient and looking at `get_logs()` and `watch_logs()`. `get_logs()` is not returning anything, and I think it's not intended for this. `watch_logs()` does output the logs, but it is not technically "returning" anything. It seems to be a streaming function that prints to stdout on the console (Jupyter, shell). In my code I am not able to capture the logs with it, as it keeps printing without stopping.
Question/Enhancement:
Is there another way to get the logs through the Python API? Or can we have an alternative to `watch_logs` that just returns the logs and then finishes? I intend to keep saving snapshots of the logs so that even if a disconnection happens I can join the log files later. Open to any suggestions. FYI, I have tried the CLI too: `polyaxon ops logs -f` is giving me encoding issues.
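To make the request concrete, here is a rough sketch of the kind of workaround I have in mind, using only the Python standard library. The helper name `stream_to_file` is hypothetical and not part of Polyaxon. It assumes that the streaming function prints through `sys.stdout` at call time, which I have not verified for `watch_logs()`. Everything printed is appended to a file line by line, so a partial snapshot survives a disconnect, and the accumulated text is returned once the stream ends or is interrupted.

```python
import contextlib
from pathlib import Path


def stream_to_file(stream_fn, path):
    """Run stream_fn() and append everything it prints to `path`.

    Hypothetical helper, not a Polyaxon API. It only captures output
    that goes through sys.stdout while stream_fn is running.
    """
    # buffering=1 requests line buffering in text mode, so each finished
    # line is written out instead of waiting for the file to be closed.
    with open(path, "a", encoding="utf-8", buffering=1) as sink:
        with contextlib.redirect_stdout(sink):
            try:
                stream_fn()
            except KeyboardInterrupt:
                # Allow a manual stop (Ctrl-C) without losing what was saved.
                pass
    # Return the accumulated snapshot, including lines from earlier calls.
    return Path(path).read_text(encoding="utf-8")


# Intended usage (assumption: `client` is an already configured RunClient):
# logs = stream_to_file(client.watch_logs, "run_logs.txt")
```

Because the file is opened in append mode, calling the helper again after a reconnect keeps adding to the same file, which is close to the "join the log files later" idea. It does not solve the part where the call never returns on its own; it only makes sure the logs are on disk while it runs.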