-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WM-2570] Get task cost #7415
[WM-2570] Get task cost #7415
Conversation
…m-2570-task-per-column-kp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff! Great job figuring out how to efficiently ask for some extra data from TES when we need it. Very useful feature.
)(implicit ec: ExecutionContext): Future[String] = | ||
for { | ||
logs <- getTaskLogsFn(handle) | ||
taskEndTime = logs.end_time.get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably a good idea to avoid the .get
here since it will cause a crash if the end_time
isn't present. Perhaps its better for this function to return a Future[Option[String]]
since tes might successfully respond, but not include the end_time in its response?
@@ -401,9 +431,41 @@ class TesAsyncBackendJobExecutionActor(override val standardParams: StandardAsyn | |||
|
|||
override def requestsAbortAndDiesImmediately: Boolean = false | |||
|
|||
override def onTaskComplete(runStatus: TesRunStatus, handle: StandardAsyncPendingExecutionHandle): Unit = { | |||
val taskEndTime = getTaskEndTime(handle, getTaskLogs) | |||
if (runStatus == Error() | runStatus == Failed()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might not do exactly what we want, since it's possible for two Error
or Failed
case classes to not ==
one another.
For example: Error(Seq("an error message")) != Error(Seq.empty)
I think it might be better to match on the type, like
runStatus match {
case Error(_) => // fetch and tell errors
case Failed(_) => // fetch and tell errors
case _ => Unit
}
...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala
Outdated
Show resolved
Hide resolved
import scala.util.{Failure, Try} | ||
|
||
class TesAsyncBackendJobExecutionActorSpec | ||
extends AnyFlatSpec | ||
extends TestKitSuite | ||
with AnyFlatSpecLike | ||
with Matchers | ||
with MockSugar | ||
with TableDrivenPropertyChecks { | ||
behavior of "TesAsyncBackendJobExecutionActor" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would really benefit from some tests of the polling behavior, since that seems to be the bulk of the new logic in this branch. I think it's important to know what web requests are made and what status is returned when calling pollStatusAsync
in different conditions. We could try to make that function pure and test it directly in unit tests, or do some work to spin up our own mock TesAsyncBackendJobExecutionActor
to use for unit tests.
I think it would also be great to have a test that makes sure we're emitting the expected metadata when calling onTaskComplete
in different conditions.
Happy to help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's awesome to see this coming to fruition!
backend/src/main/scala/cromwell/backend/standard/StandardAsyncExecutionActor.scala
Outdated
Show resolved
Hide resolved
...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala
Outdated
Show resolved
Hide resolved
...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala
Outdated
Show resolved
Hide resolved
val taskEndTime = getTaskEndTime(handle, getTaskLogs) | ||
if (runStatus == Error() | runStatus == Failed()) { | ||
val errors = getErrorSeq(runStatus, handle, getErrorLogs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this will be making two identical requests to TES, can we fetch the full task view once?
...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala
Outdated
Show resolved
Hide resolved
makeRequest[MinimalTaskView](HttpRequest(uri = s"$tesEndpoint/${handle.pendingJob.jobId}?view=MINIMAL")) map { | ||
response => | ||
val state = response.state | ||
getTesStatus(Option(state), Option.empty, handle.pendingJob.jobId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than passing through empty cost data, should this be passing the cost data we already loaded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this specific line, I'm passing in Option.empty
because the tesVmCostData
is not in scope; however for the lines similar above this one, I'm passing in that vm cost object rather than Option.empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'm wondering if that's going to lead to the behavior we want. I might be missing something, but it looks like right now, if we already have all the cost data, when we poll we'll end up with a status without any cost data attached - which means we'll fetch it again next time we poll.
Failed() | ||
case s if s.contains("EXECUTOR_ERROR") => | ||
jobLogger.info(s"TES reported a failure for Job ${jobId}: '$s'") | ||
Failed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to track cost data for failed tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think so, but is it necessary here for any of the states? I think it makes sense to get the cost data and put it into the metadata, but I'm not sure if it's necessary to add it here when we get the status. I'm going to remove it from all occurrences for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I get it - these are terminal states, so they don't need to carry cost data with them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wouldn't think the terminal states would care about whether or not we fetched the cost data since we'll stop polling anyway.
def isTerminal = true | ||
} | ||
|
||
object TesAsyncBackendJobExecutionActor { | ||
val JobIdKey = "tes_job_id" | ||
private type StandardAsyncRunInfo = Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I overheard some discussion of how we ended up with Any
here but I didn't get the details - what prevents us from using a more narrowly-defined type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Any
conversation was surrounding a previous function that is now split into getTaskEndTime
and getErrorSeq
which no long have Any
as its return type.
with Matchers | ||
with MockSugar | ||
with TableDrivenPropertyChecks { | ||
behavior of "TesAsyncBackendJobExecutionActor" | ||
|
||
type StandardAsyncRunInfo = Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reference the types in TesAsyncBackendJobExecutionActor
here rather than defining these again?
@@ -70,6 +82,15 @@ class TesAsyncBackendJobExecutionActorSpec | |||
content = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are great - nice mocking! I would love to see more, though, particularly around the polling (and then not polling) behavior, and how we respond to various forms of the FULL_VIEW from TES. You can set up Actor-level testing that inspects the metadata stored by actions, which would be really useful here.
@jgainerdewar Accidentally deleted your comment in the new IJ UI. Reposting here:
|
I think we should include cost data for errored/failed tasks so the information is available and can be stored. From my understanding, task that doesn't immediately fail/error can still incur cost that should still be calculated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a change based on new information that makes our lives easier.
The Funnel source code search was much less esoteric than one might guess, I searched for "tes" "size_bytes" and it was the third Google result.
supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes/TesTask.scala
Outdated
Show resolved
Hide resolved
supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes/TesResponseJsonFormatter.scala
Outdated
Show resolved
Hide resolved
supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes/TesResponseJsonFormatter.scala
Outdated
Show resolved
Hide resolved
…/TesTask.scala Co-authored-by: Adam Nichols <anichols@broadinstitute.org>
…/TesResponseJsonFormatter.scala Co-authored-by: Adam Nichols <anichols@broadinstitute.org>
…/TesResponseJsonFormatter.scala Co-authored-by: Adam Nichols <anichols@broadinstitute.org>
No description provided.