[WM-2570] Get task cost #7415

kpierre13 · 2024-04-29T23:07:22Z

No description provided.

THWiseman

Awesome stuff! Great job figuring out how to efficiently ask for some extra data from TES when we need it. Very useful feature.

THWiseman · 2024-05-10T19:28:39Z

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

+  )(implicit ec: ExecutionContext): Future[String] =
+    for {
+      logs <- getTaskLogsFn(handle)
+      taskEndTime = logs.end_time.get


It's probably a good idea to avoid the .get here, since it will throw if end_time isn't present. Perhaps it's better for this function to return a Future[Option[String]], since TES might respond successfully but omit end_time from its response?
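
A minimal sketch of the suggestion above, not the code that was merged. `TaskLog` and the `fetchLogs` parameter are hypothetical stand-ins for the real TES log type and for `getTaskLogsFn(handle)`; the point is only that mapping over the `Option` avoids the `.get`.

```scala
import scala.concurrent.{ExecutionContext, Future}

object TaskEndTimeSketch {
  // Hypothetical stand-in for the TES task-log payload; only end_time matters here.
  final case class TaskLog(end_time: Option[String])

  // Yields None instead of throwing when TES responds without an end_time.
  def getTaskEndTime(fetchLogs: () => Future[TaskLog])(implicit ec: ExecutionContext): Future[Option[String]] =
    fetchLogs().map(_.end_time)
}
```

Callers then decide what a missing end time means (for example, skipping that metadata key) rather than having the future fail.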

THWiseman · 2024-05-10T19:40:30Z

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

@@ -401,9 +431,41 @@ class TesAsyncBackendJobExecutionActor(override val standardParams: StandardAsyn

  override def requestsAbortAndDiesImmediately: Boolean = false

+  override def onTaskComplete(runStatus: TesRunStatus, handle: StandardAsyncPendingExecutionHandle): Unit = {
+    val taskEndTime = getTaskEndTime(handle, getTaskLogs)
+    if (runStatus == Error() | runStatus == Failed()) {


I think this might not do exactly what we want, since it's possible for two Error or Failed case classes to not == one another.

For example: Error(Seq("an error message")) != Error(Seq.empty)
I think it might be better to match on the type, like

runStatus match {
  case Error(_) => // fetch and tell errors
  case Failed(_) => // fetch and tell errors
  case _ => ()
}
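
To illustrate the difference, here is a small self-contained sketch. The `RunStatus`, `Error`, `Failed`, and `Complete` definitions are simplified, hypothetical stand-ins for the real TES run-status classes, assumed to carry a sequence of messages as in the example above.

```scala
object StatusMatchSketch {
  // Simplified, hypothetical stand-ins for the run-status case classes.
  sealed trait RunStatus
  final case class Error(messages: Seq[String] = Seq.empty) extends RunStatus
  final case class Failed(messages: Seq[String] = Seq.empty) extends RunStatus
  case object Complete extends RunStatus

  // Case-class equality compares fields, so this misses any status carrying messages.
  def isFailureByEquality(s: RunStatus): Boolean = s == Error() || s == Failed()

  // Matching on the constructor ignores the payload entirely.
  def isFailureByMatch(s: RunStatus): Boolean = s match {
    case Error(_) | Failed(_) => true
    case _                    => false
  }
}
```

With these definitions, `Error(Seq("an error message"))` is not recognized by the equality check but is recognized by the match.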

THWiseman · 2024-05-10T20:25:35Z

...ends/tes/src/test/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActorSpec.scala

 import scala.util.{Failure, Try}

 class TesAsyncBackendJobExecutionActorSpec
-    extends AnyFlatSpec
+    extends TestKitSuite
+    with AnyFlatSpecLike
    with Matchers
    with MockSugar
    with TableDrivenPropertyChecks {
  behavior of "TesAsyncBackendJobExecutionActor"


I think we would really benefit from some tests of the polling behavior, since that seems to be the bulk of the new logic in this branch. I think it's important to know what web requests are made and what status is returned when calling pollStatusAsync in different conditions. We could try to make that function pure and test it directly in unit tests, or do some work to spin up our own mock TesAsyncBackendJobExecutionActor to use for unit tests.

I think it would also be great to have a test that makes sure we're emitting the expected metadata when calling onTaskComplete in different conditions.

Happy to help!
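
One way the "make that function pure" idea might look, as a rough sketch only. The `Status` objects and `statusFor` are hypothetical names, not the actor's real API; the state strings are the ones TES reports (the diff in this PR matches on `EXECUTOR_ERROR`). The checks are written as plain assertions so the example does not depend on a test framework.

```scala
object PureStatusSketch {
  // Hypothetical, payload-free statuses; the real classes carry more data.
  sealed trait Status
  case object Running extends Status
  case object Complete extends Status
  case object Failed extends Status
  case object Error extends Status
  case object Cancelled extends Status

  // Pure mapping from a TES state string to a status: no HTTP, no actor state.
  def statusFor(state: String): Status = state match {
    case "COMPLETE"                        => Complete
    case "CANCELED"                        => Cancelled
    case s if s.contains("EXECUTOR_ERROR") => Failed
    case s if s.contains("SYSTEM_ERROR")   => Error
    case _                                 => Running
  }

  // Table-style check of every (state, expected status) pair.
  def checkAll(): Unit =
    Seq(
      "COMPLETE" -> Complete,
      "CANCELED" -> Cancelled,
      "EXECUTOR_ERROR" -> Failed,
      "SYSTEM_ERROR" -> Error,
      "RUNNING" -> Running
    ).foreach { case (state, expected) => assert(statusFor(state) == expected, state) }
}
```

The HTTP call would stay in the actor, which hands the parsed state to a function like this; the same table could be moved into `TableDrivenPropertyChecks` in the spec.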

jgainerdewar

It's awesome to see this coming to fruition!

backend/src/main/scala/cromwell/backend/standard/StandardAsyncExecutionActor.scala

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

jgainerdewar · 2024-05-10T19:58:13Z

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

+    val taskEndTime = getTaskEndTime(handle, getTaskLogs)
+    if (runStatus == Error() | runStatus == Failed()) {
+      val errors = getErrorSeq(runStatus, handle, getErrorLogs)


It looks like this will be making two identical requests to TES, can we fetch the full task view once?
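
A sketch of the single-fetch idea, under stated assumptions: `FullTaskView` is a hypothetical, trimmed-down view with only the two fields both consumers need, and `fetchFullTask` stands in for whatever function performs the one request to TES.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SingleFetchSketch {
  // Hypothetical, trimmed-down full task view.
  final case class FullTaskView(endTime: Option[String], systemLogs: Seq[String])

  // One request; the end time and the error messages are both read from the same response.
  def endTimeAndErrors(fetchFullTask: () => Future[FullTaskView], failed: Boolean)(implicit
    ec: ExecutionContext
  ): Future[(Option[String], Seq[String])] =
    fetchFullTask().map { task =>
      val errors = if (failed) task.systemLogs else Seq.empty
      (task.endTime, errors)
    }
}
```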

jgainerdewar · 2024-05-10T20:24:18Z

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

+      makeRequest[MinimalTaskView](HttpRequest(uri = s"$tesEndpoint/${handle.pendingJob.jobId}?view=MINIMAL")) map {
+        response =>
+          val state = response.state
+          getTesStatus(Option(state), Option.empty, handle.pendingJob.jobId)


Rather than passing through empty cost data, should this be passing the cost data we already loaded?

For this specific line, I'm passing in Option.empty because the tesVmCostData is not in scope; however, for the similar lines above this one, I'm passing in that VM cost object rather than Option.empty.

Right, I'm wondering if that's going to lead to the behavior we want. I might be missing something, but it looks like right now, if we already have all the cost data, when we poll we'll end up with a status without any cost data attached - which means we'll fetch it again next time we poll.
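
The concern can be made concrete with a small sketch. Everything here is hypothetical (`VmCostData`, its fields, `Running`, `viewForNextPoll`, `nextStatus`); it only illustrates carrying known cost data forward so that the next poll can use the cheaper view.

```scala
object CostCarrySketch {
  // Hypothetical cost payload and a non-terminal status that may carry it.
  final case class VmCostData(vmSize: Option[String], pricePerHour: Option[String])
  final case class Running(costData: Option[VmCostData])

  sealed trait View
  case object Minimal extends View
  case object Full extends View

  // Only ask for the expensive view while cost data is still missing.
  def viewForNextPoll(previous: Running): View =
    if (previous.costData.isDefined) Minimal else Full

  // Carry known cost data forward rather than replacing it with None.
  def nextStatus(previous: Running, fetched: Option[VmCostData]): Running =
    Running(fetched.orElse(previous.costData))
}
```

If the new status were built with `Option.empty` instead, `viewForNextPoll` would return `Full` on every poll, which is the refetching behavior described above.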

jgainerdewar · 2024-05-10T20:24:50Z

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

-            Failed()
+      case s if s.contains("EXECUTOR_ERROR") =>
+        jobLogger.info(s"TES reported a failure for Job ${jobId}: '$s'")
+        Failed()


Do we want to track cost data for failed tasks?

I would think so, but is it necessary here for any of the states? I think it makes sense to get the cost data and put it into the metadata, but I'm not sure if it's necessary to add it here when we get the status. I'm going to remove it from all occurrences for now.

Ah, I think I get it - these are terminal states, so they don't need to carry cost data with them?

Yeah, I wouldn't think the terminal states would care about whether or not we fetched the cost data since we'll stop polling anyway.

jgainerdewar · 2024-05-10T20:30:03Z

...Backends/tes/src/main/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActor.scala

  def isTerminal = true
 }

 object TesAsyncBackendJobExecutionActor {
  val JobIdKey = "tes_job_id"
+  private type StandardAsyncRunInfo = Any


I think I overheard some discussion of how we ended up with Any here but I didn't get the details - what prevents us from using a more narrowly-defined type?

The Any conversation was about a previous function that is now split into getTaskEndTime and getErrorSeq, which no longer have Any as their return type.

jgainerdewar · 2024-05-10T20:31:16Z

...ends/tes/src/test/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActorSpec.scala

    with Matchers
    with MockSugar
    with TableDrivenPropertyChecks {
  behavior of "TesAsyncBackendJobExecutionActor"

+  type StandardAsyncRunInfo = Any


Can we reference the types in TesAsyncBackendJobExecutionActor here rather than defining these again?
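
A sketch of what reusing the alias could look like. The object names are hypothetical; note that the alias in the diff above is declared `private`, so it would need wider visibility before a spec could import it.

```scala
object ActorCompanionSketch {
  // Declared once; importable elsewhere because it is not private.
  type StandardAsyncRunInfo = Any
}

object SpecSketch {
  import ActorCompanionSketch.StandardAsyncRunInfo

  // The spec uses the shared alias instead of redefining it.
  val info: StandardAsyncRunInfo = "run-info"
}
```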

jgainerdewar · 2024-05-10T20:33:11Z

...ends/tes/src/test/scala/cromwell/backend/impl/tes/TesAsyncBackendJobExecutionActorSpec.scala

@@ -70,6 +82,15 @@ class TesAsyncBackendJobExecutionActorSpec
    content = None


These tests are great - nice mocking! I would love to see more, though, particularly around the polling (and then not polling) behavior, and how we respond to various forms of the FULL_VIEW from TES. You can set up Actor-level testing that inspects the metadata stored by actions, which would be really useful here.

kpierre13 · 2024-05-13T20:21:32Z

@jgainerdewar Accidentally deleted your comment in the new IJ UI. Reposting here:

What about tasks that error out? Do we need to store cost data there the same way we do complete and cancelled tasks?

kpierre13 · 2024-05-13T20:23:44Z

I think we should include cost data for errored/failed tasks so the information is available and can be stored. From my understanding, a task that doesn't immediately fail/error can still incur cost, which should still be calculated.

aednichols

Here is a change based on new information that makes our lives easier.

The Funnel source code search was much less esoteric than one might guess: I searched for "tes" "size_bytes" and it was the third Google result.

supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes/TesTask.scala

supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes/TesResponseJsonFormatter.scala

kpierre13 and others added 11 commits April 25, 2024 16:09

file transfer

f16e553

get task cost; tests

f77bec4

Merge branch 'develop' into wm-2570-task-per-column-kp

a351bf3

scalafmt & revert terra_tes_application.conf.ctmpl

5d62554

Merge remote-tracking branch 'origin/wm-2570-task-per-column-kp' into w…

a73e22b

…m-2570-task-per-column-kp

cleanup

5a6c71f

scalafmt

9e78f92

refactor poll to get status and costData more consistently

8841ebb

Refactor into object

90a82c4

wip

d0cd502

new tests; refactor

ff7d047

kpierre13 marked this pull request as ready for review May 10, 2024 14:10

kpierre13 requested a review from a team as a code owner May 10, 2024 14:10

kpierre13 added 2 commits May 10, 2024 10:10

scalafmt

91112a7

cleanup

1f094d9

THWiseman reviewed May 10, 2024

jgainerdewar reviewed May 10, 2024

broadinstitute deleted a comment from jgainerdewar May 13, 2024

kpierre13 added 5 commits May 13, 2024 18:58

return a Future[Option[String]] for getTaskEndTime

9b57d6e

onTaskComplete errors matching on runStatus

e22f2a6

add pollStatusAsync comment

dba98f2

make one call to TES onTaskComplete

32924fc

wip

3ad8123

jgainerdewar mentioned this pull request May 15, 2024

ID-1276 Introduce Bard Service for sending metrics #7434

Open

kpierre13 added 4 commits May 16, 2024 13:46

make pollStatusAsync more concise

db55c31

update queryStatusAndCostData name

ac9af68

implement fetchFullTesTask and fetchMinimalTesTasl (contains commente…

71d8f7c

…d out code)

headOption not head; build cost data object in for & re-reference

f0f06a1

kpierre13 added 14 commits June 5, 2024 09:11

update test? scalafmt

13d596c

add comment for deserializer; cleanup test; OutputFileLog value (debug)

38379ff

move deserializer (debug)

4678ae6

debugging deserializer

b5c3759

JsNumber -> JsString (debug)

c32a641

(debug)

85aa3b0

(debug)

bfc52bf

refactor write (unit tests passing) (debug)

32fb30e

(debug)

8779e9c

(debug) with friends

2ccc52c

please be the final (debug)

3f02149

adding toString for all TesRunStatus to omit costData from metadata

6b41ca0

Cleanup & custom formatter tests

696da4a

scalafmt

f93f8c9

aednichols reviewed Jun 5, 2024

kpierre13 and others added 5 commits June 6, 2024 09:50

Update supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes…

649b0fd

…/TesTask.scala Co-authored-by: Adam Nichols <anichols@broadinstitute.org>

Update supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes…

fbcbc4a

…/TesResponseJsonFormatter.scala Co-authored-by: Adam Nichols <anichols@broadinstitute.org>

Update supportedBackends/tes/src/main/scala/cromwell/backend/impl/tes…

4b6d35b

…/TesResponseJsonFormatter.scala Co-authored-by: Adam Nichols <anichols@broadinstitute.org>

scalafmt

23909c7

update unit tests

f9cb2e0

THWiseman reviewed Jun 6, 2024

kpierre13 and others added 4 commits June 6, 2024 16:30

PR comment

b61f60b

PR comment

89554bf

sbtfmt

34a8355

Merge branch 'develop' into wm-2570-task-per-column-kp

35a6fcc

THWiseman approved these changes Jun 7, 2024

kpierre13 added 2 commits June 7, 2024 10:41

update unit test

580e52b

Merge remote-tracking branch 'origin/wm-2570-task-per-column-kp' into w…

7b38943

…m-2570-task-per-column-kp

kpierre13 merged commit f2ade5b into develop Jun 7, 2024
37 checks passed

kpierre13 deleted the wm-2570-task-per-column-kp branch June 7, 2024 16:58
