save cwl.input.json to keep collection if large #83

jrandall · 2018-11-14T15:36:29Z

If cwl.input.json is larger than 1MB, save it to Keep rather than inlining it in the container_request mounts entry.

tetron · 2018-11-14T21:40:03Z

This is a good idea, but Workbench won't know how to show workflow inputs (they are very large, though, so that's probably okay). Changing the literal form from "json" to "text" is also likely to cause regressions.

mr-c · 2019-09-24T08:16:41Z

What's the status of this PR?

tetron · 2019-09-24T13:29:49Z

I think the status is that it potentially breaks the ability to display workflow inputs in Workbench, so in order to accept the fix it would need to be paired with an update to Workbench (or at least some investigation of the side effects of this change on Workbench).

jrandall · 2019-09-24T14:01:25Z

@tetron I've lost track of exactly where this ended up, but it looks like I did implement some fixes on the Workbench side to support "kind":"text". I did not PR them, though, and they may have bitrotted by now: wtsi-hgi@bf61e04

Also see: https://dev.arvados.org/issues/13685

My recollection is that there are two related performance fixes for handling CWL workflows with large numbers of inputs. This PR touches both of them.

One is to implement "kind": "text" as an alternative to "kind": "json", so that various parts of the code stop repeatedly parsing the (large and complex) inputs as JSON and just treat them as opaque text instead. Even JSON as small as 1MB can, for example, make Workbench very slow as it processes the JSON and then renders DOM elements to display everything it contains. This did not only affect Workbench: other parts of the system (I believe that includes the API server) were much faster when handling large numbers of container requests with very large inputs.

The other fix is to store very large content (defined here as >1MB, although that should probably have a config knob) in Keep rather than inlining it into the mounts structure (which then gets passed around and repeatedly parsed as part of the container / container_request objects). This further speeds up processing of containers / container requests because they are much smaller.
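
A minimal sketch of the size-threshold decision described above, not the code in this PR. The names `INLINE_LIMIT_BYTES`, `build_input_mount`, and `save_to_keep` are hypothetical, and the callback stands in for whatever actually writes the collection. The shape of the returned mount entries is an assumption based on the "json" and "collection" mount kinds.

```python
import json

# Threshold taken from the discussion (>1MB); a hypothetical constant,
# not an existing configuration option.
INLINE_LIMIT_BYTES = 1024 * 1024


def build_input_mount(job_order, save_to_keep, limit=INLINE_LIMIT_BYTES):
    """Return a mounts entry for cwl.input.json.

    save_to_keep is a hypothetical callable: it receives the serialized
    JSON text, stores it as cwl.input.json in a new collection, and
    returns that collection's portable data hash.
    """
    text = json.dumps(job_order, sort_keys=True, separators=(",", ":"))
    if len(text.encode("utf-8")) > limit:
        # Large input: keep only a small reference in the mounts entry.
        pdh = save_to_keep(text)
        return {
            "kind": "collection",
            "portable_data_hash": pdh,
            "path": "cwl.input.json",
        }
    # Small input: inline it as before.
    return {"kind": "json", "content": job_order}
```

With this shape, the container_request carries only a short reference for large inputs, so code that repeatedly serializes or parses the mounts structure never touches the full input document.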

jrandall added 2 commits November 14, 2018 13:36

save cwl.input.json to keep collection if large

c2e293f

fix syntax of dict access for ["content"]

75484c7
