save cwl.input.json to keep collection if large #83

Open
wants to merge 2 commits into base: main


jrandall
Contributor

If cwl.input.json is larger than 1MB, save it to keep rather than inlining it in the container_request mounts entry.

@tetron
Member

tetron commented Nov 14, 2018

This is a good idea, but Workbench won't know how to show workflow inputs (though they are very large, so that's probably okay). Changing the literal form from "json" to "text" is also likely to cause regressions.

@mr-c
Contributor

mr-c commented Sep 24, 2019

What's the status of this PR?

@tetron
Member

tetron commented Sep 24, 2019

I think the status is that it potentially breaks the ability to display workflow inputs in Workbench, so in order to accept the fix it would need to be paired with an update to Workbench (or at least some investigation of the side effects of this change on Workbench).

@jrandall
Contributor Author

@tetron I've lost track of exactly where this ended up, but it looks like I did implement some fixes on the Workbench side to support "kind":"text". It seems I never opened a PR for them, though, and they may have bitrotted by now: wtsi-hgi@bf61e04

Also see: https://dev.arvados.org/issues/13685

My recollection is that there are two related performance fixes for handling CWL workflows with large numbers of inputs. This PR touches both of them.

One is to implement "kind": "text" as an alternative to "kind": "json", so that various parts of the code stop repeatedly parsing the (large and complex) inputs as JSON and just treat them as opaque text instead. Even JSON as small as 1MB can, for example, make Workbench very slow as it processes the JSON and then renders DOM elements to display every element it contains. This did not only affect Workbench: other parts of the system (I believe including the API server) were greatly accelerated when handling large numbers of container requests with very large inputs.
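To make the difference concrete, here is a rough sketch of the two mount shapes. The helper names `json_mount` and `text_mount` are made up for illustration and are not taken from this PR; the "kind" and "content" keys follow the mount entries discussed above, and the exact serialization used by the real code is an assumption.

```python
import json


def json_mount(inputs):
    # Inline the inputs as a structured value: anything that loads the
    # container request also sees (and has to walk) the whole object.
    return {"kind": "json", "content": inputs}


def text_mount(inputs):
    # Serialize once and carry the result as an opaque string, so that
    # intermediate components can pass it along without parsing it.
    return {"kind": "text", "content": json.dumps(inputs, sort_keys=True)}
```

With the "text" form, only the consumer that actually needs the inputs has to call `json.loads` on the content.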

The other fix is to store very large content (defined here as >1MB although that should probably have a config knob) in keep rather than inlining it into the mounts structure (which then gets passed around and repeatedly parsed as part of the container / container_request objects). This further speeds up processing of containers / container requests because they are much smaller.
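As a minimal sketch of that size check (not the code in this PR): `input_mount`, `MAX_INLINE_BYTES`, and the `save_to_keep` callback are hypothetical names, and `save_to_keep` is assumed to write the text to a new collection and return its portable data hash. Keeping small inputs inline as "json" is also an assumption made for the example.

```python
import json

# Assumed threshold (hypothetical name); the comment above suggests this
# should probably be configurable.
MAX_INLINE_BYTES = 1024 * 1024


def input_mount(inputs, save_to_keep, max_inline_bytes=MAX_INLINE_BYTES):
    """Return a mount entry for cwl.input.json.

    save_to_keep(filename, text) is a caller-supplied, hypothetical helper
    that stores the text in Keep and returns a portable data hash.
    """
    text = json.dumps(inputs, sort_keys=True, indent=4)
    size = len(text.encode("utf-8"))
    if size <= max_inline_bytes:
        # Small enough: keep the inputs inline in the mounts structure.
        return {"kind": "json", "content": inputs}
    # Too large: store the file in Keep and reference it by collection.
    pdh = save_to_keep("cwl.input.json", text)
    return {
        "kind": "collection",
        "portable_data_hash": pdh,
        "path": "cwl.input.json",
    }
```

Passing the storage step in as a callback keeps the size decision testable without a running cluster.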
