`500` error encountered on `flows/<flow_id>/runs` requests due to size of payload #322

RobBlumberg · 2022-05-31T23:53:02Z

I have a flow running in production on a 2-minute schedule, for which artifacts and metadata are being stored. I am trying to interact with those artifacts through the client API. However, requests to /flows/<flow_id>/runs (made when instantiating Flow("FlowName")) are now failing with:

Metadata request: (/flows/<flow_id>/runs) failed (code 500): {"message": "Internal server error"}

Requests to other endpoints, like flows/<flow_id>/runs/<run_id>, go through just fine. After taking a closer look in API Gateway, I generated the same request through the console there and got:

Execution failed due to configuration error: Integration response of reported length 28729085 is larger than allowed maximum of 10485760 bytes.
Tue May 31 22:49:46 UTC 2022 : Method completed with status: 500

Basically the response payload is larger than the non-configurable 10MB limit on API Gateway.
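
To put the two numbers from the log line above side by side, here is a small arithmetic sketch. The variable names are illustrative only; the byte counts are copied from the API Gateway message.

```python
# Figures copied from the API Gateway error message above.
LIMIT_BYTES = 10 * 1024 * 1024   # 10485760, the allowed maximum
reported_bytes = 28_729_085      # reported integration response length

excess_bytes = reported_bytes - LIMIT_BYTES
ratio = reported_bytes / LIMIT_BYTES

print(f"over the limit by {excess_bytes} bytes ({ratio:.2f}x the maximum)")
```

So the response would need to shrink to roughly a third of its current size to fit, which is why some form of server-side filtering or paging seems necessary.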

I can get around this by requesting individual runs directly, but it would be great to still be able to use the flow apis to interact with the child runs of the flow via the client. (Perhaps adding the ability to pass filtering params in the requests, like fetching the last n runs?). Also curious if this is something that others have run into when deploying flows to production where many runs are produced and stored, and if there are any workarounds or things I am missing.
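
A minimal sketch of the "request individual runs directly" workaround, assuming the run IDs are already known from somewhere else (for example scheduler logs). `load_runs` and its `loader` parameter are hypothetical helpers, not part of Metaflow; only `Run("<flow_name>/<run_id>")` is the actual client call.

```python
from typing import Any, Callable, Iterable, List, Optional


def load_runs(
    flow_name: str,
    run_ids: Iterable[str],
    loader: Optional[Callable[[str], Any]] = None,
) -> List[Any]:
    """Fetch runs one at a time, avoiding the bulk /flows/<flow_id>/runs call.

    `loader` is a hypothetical hook for testing; by default the Metaflow
    client's Run class is used, which hits flows/<flow_id>/runs/<run_id>.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without Metaflow.
        from metaflow import Run
        loader = Run
    return [loader(f"{flow_name}/{run_id}") for run_id in run_ids]
```

Usage would look like `load_runs("FlowName", ["1653954000"])`, where the run ID shown is a made-up placeholder.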


martinbattentive · 2023-10-17T20:29:34Z

For others who have run across this issue, one workaround is to use the ui-backend interface (assuming you're using the default Outerbounds Terraform module), which has much richer server-side filtering features. The original cause of the issue above is that the Metaflow client makes a bulk request for all runs from the metadata-service API, which lacks filtering features, and then does the filtering client-side.

E.g. to get the latest run for a given flow with given tags:

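import requests
from metaflow import Run

flow_name = "<flow_name>"  # placeholder: the name of your flow
response = requests.get(f"https://<api_gateway_hostname>/api/runs?_order=-ts_epoch&_limit=30&_group_limit=31&_tags=<tags>&flow_id={flow_name}&_page=1").json()
if not response['data']:
    print("No data")
else:
    # Results are ordered by descending ts_epoch, so the first entry is the latest run
    run_id = response['data'][0]['run_id']
    run = Run(f"{flow_name}/{run_id}")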

Are there any thoughts on making the metadata API adopt the server-side filtering behavior of the UI backend API?
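
For anyone who wants to reuse the query above, here is a rough sketch of a URL builder. `build_runs_url` is a hypothetical helper; the query parameter names are taken from the example request, while joining multiple tags with commas is an assumption that should be checked against the UI backend.

```python
from typing import Iterable
from urllib.parse import urlencode


def build_runs_url(
    hostname: str,
    flow_name: str,
    tags: Iterable[str],
    limit: int = 30,
    group_limit: int = 31,
    page: int = 1,
) -> str:
    """Build a ui-backend /api/runs URL, newest runs first."""
    params = {
        "_order": "-ts_epoch",     # descending timestamp, as in the example
        "_limit": limit,
        "_group_limit": group_limit,
        "_tags": ",".join(tags),   # assumption: comma-separated tag list
        "flow_id": flow_name,
        "_page": page,
    }
    # urlencode percent-escapes values, so tags containing ":" stay valid.
    return f"https://{hostname}/api/runs?{urlencode(params)}"
```

The result can be passed straight to `requests.get(...)`, for example `requests.get(build_runs_url("<api_gateway_hostname>", "FlowName", ["<tag>"])).json()`.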
