EntityManager.find() with filter+limit can spawn thousands of recursive sequential queries #377

Open
smcroskey opened this issue Nov 27, 2023 · 2 comments

Contributor

smcroskey commented Nov 27, 2023

Because of how explicit limits are applied, we've run into situations where a single find() call spawns thousands of sequential queries internally. With a small enough limit and a filter that would ordinarily return no results, the call effectively turns into a very slow scan of a partition. This is unexpected behavior when specifying an explicit limit. Furthermore, there's currently no way to have TypeDorm pass a limit directly to the DynamoDB query call without also risking this runaway recursive querying.

Suggestions:

  • There should be an option to specify whether the limit is a DynamoDB-only limit (in other words, the number of results returned could be less than the limit provided, with a guarantee that only one query is made) or the TypeDorm notion of a limit (where it keeps querying until either the limit is reached or the results are exhausted)
  • When using the TypeDorm notion of limit, recursive queries could potentially use exponentially increasing limits (or perhaps no limit at all) so that when a small limit is provided, it doesn't turn into a flood of sequential internal queries returning 0 results
  • Additionally, it may make sense to use a reasonable cap so that a single find(...) call will not spawn more than X queries internally, or look at more than X records (like a query/scanLimit). E.g. 2000+ queries from a single find(...) is probably never intended.
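
To make the second and third suggestions concrete, here is a minimal sketch of a query loop that doubles its per-query limit and stops at a query cap. It is not TypeDorm's implementation: `findWithBackoff`, `FetchPage`, `maxQueries`, `maxPageLimit`, and `simulatePartition` are all hypothetical names, and the fetcher stands in for a DynamoDB query whose limit counts items evaluated before the filter is applied.

```typescript
// Hypothetical page shape: the items that passed the filter, plus a cursor.
interface Page<T> {
  items: T[];
  nextKey: number | undefined; // undefined once the partition is exhausted
}

// Hypothetical fetcher: evaluates at most `limit` items starting at `startKey`.
type FetchPage<T> = (limit: number, startKey?: number) => Promise<Page<T>>;

interface FindOptions {
  limit: number;        // number of matching items the caller wants
  maxQueries: number;   // hard cap on internal queries per find call
  maxPageLimit: number; // upper bound for the growing per-query limit
}

async function findWithBackoff<T>(fetchPage: FetchPage<T>, opts: FindOptions) {
  const items: T[] = [];
  let pageLimit = opts.limit;
  let cursor: number | undefined;
  let queries = 0;
  let exhausted = false;

  while (items.length < opts.limit && queries < opts.maxQueries) {
    const page = await fetchPage(pageLimit, cursor);
    queries++;
    items.push(...page.items);
    cursor = page.nextKey;
    if (cursor === undefined) {
      exhausted = true;
      break;
    }
    // Double the per-query limit so sparse matches do not cause a query flood.
    pageLimit = Math.min(pageLimit * 2, opts.maxPageLimit);
  }

  // Later pages can over-fetch, so trim to the requested limit.
  return { items: items.slice(0, opts.limit), queries, exhausted };
}

// Simulated partition of `size` items; `matches` plays the role of the filter.
function simulatePartition(size: number, matches: (i: number) => boolean): FetchPage<number> {
  return async (limit, startKey) => {
    const start = startKey ?? 0;
    const end = Math.min(start + limit, size);
    const items: number[] = [];
    for (let i = start; i < end; i++) {
      if (matches(i)) items.push(i);
    }
    return { items, nextKey: end < size ? end : undefined };
  };
}
```

Against a simulated 4,096-item partition where nothing matches, a fixed per-query limit of 1 would need 4,096 queries, while this schedule with `limit: 1` and `maxPageLimit: 1024` finishes in 14. Because the result is trimmed, a real implementation that also returns a cursor would need to account for the over-fetched items.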
Contributor

m-szaf commented Dec 2, 2023

An additional suggestion would be to provide an RCU limit: the maximum number of read capacity units the find operation may consume before stopping, regardless of whether the provided limit has been reached. This may be a better option, as it is more predictable in terms of cost.
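
A rough sketch of how such a budget could work is below. It assumes each query is issued with `ReturnConsumedCapacity` set, so that the response carries `ConsumedCapacity.CapacityUnits`. The names `collectWithinBudget`, `nextResponse`, and `maxCapacityUnits` are hypothetical, and the callback is synchronous only to keep the example short; a real DynamoDB call would be awaited.

```typescript
// Subset of a DynamoDB Query response relevant to cost tracking.
interface QueryResponse {
  Items: unknown[];
  ConsumedCapacity?: { CapacityUnits?: number };
}

// Hypothetical helper: pulls responses one at a time until the RCU budget
// is spent or `nextResponse` returns undefined (no more pages).
function collectWithinBudget(
  nextResponse: () => QueryResponse | undefined,
  maxCapacityUnits: number
) {
  const items: unknown[] = [];
  let consumed = 0;
  let queries = 0;

  while (consumed < maxCapacityUnits) {
    const res = nextResponse();
    if (res === undefined) break;
    queries++;
    items.push(...res.Items);
    // Treat a missing capacity figure as zero rather than guessing a cost.
    consumed += res.ConsumedCapacity?.CapacityUnits ?? 0;
  }

  return { items, consumed, queries };
}
```

The budget is checked before each query, so the last query can overshoot it by at most one page's worth of capacity.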

Contributor

m-szaf commented Dec 9, 2023

👋 @whimzyLive Feature add PR for this issue is here 🙏
