[BUG] Slow scan operations #1260
Curious about where this is occurring. You mention this is related to scan operations. Are you confident that is the culprit? I'm aware of this PR which caused high initialization times for Dynamoose: #1154. Could that potentially be the culprit? You could try the v3 alpha to see if that PR improves it. I'm also aware of a change made recently to remove a cache that was causing memory leaks. That was related to parsing the document. There is no doubt that I expect Dynamoose to have more overhead than the AWS SDK itself. More work is required to ensure schema conformance. However, this performance hit shouldn't be excessive (as it looks to be in your example). |
Hi, I've got the same problem using Dynamoose on AWS Lambda; on localhost (Ubuntu) it works perfectly. After some investigation I have found what causes the issue: this line: dynamoose/lib/DocumentRetriever.ts Line 59 in a838224
Processing 150 records took 10 s (sic!) |
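To pin down where the time goes, a minimal timing helper like the one below can wrap each candidate call. This is just a measurement sketch, not Dynamoose API; the usage line assumes a configured model named `Model`.

```javascript
// Generic async timing helper: runs fn, logs elapsed time, returns fn's result.
async function timed(label, fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return result;
}

// Usage (assuming a configured Dynamoose model named Model):
// const items = await timed("dynamoose scan", () => Model.scan().exec());
```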
@FanFataL Sadly that line is very critical and doesn't really lead to information about what is going wrong. We kinda need to dive into that method and determine which part specifically is slowing things down. As mentioned in my previous message, v3 alpha does make some improvements to performance. Are you able to try that and see if the issue persists? |
We were experiencing the same performance issues with a large query request. After upgrading to v3 alpha, performance is now on-par with the native client. |
Closing since the latest report is that v3 alpha is on-par with native client. |
Now that I see this issue I am going to give v3 a try, but since I did debug this on v2, here are my findings in case anyone else runs into this. I noticed that as my data set grew, my API call was taking significantly longer to return the data set. For reference, I am now returning about 2,000 items at a time from DynamoDB, and it was taking about 7 seconds on my local computer. Unfortunately the latency is on the main thread, which locks up my server from processing other requests. First I found the line @FanFataL pointed out, which in my case was taking about 5 of the 7 seconds:
Next the slowdown was coming from:
Inside that function it was coming from the line:
Inside that function it seems to mostly be coming from, which takes about 4 seconds
With this line taking a good second in itself:
|
@fishcharlie Unfortunately I am seeing the same issue with v3, but now it's the lines:
Speed-wise, v3 is unfortunately about 0.5 - 1.0 sec slower than v2 for me. Is there a way to optimize querying for a list of objects? I am concerned this is not going to scale up to what I need for my use case. EDIT: So I dug a bit deeper into v3. In my case the lines
Never get hit, so my understanding is this whole code block is not helping my use case. Would it be possible to add an optimization that says we don't need to run this code block at all? Also it looks like the logic inside the promise
Gets calculated for each item in the result set. Would it be possible to add a cache of the results for the model being queried so we don't have to calculate it for each item in the result set? |
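The per-model caching idea suggested above could be sketched like this: compute the expensive per-model value once and reuse it for every item in the result set. `computeSettings` is a stand-in name for whatever the real per-model work is, not an actual Dynamoose function.

```javascript
// Cache keyed by model instance; WeakMap so entries are freed with the model.
const settingsCache = new WeakMap();

function getModelSettings(model, computeSettings) {
  if (!settingsCache.has(model)) {
    settingsCache.set(model, computeSettings(model));
  }
  return settingsCache.get(model);
}
```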
@fishcharlie You're welcome. I did some more digging into performance profiles for v3. Another slow spot I have identified is the function. Is it possible for a return from DynamoDB to be circular? If not, a possible solution could be adding a parameter to it that skips the circular check. Also, it looks like that runs for each item in the return. I will keep digging and see what else I find. |
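The skip-the-check idea above could look roughly like this: a deep copy with an opt-out flag for circular-reference detection, since data coming straight from DynamoDB is plain JSON-like and cannot be circular. This is an illustrative sketch, not Dynamoose's actual implementation.

```javascript
// Deep-copies plain objects/arrays. When checkCircular is true, throws on
// circular references; callers with known-acyclic input can disable it.
function deepCopy(value, opts = {}, seen = new Set()) {
  const { checkCircular = true } = opts;
  if (value === null || typeof value !== "object") return value;
  if (checkCircular) {
    if (seen.has(value)) throw new Error("circular reference");
    seen.add(value);
  }
  const out = Array.isArray(value) ? [] : {};
  for (const key of Object.keys(value)) {
    out[key] = deepCopy(value[key], opts, seen);
  }
  if (checkCircular) seen.delete(value); // allow shared (non-circular) references
  return out;
}
```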
I'm seeing what I think is similar behavior for at least
Here is my schema:
And attached is the sample document I'm trying to retrieve/deserialize: sample.txt. This is a pretty major hindrance to our application, so please let me know of any mitigations/workarounds/alternatives/etc. Otherwise, I'd be interested to know what the timeline is for fixing this and if there's anything I can do. Please let me know if a new ticket would be helpful. Thanks! |
I know it might defeat the purpose of using a schema, but can you share the stats for getting your document in these two ways? const document = await Model.get({...})
// VS
const request = await Model.get({...}, {return: 'request'})
const document = await dynamoose.aws.ddb().getItem(request) |
Having similar performance issues when performing a |
@ptejada I tried this, and bypassing Dynamoose seems to shave off 600-700ms. |
@fishcharlie I can confirm the issue; repro here: https://github.com/tranhl/dynamooose-deep-copy-repro As @PaulAtST has identified in this comment, the deep copy is the hot spot. Instead of relying on |
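Since the discussion centers on how items are deep-copied, a quick micro-benchmark can compare candidate copy strategies on a representative item. The two strategies shown here are common options, not necessarily the one the fix ended up using, and the sample item is invented.

```javascript
// Times `iterations` copies of a small representative item with copyFn.
function benchCopy(label, copyFn, iterations = 2000) {
  const item = { id: "1", nested: { tags: ["a", "b"], n: 42 } };
  const start = process.hrtime.bigint();
  let last;
  for (let i = 0; i < iterations; i++) last = copyFn(item);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms for ${iterations} copies`);
  return last;
}

benchCopy("JSON round-trip", (x) => JSON.parse(JSON.stringify(x)));
// structuredClone is global in Node 17+; guard for older runtimes.
if (typeof structuredClone === "function") {
  benchCopy("structuredClone", (x) => structuredClone(x));
}
```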
@tranhl Nice work! Sounds like a solid approach to me. |
Summary:
I'm using Dynamoose for my project, and overall it works well. However, scan operations are several times slower than the same operations using the AWS DocumentClient.
Code sample:
Schema
It's the same with all my different schemas. Here is an example of the simplest one. None of them use Buffer type.
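The original schema snippet did not survive; the stand-in below matches the description (plain attribute types, no Buffer). All attribute names here are invented for the example.

```javascript
const dynamoose = require("dynamoose");

// Illustrative minimal schema: a string hash key plus simple scalar attributes.
const schema = new dynamoose.Schema({
  id: { type: String, hashKey: true },
  name: String,
  createdAt: Number,
  active: Boolean,
});
```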
Model
General
Current output and behavior (including stack trace):
Example 1:
Scanning ~450 items that are ~1 kb each (according to aws dynamodb console).
Using aws DocumentClient and scanning all: ~400 ms
Using dynamoose scan all: ~4500 ms
Example 2:
Scanning ~23000 items that are ~230 bytes each.
Using aws DocumentClient and scanning all: ~6000 ms
Using dynamoose scan all: ~24000 ms
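The two timings above can be collected side by side with a helper like this. Both paths scan the full table; the model, DocumentClient, and table name are passed in, so names in the body make no assumptions about your setup.

```javascript
// Compares a Dynamoose full-table scan against a paginated raw
// DocumentClient scan (AWS SDK v2 style), logging elapsed time for each.
async function compareScans(Model, documentClient, tableName) {
  let t = process.hrtime.bigint();
  await Model.scan().all().exec();
  console.log("dynamoose scan all:", Number(process.hrtime.bigint() - t) / 1e6, "ms");

  t = process.hrtime.bigint();
  let items = [];
  let key; // undefined on the first page
  do {
    const page = await documentClient
      .scan({ TableName: tableName, ExclusiveStartKey: key })
      .promise();
    items = items.concat(page.Items);
    key = page.LastEvaluatedKey;
  } while (key);
  console.log("DocumentClient scan all:", Number(process.hrtime.bigint() - t) / 1e6, "ms");
  return items.length;
}
```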
Expected output and behavior:
Somewhat similar performance
Environment:
Operating System: Amazon Linux
Operating System Version: 2
Node.js version (node -v): v14.x
NPM version (npm -v): 7.18.1
Dynamoose version: 2.8.1
Other information (if applicable):
AWS Lambda using the Serverless framework.
Serverless version: 2.48.0
AWS SDK version: 2.952.0
AWS_NODEJS_CONNECTION_REUSE_ENABLED: 1
Other: