Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lambda support for DynamoDb Kinesis Stream processing #1554

Open
2 tasks
danny-zegel-zocdoc opened this issue Jul 30, 2023 · 6 comments
Open
2 tasks

Lambda support for DynamoDb Kinesis Stream processing #1554

danny-zegel-zocdoc opened this issue Jul 30, 2023 · 6 comments
Labels
feature-request A feature should be added or improved. module/lambda-client-lib p2 This is a standard priority issue queued

Comments

@danny-zegel-zocdoc
Copy link

danny-zegel-zocdoc commented Jul 30, 2023

Describe the feature

Given a DynamoDb table configured with a Kinesis Stream and a Lambda to process the Kinesis Stream, the Amazon.Lambda.KinesisEvents package provides KinesisEvent as the lambda input object with the following structure

public class KinesisEvent
{
    public IList<KinesisEvent.KinesisEventRecord> Records { get; set; }

    public class Record : Amazon.Kinesis.Model.Record
    {
        public string KinesisSchemaVersion { get; set; }
    }

    public class KinesisEventRecord
    {
        /// A bunch of other properties
        ...

        public KinesisEvent.Record Kinesis { get; set; }
    }
}

public class Record
{
    /// A bunch of other properties
    ...

    public MemoryStream Data { get; set; }
}

In the case of a DynamoDb Kinesis Stream the Record.Data MemoryStream contains a non-encoded Json string which appears to match the type Amazon.DynamoDBv2.Model.Record from the AWSSDK.DynamoDBv2 package. Unfortunately the Json cannot be easily deserialized into a Amazon.DynamoDBv2.Model.Record due to the nested property ApproximateCreationDateTime which is in the form of epoch time in the JSON (i.e. an integer) but is typed as DateTime in Amazon.DynamoDBv2.Model.StreamRecord. I'm also concerned that there might be breaking edge-cases when directly deserializing AttributeValue types which are also nested inside Amazon.DynamoDBv2.Model.StreamRecord.

I seem to be able to deserialize Record.Data successfully using the Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer serializer which is really not intuitive and disappointing to have to do.

I'm wondering if it would be reasonable to either create a KinesisDynamoDbEvent type either in Amazon.Lambda.KinesisEvents or another package which types the nested Data property as a DynamoDb StreamRecord instead of a MemoryStream, or perhaps provide an intuitive utility to deserialize the MemoryStream appropriately.

Use Case

When using Lambda to process a Kinesis Stream that is backed by a DynamoDb table it is difficult to deserialize the Amazon.Kinesis.Model.Record MemoryStream property to a dynamodb record.

Proposed Solution

No response

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS .NET SDK and/or Package version used

Amazon.Lambda.KinesisEvents

Targeted .NET Platform

.NET 6 and up

Operating System and version

N/A

@danny-zegel-zocdoc danny-zegel-zocdoc added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jul 30, 2023
@danny-zegel-zocdoc
Copy link
Author

Correction: It seems that using DefaultLambdaJsonSerializer to deserialize the Amazon.DynamoDBv2.Model.Record does not work because ApproximateCreationDateTime is being generated with a 13 digit number (millisecond granularity) instead of a 10 digit number (second granularity). I thought this worked when I tested it using localstack which generates 10 digit numbers.

According to the docs here https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Services/DynamoDBv2/Generated/Model/StreamRecord.cs#L45-L52 the value should have second granularity, is this a bug in kinesis?

As a work around for now I copied the DateTimeConverter included in DefaultLambdaJsonSerializer with a slight modification to support millisecond epoch time as well as second epoch time and swapped out my version in place of the default one.

@ashishdhingra
Copy link
Contributor

@danny-zegel-zocdoc Thanks for submitting feature request. Please review if this issue is similar to existing issue #839 in our backlog. If yes, kindly close this issue to avoid duplicate issue.

Thanks,
Ashish

@ashishdhingra ashishdhingra added module/lambda-client-lib response-requested Waiting on additional info and feedback. Will move to close soon in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Jul 31, 2023
@danny-zegel-zocdoc
Copy link
Author

#839

@ashishdhingra the bug aspect of this issue does seem to be the same issue as #839 however the bug was not the primary aspect of this request. This request is primarily suggesting that there be a more prescribed/standardized way of parsing Kinesis records that come from DynamoDb being that what you need to do right now is not at all intuitive.

@ashishdhingra
Copy link
Contributor

#839
@ashishdhingra the bug aspect of this issue does seem to be the same issue as #839 however the bug was not the primary aspect of this request. This request is primarily suggesting that there be a more prescribed/standardized way of parsing Kinesis records that come from DynamoDb being that what you need to do right now is not at all intuitive.

Agreed. But changing the type in any of the nested properties would be a breaking change for existing customer as you mentioned. Creation of separate package needs to be reviewed by the team.

@danny-zegel-zocdoc
Copy link
Author

#839
@ashishdhingra the bug aspect of this issue does seem to be the same issue as #839 however the bug was not the primary aspect of this request. This request is primarily suggesting that there be a more prescribed/standardized way of parsing Kinesis records that come from DynamoDb being that what you need to do right now is not at all intuitive.

Agreed. But changing the type in any of the nested properties would be a breaking change for existing customer as you mentioned. Creation of separate package needs to be reviewed by the team.

Alternatively a new type could be created in the existing package 🤷.

I'll leave it up to your team to decide if/how this would be done.

Thank you for the consideration.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to close soon in 7 days. label Aug 1, 2023
@normj
Copy link
Member

normj commented Aug 18, 2023

I think adding a KinesisDynamoDbEvent type in the Kinesis event stream package makes sense to make this use case easier. We should not have it reference the types in the SDK. We have learned the hard way that this causes issues because the SDK types are setup for how the SDK does serialization/deserialization which is all code generated and it doesn't fit well with generic JSON serialization/deserialization. As @danny-zegel-zocdoc pointed out dates are problematic as well as how the SDK uses its ConstantClass to mimic enumerations. A while back we created a V2 of the S3 event package to separate the package from the SDK.

@ashishdhingra We should add this feature request to our backlog but to level set @danny-zegel-zocdoc I'm not sure when we will be able to get to this and in the Lambda space getting ready for .NET 8 is our priority. If you are interested in doing a PR I'm happy to take a look at it.

@ashishdhingra ashishdhingra added queued p2 This is a standard priority issue and removed needs-review labels Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. module/lambda-client-lib p2 This is a standard priority issue queued
Projects
None yet
Development

No branches or pull requests

3 participants