Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undocumented duplicate key behaviour for events and rules #22

Open
baldawar opened this issue Aug 9, 2022 · 10 comments
Open

Undocumented duplicate key behaviour for events and rules #22

baldawar opened this issue Aug 9, 2022 · 10 comments
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers

Comments

@baldawar
Copy link
Collaborator

baldawar commented Aug 9, 2022

Describe the bug

High level description of the issue is we can create duplicate keys in our rules and the library will only use the last duplicate in the rule

For example if this is my rule:

{
  "source": ["aws.s3"],
  "source": ["aws.sns"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventSource": ["sns.amazonaws.com"]
  }
}

A customer may expect it to capture both detail S3 and SNS events from CloudTrail. But it turns out we only trigger only the second key (which in this case is SNS).

This is within the bounds of the JSON spec https://datatracker.ietf.org/doc/html/rfc8259#section-4

When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only.

Still, it's better if we 1/ handle it for users when de-serializing via ObjectMapper[1] and 2/ document this edge case if they are passing the JSONNode to us.

[1] FasterXML/jackson-databind#237 shows that we can use DeserializationFeature.FAIL_ON_READING_DUP_TREE_KEY

@baldawar baldawar added documentation Improvements or additions to documentation good first issue Good for newcomers labels Aug 9, 2022
@timbray
Copy link
Collaborator

timbray commented Aug 9, 2022

Here's what I believe:

  1. When possible, we should check for duplicate keys in Rules as they are added, and reject such rules.
  2. We should document that for Events, when duplicate keys occur we will only consider the final value in document order, and ensure that the implementation follows that rule.

#2 because that's what the software does now and if you change it you'll probably break at least one relying service, which means it has become part of your contract per Hyrum's law: https://www.hyrumslaw.com

@baldawar
Copy link
Collaborator Author

baldawar commented Aug 9, 2022

Still digging into this one but (1) might be an issue if clients are not checking for duplicate keys in rules similar to Ruler.matches() method

@baldawar
Copy link
Collaborator Author

Proposal for this issue

1/ Add a caveat that Ruler's current APIs follow the default jackson behaviour for deserializting. This means we'll only consider the final value in the document order. Maybe a final note that this is subject to change in future.
2/ For clients that aren't comfortable with this we ask them to pass the JSON-Nodes instead of Strings. This will let them configure de-serialization however they want. We'll need to add a new method from JsonRuleCompiler.java and upwards to allow clients to pass a JsonParser or JsonNode.

This avoids breaking any clients today, and lets them migrate in future. Not expecting any performance hits as we create a parse during rule-compilation https://github.com/aws/event-ruler/blob/main/src/main/software/amazon/event/ruler/JsonRuleCompiler.java#L118

@timbray
Copy link
Collaborator

timbray commented Aug 29, 2022

It may be a bit more complex than that. Ruler has multiple APIs with different approaches. In at least one place it uses Jackson ObjectMapper, which has the behavior you describe. In most places it uses the nextToken() API, which actually means the software will see the duplicate entries, and I forget what it does.

My proposal would be to figure out what it does and document that. I would say that we should reject addRule calls that have duplicate fields in the pattern, except that would be a breaking change for some of the AWS callers. So maybe add an addRuleSafely method that reports syntax errors for duplicate keys.

@longzhang-lz
Copy link
Collaborator

agree with Tim, we shall ensure to reject the duplicated filed name for none array situation, we can enforce our API to allow

{
  "source": ["aws.s3"],
  "source": ["aws.sns"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": [
    { "eventSource": ["s3.amazonaws.com"] },
    { "eventSource": ["sns.amazonaws.com"] }
  ]
}

but reject below

{
  "source": ["aws.s3"],
  "source": ["aws.sns"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventSource": ["sns.amazonaws.com"]
  }
}

@longzhang-lz
Copy link
Collaborator

longzhang-lz commented Aug 30, 2022

oh, please ignore my previous reply, I had thought it is for event, I mean for even, we can enforce above what I mentioned.
for rules, we don't support below sort of rule (which means to both of event source need be present in one same array inside event) yet ...

  "detail": [
    { "eventSource": ["s3.amazonaws.com"] },
    { "eventSource": ["sns.amazonaws.com"] }
  ]

@baldawar
Copy link
Collaborator Author

Looks like when we aren't using ObjectMapper, we use JsonParser. Just like ObjectMapper, there's a way to configure the parser to enforce STRICT_DUPLICATE_DETECTION.

We definitely can't change the exiting parser / object-mapper (as it'll break existing users) but just like customers have the option to pass a parsed JSONNode for events (and perform any validation on their side before making the call), wondering if its reasonable to allow customers to pass JsonNode / JsonParser to us for rules as well.

It'll mean adding more addRule(..) / deleteRule(...) interfaces in GenericMachine.java and then update the RulerCompiler.java / JsonRuleCompiler.java to take in the parser as well.

Along with it, I like the idea of adding new methods that enforce stricting cheking by default but I think we'll need a different name than addRuleSafely(...). The current methods are still safe, they just have different behavior when it comes to JSON parsing.

@baldawar
Copy link
Collaborator Author

baldawar commented Aug 30, 2022

thinking more on this, one more option we have is make GenericMachine configurable during construction and allow whichever JSON parsing preferences customers would want via the constructor. We can then tweak our internal mappers / parsers based on their preferences. It'll mean a new constructor, but no new addRule / deleteRule for folks to migrate to (and fewer methods for us to manage keep around).

@timbray
Copy link
Collaborator

timbray commented Aug 30, 2022

So, have a MachineFactory or MachineBuilder class?

@baldawar
Copy link
Collaborator Author

Yes. I need to check which of the two will be easier for customers to adopt. I suspecting MachineFactory but MachineBuilder makes is easier to introduce more configurations in the future.

baldawar added a commit that referenced this issue Sep 6, 2022
…iour

This minor change is related to
#22.
Would like to make sure there's at least something in the document
on the expected behaviour.

In future, this section can be updated with recommendation to
move to better APIs.
baldawar added a commit that referenced this issue Sep 6, 2022
…iour

This minor change is related to
#22.
Would like to make sure there's at least something in the document
on the expected behaviour.

In future, this section can be updated with recommendation to
move to better APIs.

Also updating the pom.xml to make it easier to release changes. Going
forward, we'll enforce version bumps for key changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants