Support events in AVRO and other formats by supporting jackson-dataformat-binary #36

Open
aymkhalil opened this issue Sep 7, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@aymkhalil

What is your idea?

Support pattern matching on Avro events. Avro support is a reasonable next step because:

  1. It has wide adoption in streaming systems like Pulsar and Kafka. Any streaming system could benefit not only from the performance characteristics of Ruler (which are much needed) but also from the semantics of pattern matching.
  2. It would serve as a good reference implementation and stepping stone for other binary formats (or for any events with a formal schema, for that matter).

Would you be willing to make the change?

Maybe

Additional context

  • Message/streaming systems are lacking a killer pattern/expression/filter language - it could use a de facto "ruler pattern" language, just like SQL is for DBs.

  • Users who choose to define schemas for their events expect all interactions to respect the schema. Having pattern matching respect data types and fields as defined by the "active or a previous" schema seems like a natural fit for those use cases.

@aymkhalil aymkhalil added the enhancement New feature or request label Sep 7, 2022
@timbray
Collaborator

timbray commented Sep 7, 2022

This is a great idea. The thing to watch out for is the performance of turning Avro messages into field name/value pairs; in the case of Jackson, this is the most expensive part of the matching task. Note the comments on https://www.tbray.org/ongoing/When/202x/2021/12/03/Filtering-Lessons where people say that it should be possible to do this very efficiently. Ideally, we would have a Jackson-compatible tokenizer, which would allow for a lot of code re-use. Hey, check this out: https://github.com/FasterXML/jackson-dataformats-binary - this could be the basis for killing a lot of birds with one stone.

@aymkhalil
Author

Thanks @timbray for the implementation hints! It is promising to see the possibility to add binary format support without compromising performance.

@baldawar baldawar changed the title Support AVRO events Support AVRO events by supporting jackson-dataformat-binary Sep 7, 2022
@baldawar
Collaborator

baldawar commented Sep 7, 2022

I like the idea of supporting jackson-dataformats-binary. Updated the title to highlight this. Code-change wise, I think this'll mean a new method in Event.java that creates a JsonParser [1] for the right format. We then bubble up the interface to the various places clients can use (GenericMachine, Ruler). Then we also need to add test cases + benchmarks.

@baldawar baldawar changed the title Support AVRO events by supporting jackson-dataformat-binary Support events in AVRO and other formats by supporting jackson-dataformat-binary Sep 7, 2022
@timbray
Collaborator

timbray commented Sep 7, 2022

Just need to be sure that this supports the nextToken() API, not just ObjectMapper. Also need to watch out for schemas: most of these binary formats can't be parsed without accessing schemas, so we will need to cache schemas where possible. CBOR doesn't need a schema. Avro relies on a 4-byte field in the Kafka wire format header that identifies the schema. Which is to say, this feature is going to need some design thinking, even if the implementation isn't that hard.
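
A minimal sketch of the schema caching mentioned above, assuming Confluent-style framing (one zero "magic" byte, then a 4-byte big-endian schema ID, then the Avro payload). `SchemaCache` and its `loader` are hypothetical names for illustration and are not part of Ruler or Jackson; the loader stands in for whatever registry lookup is actually used.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical helper, not part of Ruler: caches schemas by the 4-byte ID
// that precedes the payload in (assumed) Confluent-style Kafka framing.
final class SchemaCache<S> {
    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

    private final Map<Integer, S> cache = new ConcurrentHashMap<>();
    private final IntFunction<S> loader; // assumption: e.g. a schema registry lookup

    SchemaCache(IntFunction<S> loader) {
        this.loader = loader;
    }

    /** Reads the big-endian schema ID that follows the magic byte. */
    static int schemaId(byte[] message) {
        if (message.length < HEADER_LENGTH || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("not a schema-framed message");
        }
        return ByteBuffer.wrap(message, 1, 4).getInt();
    }

    /** Returns the cached schema, invoking the loader only on the first sighting of an ID. */
    S schemaFor(byte[] message) {
        return cache.computeIfAbsent(schemaId(message), loader::apply);
    }
}
```

With something like this in place, the expensive schema fetch would happen once per schema ID rather than once per event, and the parser for the payload could be configured from the cached schema.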

@baldawar
Collaborator

baldawar commented Sep 9, 2022

https://github.com/FasterXML/jackson-dataformats-binary seems to extend JsonParser, so it "should" support nextToken(), but it's definitely worth a second look. There are some tests in the implementing package, so I'm hopeful. I hadn't thought beyond this yet.

I find the idea of a Flattener interface like the one in Quamina appealing: https://github.com/timbray/quamina/#flattening-and-matching. I'm hoping there's a similar interface we can have that allows for extension. This interface should be in addition to built-in support for Avro and other schemas.
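
As a rough illustration of what such an extension point could look like in Java, here is a sketch only. `Field`, `Flattener`, and `MapFlattener` are hypothetical names; this is neither Quamina's Go interface nor an existing Ruler API. The idea is that anything able to turn an event into path/value pairs could plug into matching, with a nested `Map` standing in for a decoded event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; not defined by Ruler or Quamina in this form.
final class Field {
    final String path;
    final String value;

    Field(String path, String value) {
        this.path = path;
        this.value = value;
    }
}

// The extension point: one implementation per event format.
interface Flattener<E> {
    List<Field> flatten(E event);
}

// Toy implementation that flattens nested maps into dotted paths.
final class MapFlattener implements Flattener<Map<String, Object>> {
    @Override
    public List<Field> flatten(Map<String, Object> event) {
        List<Field> out = new ArrayList<>();
        walk("", event, out);
        return out;
    }

    private static void walk(String prefix, Map<?, ?> node, List<Field> out) {
        for (Map.Entry<?, ?> e : node.entrySet()) {
            String key = String.valueOf(e.getKey());
            String path = prefix.isEmpty() ? key : prefix + "." + key;
            Object v = e.getValue();
            if (v instanceof Map) {
                walk(path, (Map<?, ?>) v, out); // descend into nested records
            } else {
                out.add(new Field(path, String.valueOf(v)));
            }
        }
    }
}
```

An Avro or protobuf flattener would implement the same interface but read the binary payload directly instead of walking a map.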

@baldawar
Collaborator

baldawar commented Sep 9, 2022

Also 👋 Ayman, missed ya.

@timbray
Collaborator

timbray commented Sep 9, 2022

Just had a look at the Jackson Avro code. I was worried that it would implement nextToken() by deserializing into an object and then traversing that, but it actually looks pretty efficient. It also looks pretty complicated; I wouldn't want to implement one of these from scratch.

Also, hey there Ayman.

@aymkhalil
Author

Seems nextToken() has a huge code reusability advantage. I was wondering, though, if it limits the ability to 1) consult the schema to check whether the pattern field exists in the event and 2) look it up in constant time. The O(1) lookup part should be doable in protobuf - not sure about Avro. This may be beneficial when Pattern fields are << Event fields OR it may just be premature optimization  ¯_(ツ)_/¯

fwiw I was actually doing research on event matching options for Pulsar - I stumbled upon a few options like JMS selectors, JSTL, and even fully fledged SQL "WHERE" conditions! At the time, I wanted to also experiment with Ruler as it would beat other options performance-wise at least (but the Java version was not OSS. And, the non-aws community is very schema-full). SOOO, big thank you to everyone behind this initiative!
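
To make the two points concrete, a small sketch under strong assumptions: the event is already available as a positional record (`Object[]` in schema field order), and the pattern is plain exact-match on top-level fields. `SchemaAwareMatcher` is a hypothetical name, not a Ruler class. Pattern field names are resolved against the schema once; a pattern naming a field the schema lacks is rejected up front, and each event then costs one array lookup per pattern field regardless of how many fields the event has.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolves pattern field names to schema positions once,
// so matching an event touches only the fields the pattern mentions.
final class SchemaAwareMatcher {
    private final int[] positions;
    private final String[] expected;
    private final boolean satisfiable;

    SchemaAwareMatcher(List<String> schemaFields, Map<String, String> pattern) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < schemaFields.size(); i++) {
            index.put(schemaFields.get(i), i);
        }
        positions = new int[pattern.size()];
        expected = new String[pattern.size()];
        boolean ok = true;
        int n = 0;
        for (Map.Entry<String, String> e : pattern.entrySet()) {
            Integer pos = index.get(e.getKey());
            if (pos == null) { // field absent from the schema: exact match is impossible
                ok = false;
                break;
            }
            positions[n] = pos;
            expected[n] = e.getValue();
            n++;
        }
        satisfiable = ok;
    }

    /** Assumes record values are laid out in schema field order. */
    boolean matches(Object[] record) {
        if (!satisfiable) {
            return false;
        }
        for (int i = 0; i < positions.length; i++) {
            if (!expected[i].equals(String.valueOf(record[positions[i]]))) {
                return false;
            }
        }
        return true;
    }
}
```

This sidesteps the open question in the comment: whether a given binary encoding lets a reader reach a field's position without decoding the fields before it. Patterns that test for field absence would also need different handling than the exact-match case shown here.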

Also, 👋 Rishi, 👋 Tim! Hope all is well!

3 participants