
RedPanda consumer example #85

Open

danthegoodman1 opened this issue Aug 13, 2023 · 3 comments
Labels
example How to do something with IceDB

Comments

@danthegoodman1 (Owner)

Example where ingestion consumes directly from a RedPanda cluster and batch-inserts by some namespace,table key that is used as the partition key.

Include schema validation and caching in the example so that we can catch schema issues before insert, and have (dynamically created) quarantine tables that store the bad (raw) rows when we discover one.

Will ping the RP team to ask about high-velocity inserts to a given table, as that might be bad for RedPanda on a single partition.
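
A minimal sketch of that consumer loop, assuming the confluent-kafka Python client (RedPanda speaks the Kafka protocol), a topic named "ingest", and a hypothetical insert_batch() stand-in for the IceDB insert; none of these names come from IceDB itself:

```python
import json
from collections import defaultdict

from confluent_kafka import Consumer

def insert_batch(namespace: str, table: str, rows: list) -> None:
    """Hypothetical stand-in for the IceDB batch insert."""
    ...

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # RedPanda broker (Kafka protocol)
    "group.id": "icedb-ingest",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,  # commit only after a successful insert
})
consumer.subscribe(["ingest"])

BATCH_SIZE = 1000
batches = defaultdict(list)  # (namespace, table) -> buffered rows

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # The message key is "namespace,table" and doubles as the partition
    # key, so a given table's rows batch together on one partition.
    namespace, table = msg.key().decode().split(",", 1)
    key = (namespace, table)
    batches[key].append(json.loads(msg.value()))
    if len(batches[key]) >= BATCH_SIZE:
        insert_batch(namespace, table, batches.pop(key))
        consumer.commit()
```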

@danthegoodman1 danthegoodman1 added the example How to do something with IceDB label Aug 13, 2023
@danthegoodman1 (Owner, Author)

If we are handling large-volume inserts, we can have pre-defined schemas that we cache on each consumer, and make modifications always nullable (well, really everything is nullable). Then, after nodes cache the new schema locally (with some TTL), they can accept rows with new columns.

We might want a second example for that though.
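
A tiny sketch of the per-consumer schema cache with a TTL, purely to illustrate the idea; the names are made up:

```python
import time

class SchemaCache:
    """Per-consumer cache of (namespace, table) -> schema, expiring after a TTL."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._entries = {}  # (namespace, table) -> (schema, fetched_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        return None  # expired or missing: caller re-fetches from the golden record

    def put(self, key, schema):
        self._entries[key] = (schema, time.monotonic())
```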

@danthegoodman1 (Owner, Author)

Original idea from Alex @ redpanda, which I then modified:

Since we only need to know whether an existing column changed type (columns can come and go, except for partition columns, unless there are defaults), we can just hash the schema JSON. If it differs from the one we have in memory (or we don't have one yet), we can start a serializable transaction with something like CRDB/PG/FDB, read and compare whether the change is valid, and if it is, update the remote schema (or the local one, if the remote already knows about it), then insert.

This way, if we detect a change, we go to the golden record, compare and update remote/local, and verify the schema.

We can validate the schema of the whole batch, but this will slow down the entire partition, so it's important to use a really high partition count for the gate.

Requirements are:

  1. Serializable transactions
  2. Schema probably doesn't change often (frequent changes would harm performance)
  3. High partition count (prevent blocking other tables)

Then quarantine or drop offending rows.
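
A rough sketch of that gate, assuming Postgres as the golden record with a schemas(namespace, tbl, schema_json) table (unique on namespace, tbl) and a schema represented as a column-name -> type JSON object; the table layout and function names are illustrative, not IceDB's API:

```python
import hashlib
import json

import psycopg2  # assumed driver; CRDB/FDB would work the same way

local_schemas = {}  # (namespace, table) -> hash of the last known-good schema

def schema_hash(schema: dict) -> str:
    # Canonical JSON so the same schema always produces the same hash.
    return hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()

def check_schema(conn, namespace: str, table: str, schema: dict) -> bool:
    if local_schemas.get((namespace, table)) == schema_hash(schema):
        return True  # fast path: hash matches, no transaction needed
    # Slow path: compare against the golden record under SERIALIZABLE.
    with conn:  # psycopg2 commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            cur.execute(
                "SELECT schema_json FROM schemas WHERE namespace = %s AND tbl = %s",
                (namespace, table),
            )
            row = cur.fetchone()
            remote = json.loads(row[0]) if row else {}
            for col, typ in schema.items():
                if col in remote and remote[col] != typ:
                    return False  # existing column changed type: quarantine/drop rows
            remote.update(schema)  # new columns are fine (everything is nullable)
            cur.execute(
                "INSERT INTO schemas (namespace, tbl, schema_json) VALUES (%s, %s, %s) "
                "ON CONFLICT (namespace, tbl) DO UPDATE SET schema_json = EXCLUDED.schema_json",
                (namespace, table, json.dumps(remote)),
            )
    local_schemas[(namespace, table)] = schema_hash(remote)
    return True
```

On a serialization failure the transaction should be retried; that retry is what makes it safe for concurrent inserters to see different new columns.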

@danthegoodman1 (Owner, Author)

See #90; we can check whether the schema is different. We still need a serializable tx in case there are concurrent inserters getting different columns. In real life we should probably still pre-define an initial schema, but columns can be added dynamically.
