
RedPanda consumer example #85

Open

danthegoodman1 opened this issue Aug 13, 2023 · 3 comments
Labels
example How to do something with IceDB

Comments

@danthegoodman1 (Owner)

Example where ingestion consumes directly from a RedPanda cluster and batch-inserts by some namespace,table key that is used as the partition key.

Include schema validation and caching in the example so that we can catch schema issues before insert, and have (dynamically created) quarantine tables that store the bad (raw) rows when we discover one.

Will ping the RP team to ask about high-velocity inserts to a given table, as that might be bad for RedPanda on a single partition.
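
A minimal sketch of that consumer loop, assuming the confluent-kafka Python client (RedPanda speaks the Kafka protocol), a topic named "ingest", and a hypothetical insert_batch() stand-in for the IceDB insert; none of these names come from IceDB itself:

```python
import json
from collections import defaultdict

from confluent_kafka import Consumer

def insert_batch(namespace: str, table: str, rows: list) -> None:
    """Hypothetical stand-in for the IceDB batch insert."""
    ...

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # RedPanda broker (Kafka protocol)
    "group.id": "icedb-ingest",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,  # commit only after a successful insert
})
consumer.subscribe(["ingest"])

BATCH_SIZE = 1000
batches = defaultdict(list)  # (namespace, table) -> buffered rows

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # The message key is "namespace,table" and doubles as the partition
    # key, so a given table's rows batch together on one partition.
    namespace, table = msg.key().decode().split(",", 1)
    key = (namespace, table)
    batches[key].append(json.loads(msg.value()))
    if len(batches[key]) >= BATCH_SIZE:
        insert_batch(namespace, table, batches.pop(key))
        consumer.commit()
```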

@danthegoodman1 danthegoodman1 added the example How to do something with IceDB label Aug 13, 2023
@danthegoodman1 (Owner, Author)

If we are handling large-volume inserts, we can have pre-defined schemas that we cache on each consumer, and make modifications always nullable (well, really everything is nullable). Then, after nodes cache the new schema locally (with some TTL), they can accept rows with new columns.

We might want a second example for that though.
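
A tiny sketch of the per-consumer schema cache with a TTL, purely to illustrate the idea; the names are made up:

```python
import time

class SchemaCache:
    """Per-consumer cache of (namespace, table) -> schema, expiring after a TTL."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._entries = {}  # (namespace, table) -> (schema, fetched_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        return None  # expired or missing: caller re-fetches from the golden record

    def put(self, key, schema):
        self._entries[key] = (schema, time.monotonic())
```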

@danthegoodman1 (Owner, Author)

Original idea from Alex @ redpanda, which I then modified:

Since we only need to know whether an existing column changed type (columns can come and go, except for partition columns, unless there are defaults), we can just hash the schema JSON. If it differs from the one we have in memory (or we don't have one yet), we can start a serializable transaction with something like CRDB/PG/FDB, read and compare whether the change is valid, and if it is, update the remote schema (or the local one, if the remote already knows about it), then insert.

This way, if we detect a change, we go to the golden record, compare and update remote/local, and verify the schema.

We can validate the schema of the whole batch, but this will slow down the entire partition, so it's important to use a really high partition count for the gate.

Requirements are:

  1. Serializable transactions
  2. Schema probably doesn't change often (frequent changes would harm performance)
  3. High partition count (prevent blocking other tables)

Then quarantine or drop offending rows.
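
A rough sketch of that gate, assuming Postgres as the golden record with a schemas(namespace, tbl, schema_json) table (unique on namespace, tbl) and a schema represented as a column-name -> type JSON object; the table layout and function names are illustrative, not IceDB's API:

```python
import hashlib
import json

import psycopg2  # assumed driver; CRDB/FDB would work the same way

local_schemas = {}  # (namespace, table) -> hash of the last known-good schema

def schema_hash(schema: dict) -> str:
    # Canonical JSON so the same schema always produces the same hash.
    return hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()

def check_schema(conn, namespace: str, table: str, schema: dict) -> bool:
    if local_schemas.get((namespace, table)) == schema_hash(schema):
        return True  # fast path: hash matches, no transaction needed
    # Slow path: compare against the golden record under SERIALIZABLE.
    with conn:  # psycopg2 commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            cur.execute(
                "SELECT schema_json FROM schemas WHERE namespace = %s AND tbl = %s",
                (namespace, table),
            )
            row = cur.fetchone()
            remote = json.loads(row[0]) if row else {}
            for col, typ in schema.items():
                if col in remote and remote[col] != typ:
                    return False  # existing column changed type: quarantine/drop rows
            remote.update(schema)  # new columns are fine (everything is nullable)
            cur.execute(
                "INSERT INTO schemas (namespace, tbl, schema_json) VALUES (%s, %s, %s) "
                "ON CONFLICT (namespace, tbl) DO UPDATE SET schema_json = EXCLUDED.schema_json",
                (namespace, table, json.dumps(remote)),
            )
    local_schemas[(namespace, table)] = schema_hash(remote)
    return True
```

On a serialization failure the transaction should be retried; that retry is what makes it safe for concurrent inserters to see different new columns.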

@danthegoodman1 (Owner, Author)

See #90; we can check whether the schema is different. We still need a serializable tx in case there are concurrent inserters getting different columns. In real life we should probably still pre-define an initial schema, but columns can be added dynamically.
