Implement append only task writer. #74
Comments
How is this API meant to be used by compute engines? Such as:
Let's prioritize making it functional for now and refine the API later.
Let's use RisingWave's new coordinated sink as an example:
This issue can be closed now.
We need to add support for the partition spec. cc @ZENOTME
I realize that in the future we will need to add a position delete file writer, and users could use them as follows (using the different writers separately):
So maybe naming the interface append_writer() would be better?
In my original design, the task writer should provide two methods:
The internal implementation of |
An append-only task writer accepts an optional partitioner and a file appender factory as arguments. When it receives records, it dispatches each record to a file writer chosen by the partition key (generated by the partitioner) and appends the record to that writer. When it finishes, it returns the generated data file structs.
Note that this will be the API used directly by compute engines such as RisingWave and Ballista. We can refer to the following implementation as an example.
https://github.com/apache/iceberg/blob/e340ad5be04e902398c576f431810c3dfa4fe717/core/src/main/java/org/apache/iceberg/io/PartitionedFanoutWriter.java#L28
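To make the dispatch behavior described above concrete, here is a minimal Rust sketch of a fanout-style append-only writer. Every name in it (`Record`, `DataFile`, `FileAppender`, `InMemoryAppender`, `AppendOnlyTaskWriter`, `demo`) is hypothetical and chosen for illustration only. It is not the API of this project or of the linked Java class, and the appender only counts rows in memory instead of writing real files.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a row; a real engine would likely pass record batches.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub region: String,
    pub value: i64,
}

/// Hypothetical summary of one written file.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub partition: Option<String>,
    pub record_count: usize,
}

/// Hypothetical per-file writer interface.
pub trait FileAppender {
    fn append(&mut self, record: Record);
    fn close(self: Box<Self>, partition: Option<String>) -> DataFile;
}

/// Appender that only buffers rows in memory (assumption: no real I/O).
#[derive(Default)]
pub struct InMemoryAppender {
    rows: Vec<Record>,
}

impl FileAppender for InMemoryAppender {
    fn append(&mut self, record: Record) {
        self.rows.push(record);
    }

    fn close(self: Box<Self>, partition: Option<String>) -> DataFile {
        DataFile { partition, record_count: self.rows.len() }
    }
}

pub type Partitioner = Box<dyn Fn(&Record) -> String>;
pub type AppenderFactory = Box<dyn Fn() -> Box<dyn FileAppender>>;

pub struct AppendOnlyTaskWriter {
    partitioner: Option<Partitioner>,
    factory: AppenderFactory,
    // One open appender per partition key; `None` is the unpartitioned case.
    writers: HashMap<Option<String>, Box<dyn FileAppender>>,
}

impl AppendOnlyTaskWriter {
    pub fn new(partitioner: Option<Partitioner>, factory: AppenderFactory) -> Self {
        Self { partitioner, factory, writers: HashMap::new() }
    }

    /// Route the record to the appender for its partition key, creating one on demand.
    pub fn write(&mut self, record: Record) {
        let key = self.partitioner.as_ref().map(|p| p(&record));
        let factory = &self.factory;
        self.writers
            .entry(key)
            .or_insert_with(|| factory())
            .append(record);
    }

    /// Close every appender and return the resulting data files, sorted by partition.
    pub fn close(self) -> Vec<DataFile> {
        let mut files: Vec<DataFile> = self
            .writers
            .into_iter()
            .map(|(key, writer)| writer.close(key))
            .collect();
        files.sort_by(|a, b| a.partition.cmp(&b.partition));
        files
    }
}

/// Usage example: three records spread over two partitions.
pub fn demo() -> Vec<DataFile> {
    let mut writer = AppendOnlyTaskWriter::new(
        Some(Box::new(|r: &Record| r.region.clone())),
        Box::new(|| Box::new(InMemoryAppender::default()) as Box<dyn FileAppender>),
    );
    writer.write(Record { region: "us".to_string(), value: 1 });
    writer.write(Record { region: "eu".to_string(), value: 2 });
    writer.write(Record { region: "us".to_string(), value: 3 });
    writer.close()
}
```

Passing `None` as the partitioner sends every record to a single appender, which covers the unpartitioned case. Keeping one open appender per key is simple but holds all partitions open until `close`, so a real implementation may want to bound the number of open writers.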