RoadMap of IceLake v0.1 #1

Open
Xuanwo opened this issue Jun 15, 2023 · 10 comments

@Xuanwo
Contributor

Xuanwo commented Jun 15, 2023

Iceberg is an open table format designed for analytic datasets. However, the lack of a mature Rust binding for Iceberg makes it difficult to integrate with databases like Databend.

IceLake intends to fill this gap. By developing icelake, I expect to build an open ecosystem in which:

  • Users can read and write Iceberg tables on ANY storage service, such as s3, gcs, azblob, hdfs and so on.
  • ANY database can integrate with icelake to read and write Iceberg tables.
  • Conversion to and from Arrow is supported NATIVELY.
  • Bindings let other languages operate on Iceberg tables, powered by the Rust core.

For IceLake v0.1, I expect to implement the following features:

  • Set up the project layout and development loop so that the community can take part.
  • Support reading Iceberg v2 data from storage services (only a limited set of file formats will be supported).
  • Evaluate our design by integrating it with Databend.

This project is sponsored by Databend Labs

@Xuanwo Xuanwo changed the title from "Readmap of IceLake v0.1" to "RoadMap of IceLake v0.1" Jun 15, 2023
@Xuanwo Xuanwo pinned this issue Jun 15, 2023
@Xuanwo
Contributor Author

Xuanwo commented Jun 17, 2023

Updates on 2023-06-17:

We have released version 0.0.1, which includes all the necessary types. Our next step is to integrate with databend to ensure we are proceeding in the right direction.

cc our sponsors, FYI: @BohuTANG, @sundy-li, @flaneur2020, @ZhiHanZ

@shaeqahmed

Very excited for this!

@Ted-Jiang

Looking forward to it!

@Xuanwo
Contributor Author

Xuanwo commented Jun 29, 2023

Updates on 2023-06-29:

IceLake is almost functional on Databend now: datafuselabs/databend#11923

I am currently working on resolving some issues with reading Parquet files in Databend. However, I am confident that I can address these issues within the next two days.

Once we successfully test our initial proof of concept for reading, we will release version 0.1 and clean up our code. We will also add more documentation to enable our community to participate.

cc our sponsors, FYI: @BohuTANG, @sundy-li, @flaneur2020, @ZhiHanZ

@huang12zheng

This write-up summarizes what I learned while studying icelake.

ABOUT icelake

An example of the icelake entry point

Source: examples/read_iceberg_table.rs

  • In
  1. A directory containing an Iceberg table

let table_uri = format!("{}/testdata/simple_table", env::current_dir()?.display());

  • Output

let table = Table::open(table_uri.as_str()).await?;

Get an ArrowSchema from icelake::in_memory::Schema

  • In

let schema = types::Schema {..}

  • Output

let arrow_schema = ArrowSchema::try_from(schema).unwrap();
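
To make the conversion above more concrete, here is a minimal std-only sketch of what such a schema mapping might look like. The types `IcebergPrimitive`, `ArrowType`, `IcebergField`, and `ArrowField` are hypothetical stand-ins invented for this illustration; they are not icelake's or arrow's real types, and the real `TryFrom` implementation may differ. The sketch only assumes the usual correspondence between Iceberg primitives and Arrow types, and that a required Iceberg field becomes a non-nullable Arrow field.

```rust
/// Hypothetical stand-in for a few Iceberg primitive types (not icelake's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IcebergPrimitive {
    Boolean,
    Int,
    Long,
    Float,
    Double,
    Date,
    String,
    Binary,
}

/// Hypothetical stand-in for the matching Arrow data types (not arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Float32,
    Float64,
    Date32,
    Utf8,
    Binary,
}

/// Hypothetical Iceberg field: an id, a name, a required flag, and a type.
pub struct IcebergField {
    pub id: i32,
    pub name: String,
    pub required: bool,
    pub ty: IcebergPrimitive,
}

/// Hypothetical Arrow field: a name, a nullable flag, and a type.
#[derive(Debug, PartialEq)]
pub struct ArrowField {
    pub name: String,
    pub nullable: bool,
    pub ty: ArrowType,
}

/// Map one Iceberg primitive to its Arrow counterpart.
pub fn to_arrow_type(ty: IcebergPrimitive) -> ArrowType {
    match ty {
        IcebergPrimitive::Boolean => ArrowType::Boolean,
        IcebergPrimitive::Int => ArrowType::Int32,
        IcebergPrimitive::Long => ArrowType::Int64,
        IcebergPrimitive::Float => ArrowType::Float32,
        IcebergPrimitive::Double => ArrowType::Float64,
        IcebergPrimitive::Date => ArrowType::Date32,
        IcebergPrimitive::String => ArrowType::Utf8,
        IcebergPrimitive::Binary => ArrowType::Binary,
    }
}

/// Convert a list of Iceberg fields; a required field becomes non-nullable.
pub fn to_arrow_fields(fields: &[IcebergField]) -> Vec<ArrowField> {
    fields
        .iter()
        .map(|f| ArrowField {
            name: f.name.clone(),
            nullable: !f.required,
            ty: to_arrow_type(f.ty),
        })
        .collect()
}
```

Calling `to_arrow_fields` on a required `long` field named `id` would yield a non-nullable `Int64` field with the same name. Note that this sketch drops the field id; whether and how the id is carried over into Arrow metadata is left out here.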

ABOUT parquet feature

  • What you can use:
  1. ParquetWriterBuilder

It wraps an opendal::Writer and also needs an Arrow schema.

let op = Operator::new(Memory::default())?.finish();
let w = op.writer("test").await?;
// ...
let mut pw = ParquetWriterBuilder::new(w, to_write.schema()).build()?;
// pw.write(&to_write).await?;

  2. ParquetStreamBuilder

It wraps an opendal::Reader.

let op = Operator::new(Memory::default())?.finish();
let r = op.reader("test").await?;
let mut reader = ParquetStreamBuilder::new(r).build().await?;
let res = reader.next().await.unwrap()?;

@RinChanNOWWW
Contributor

Does icelake intend to support higher level Iceberg operations?

Such as:

  • Read arrow RecordBatch from parquet file after schema evolution. (Ignore deleted columns and append newly added columns)
  • Combine files of different types (data, position deletes, equality deletes).

Or is icelake just a base library for the Iceberg format, with all high-level operations left to the application?
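
For readers unfamiliar with the first operation, here is a rough std-only sketch of reading a file after schema evolution. It assumes columns are matched by Iceberg field id rather than by name: columns whose id is no longer in the table schema are ignored, and columns added after the file was written are filled with nulls. The types `FileColumn` and `TableField` and the function `project` are hypothetical names for this illustration only, not icelake APIs, and values are restricted to `i64` to keep the sketch short.

```rust
use std::collections::HashMap;

/// Hypothetical column as stored in one data file: field id plus values (None = null).
pub struct FileColumn {
    pub field_id: i32,
    pub values: Vec<Option<i64>>,
}

/// Hypothetical column of the current table schema.
pub struct TableField {
    pub field_id: i32,
    pub name: String,
}

/// Project a file's columns onto the current table schema by field id.
/// Dropped columns are skipped; newly added columns are filled with nulls.
pub fn project(
    file_columns: &[FileColumn],
    table_schema: &[TableField],
    num_rows: usize,
) -> Vec<(String, Vec<Option<i64>>)> {
    // Index the file's columns by field id so renames do not matter.
    let by_id: HashMap<i32, &FileColumn> =
        file_columns.iter().map(|c| (c.field_id, c)).collect();

    table_schema
        .iter()
        .map(|field| {
            let values = match by_id.get(&field.field_id) {
                // The column exists in the file: reuse its values.
                Some(col) => col.values.clone(),
                // The column was added after the file was written: all nulls.
                None => vec![None; num_rows],
            };
            (field.name.clone(), values)
        })
        .collect()
}
```

Matching by id instead of by name is what lets a renamed column keep its data while a dropped-and-re-added column with the same name does not pick up stale values.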

@Xuanwo
Contributor Author

Xuanwo commented Aug 8, 2023

Yes, we should cover those high level operations.
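
As an illustration of the second operation asked about above (combining data files with delete files), here is a simplified std-only sketch. It assumes the position deletes have already been resolved to row positions within a single data file, and the equality deletes to a set of values on one `id` column; sequence-number rules and per-file paths are deliberately left out. `Row` and `apply_deletes` are hypothetical names for this sketch, not icelake APIs.

```rust
use std::collections::HashSet;

/// Hypothetical row of a data file.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: i64,
    pub name: String,
}

/// Drop rows removed by position deletes (row index within the file)
/// or by equality deletes (here: matching on the `id` column).
pub fn apply_deletes(
    rows: Vec<Row>,
    position_deletes: &HashSet<usize>,
    equality_deleted_ids: &HashSet<i64>,
) -> Vec<Row> {
    rows.into_iter()
        .enumerate()
        // Keep a row only if neither kind of delete applies to it.
        .filter(|(pos, row)| {
            !position_deletes.contains(pos) && !equality_deleted_ids.contains(&row.id)
        })
        .map(|(_, row)| row)
        .collect()
}
```

A real reader would apply the same filter while streaming record batches rather than materializing rows, but the merge logic is the same idea.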

@sanderpick

Heyo! I'm curious, what is the difference between this project and https://github.com/apache/iceberg-rust? I see it's being developed by the same team, but both are stated as "rust implementation of iceberg".

@Xuanwo
Contributor Author

Xuanwo commented Jan 9, 2024

> Heyo! I'm curious, what is the difference between this project and https://github.com/apache/iceberg-rust? I see it's being developed by the same team, but both are stated as "rust implementation of iceberg".

We began with icelake as a Rust implementation of Iceberg, but we later shifted our focus to direct contributions upstream. Now, icelake serves primarily as a staging area to test our concepts and ensure compatibility with existing applications. Ultimately, icelake will be integrated into iceberg-rust.

@sanderpick

makes sense! i'm excited for iceberg-rust and hope to make some contributions.
