
Releases: lakesoul-io/LakeSoul

v2.5.4

23 May 06:27
  1. Fix class shading in lakesoul common

v2.5.3

29 Mar 08:19
  1. Add shaded packages for release
  2. Fix an issue where compaction could write to an incorrect partition

v2.5.1

29 Jan 10:45
  1. Fix Flink sink parallelism for non-primary-key tables;
  2. Fix NativeIO filter for non-ASCII names and nested columns;
  3. Optimize compaction performance.

v2.5.0 & Python 1.0.0b1

10 Jan 04:56

LakeSoul 2.5.0 Release Note

What's New

  1. Python Reader supports PyTorch, PyArrow, Pandas, Ray, and distributed execution;
  2. Support Spark Gluten Vectorized Engine;
  3. Spark SQL supports Compaction, Rollback and other Call Procedures;
  4. Flink CDC whole-database synchronization supports MySQL, PostgreSQL, PolarDB, and Oracle;
  5. Support streaming and batch export to MySQL, PostgreSQL, PolarDB, and Apache Doris;
  6. Optimized NativeIO performance.

What's Changed

New Contributors

Full Changelog: v2.4.1...v2.5.0

Release v2.4.1

12 Oct 07:14

What's Changed

  • [Flink] Flink can configure global warehouse dir by @F-PHantam in #342
  • [NativeIO] Implement DataFusion TableProvider by @Ceng23333 in #341
  • [Spark] Spark parquet filter pushdown exactly by @Ceng23333 in #343
  • [Spark] Spark parquet filter pushdown evaluation + bugfix by @Ceng23333 in #344
  • [Meta] fix meta field compatibility in partition info table by @xuchen-plus in #345
  • [Common] Cleanup redundant DataOperation by @Ceng23333 in #346
  • [Docs] add kyuubi with lakesoul setup doc. by @Asakiny in #348
  • [Native-Metadata] Adaptive jnr buffer size by @Ceng23333 in #347
  • [NativeIO][Bug] LakeSoulParquetProvider projection bugfix by @Ceng23333 in #349
  • [NativeIO] Enable parquet prefetch & use stable sort by @xuchen-plus in #350

Full Changelog: v2.4.0...v2.4.1

LakeSoul Release v2.4.0 and Python 1.0 Beta

21 Sep 09:16

What's New In This Release

  1. RBAC support for all query engines and all language APIs (doc);
  2. Automatic cleanup of old compaction data, and partition-level TTL (doc);
  3. Flink upgraded to 1.17, with support for row-level update/delete in batch SQL;
  4. Whole-database Flink CDC sync throughput improved by 80% (#307);
  5. Presto reader (doc);
  6. Native Python reader with PyTorch and HuggingFace integration (doc).

What's Changed

Full Changelog: https://github.com/lakesoul-io/LakeSoul/commits/v2.4.0

LakeSoul Release v2.3.1

22 Aug 02:40
  • Fix jackson-core packaging for Flink package
  • Fix missing commons-lang classes
  • Fix snapshot rollback/cleanup with local timezone

LakeSoul Release v2.3.0

13 Jul 09:44

v2.3.0 Release Notes

This is the first release since LakeSoul was donated to the Linux Foundation AI & Data. This release contains the following major new features:

  1. Flink Connector for the Flink SQL/Table API to read and write LakeSoul in both batch and streaming mode, with support for Flink Changelog Stream semantics and row-level upsert and delete. See the Flink Connector docs.
  2. Flink CDC ingestion refactored to infer new tables and schema changes automatically from messages. This simplifies developing CDC stream ingestion jobs for any kind of database or message queue.
  3. Global automatic compaction service. See the Auto Compaction Service docs.

What's Changed

v2.2.0

31 Mar 08:33

LakeSoul Release v2.2.0

v2.2.0 Release Notes

  1. Native IO is enabled by default for the Flink CDC Sink and Spark SQL. Native IO is built on arrow-rs and Datafusion (https://github.com/apache/arrow-datafusion), with IO optimizations specific to the arrow-rs object store. Benchmarks show a 3x IO throughput improvement over parquet-mr with the Hadoop filesystem. Native IO supports both HDFS and S3 object storage (including S3-compatible storage), supports all Spark and Flink data types, and has passed both the TPC-H and CHBenchmark correctness tests.
  2. Snapshot reads and incremental reads are supported on Spark. Incremental reads on Spark work in both batch mode and micro-batch streaming mode.
  3. The default supported Spark version has been upgraded to Spark 3.3.

What's Changed

Full Changelog: https://github.com/meta-soul/LakeSoul/commits/v2.2.0

v2.1.1

18 Oct 05:45

What's Changed

This is a bug fix release for v2.1.0.

Fixed bugs:

Full Changelog: 2.1.0...v2.1.1