Roadmap 2023 #26

hustnn · 2023-01-11T12:49:25Z

You are welcome to share your ideas on the roadmap. The updated roadmap for Q3 and Q4 is shown below.

Storage

Data cache preload - Q1
feat: Enable data preload write through #189
Object store (S3) support preview - Q2
feat: patch S3 storage basic #260
Object store (S3/TOS/OSS) production ready - Q3 #545
IO scheduler to improve remote read performance - Q3

External Table/Data Lake (project https://github.com/orgs/ByConity/projects/2)

Hive usability improvement (e.g., schema auto inference) - Q3
Hive table should support schema inference #315
Hudi COW and MOR support - Q3
Support reading Hudi table #360
Multi-catalog (Glue/Hive) support - Q3
Support external catalogs #361
Hive query execution improvement - Q3-Q4
Hive 2.0 #550
Hive orc/parquet min-max index pruning #551
RFC: Hive distributed processing #220

Index

Index cache - Q2
perf: In-memory cache for primary key index #209
Inverted index phase 2 - Q4

Runtime

Projection support - Q3
Grace hash join - Q3
Adaptive query scheduling - Q3
Common table expression (CTE) reuse - Q3
Materialized view - Q4
Extract, Load, Transform (ELT) phase 1 - Q3
Asynchronous query execution, query queue, join spill
Extract, Load, Transform (ELT) phase 2 - Q4
Exchange spill, colocated scheduling, batch execution
SQL UDF support - Q4
ByConity Support UDF #427

Optimizer

CBO statistics auto collection - Q3
SQL plan management (manually creating binding) - Q3

Transaction

Direct insert values in worker - Q3 #546

Enterprise feature

HA with keeper - Q3 #547
Support RBAC (role based access control) - Q3 (project https://github.com/orgs/ByConity/projects/4) #548
Multi-tenant support - Q4
FoundationDB backup & restore - Q4

Performance improvement

Part cache lockless scan - Q3
Hybrid part allocation - Q3 #544
Query result cache - Q3 #549
Column min/max for part pruning - Q4

Stability

Query auto forwarding among multiple servers - Q1
RFC: Multi-Server Integration #208
FoundationDB CAS usage improvement - Q1
chore: Improve CAS operation #145
use atomic api for compare and clear #185
Server isolation - Q3
Storage based HA support - Q4
Metrics enhancement for better observability - Q4

Installation

Provide binary install package - Q1
add script to build debian and rpm package #100
Cluster setup guideline on bare metal machines - Q1
https://github.com/ByConity/ByConity/tree/master/packages

CI

Auto testing script and test guideline for developers - Q1
add testing guideline #206
Enrich CI test suite - Q1

canhld94 · 2023-01-13T02:09:58Z

Some feedback from the community

Usability:

Need better documents for installation in non-containerized environments
Need better documents for ease of use (e.g. storage engines)

Storage:

Should support common object storage (S3, GCP)

Integration:

Need to ensure compatibility with common DB drivers

Team discussions

Support query from data lake (Hudi, Iceberg, Delta); currently we support Hive, and can extend this model to support others.
Indexing services

Feel free to add more.

zhbdesign · 2023-01-13T05:27:23Z

Support DELETE, UPDATE, and user permission control, consistent with the latest version of ClickHouse

zhbdesign · 2023-01-13T05:58:41Z

Support CREATE TABLE AS SELECT * syntax

zhbdesign · 2023-01-13T07:14:19Z

SQL fingerprint support
Support a JDBC facade
Support INSERT OVERWRITE

zhbdesign · 2023-01-13T14:17:18Z

Support other import sources: RocketMQ, MaterializedMySQL, MaterializedPostgreSQL, Flink, Pulsar

LiuYangkuan · 2023-01-14T03:33:36Z

@canhld94 @hustnn

Storage:
Should support common object storage (S3, GCP)

Make the shared metadata of object storage compatible with JuiceFS; then we can share checkpoints of CnchMergeTree parts with the original ClickHouse, using DiskLocal through a mounted JuiceFS.

In some time-travel scan cases, we can use the latest original ClickHouse, which has new features that ByConity doesn't have yet.

zhbdesign · 2023-01-14T08:21:20Z

Automatic collection, update, and analysis of statistics

zbtzbtzbt · 2023-01-15T11:44:58Z

Support a distributed cache for a higher query cache hit rate

zhbdesign · 2023-01-19T03:26:42Z

Support JuiceFS

zhbdesign · 2023-01-23T11:22:55Z

ByConity provides an all-in-one package. You can install clickhouse-client, clickhouse-server, clickhouse-worker, tso_server, daemon_manager, and resource_manager all at once with this unified package; you can also install only some components or specify a component version as required.

zhbdesign · 2023-01-23T14:07:19Z

Support RBAC; support SQL-driven maintenance

zhbdesign · 2023-01-29T03:18:44Z

Clone table
Create a new table using the same schema and data as the original table.

s7monk · 2023-05-30T05:51:01Z

Can it support Apache Paimon?

hustnn · 2023-06-20T10:58:22Z

Roadmap update proposal
The proposal includes both English and Chinese versions, shown below.

===English version===
We plan to update the roadmap for ByConity's third and fourth quarters. The updates consist of two parts: adding new features and removing or adjusting some old functionalities. The additions come from three sources. The first part includes features and performance requirements that have received high demand from the community. The second part involves functionalities ported from ByConity's old code baseline to the new code baseline (from version 19.x to 21.x). The third part comprises functionality requirements planned by ByConity's research and development team based on identified gaps, data warehouse positioning, and future trends. The updated roadmap is as follows. We use GitHub Projects to manage subtasks and track progress at https://github.com/orgs/ByConity/projects, and some features already have associated projects.

Storage
    Object store (S3) support - Q2
External Table/Data Lake (project https://github.com/orgs/ByConity/projects/2)
    Hive Usability - Q2-Q3
    Hudi COW and MOR support - Q3
    Multi-catalog (Glue/Hive) support - Q3
    Hive query execution improvement - Q3-Q4
    Iceberg support - Q4
Runtime
    Projection support - Q2
    Grace hash join - Q3
    Adaptive query scheduling - Q3
    Common table expression (CTE) reuse - Q3
    Extract, Load, Transform (ELT)
        Asynchronous query execution, query queue, join spill - Q3
        Exchange spill, colocated scheduling, batch execution - Q4
Optimizer
    CBO statistics auto collection - Q3
    SQL Plan Management (manually creating binding) - Q3
Transaction
    Direct insert values in worker - Q2
    Atomic attach - Q3
    Iterative transaction support - Q3
Enterprise feature
    HA with keeper - Q3
    Multi-tenant support - Q3
    Support RBAC - Q3 (project https://github.com/orgs/ByConity/projects/4)
    Fine grained access control - Q4
LLM-DB
    LLM vector store support - Q4
    Integrate with OpenAI, LangChain and LlamaIndex - Q4
Performance improvement
    Part cache lockless scan - Q3
    Hybrid part allocation - Q3
    IO scheduler - Q3
    Query result cache - Q3 (project https://github.com/orgs/ByConity/projects/3)
    Column statistics for part pruning - Q4
Stability
    Server isolation - Q3
    Metrics enhancement for better observability - Q3

As mentioned above, the roadmap update consists of two parts: adding new features and removing or adjusting some old functionalities. Let's break down these two parts of the update.
The additions are divided into three parts. The first part mainly stems from the community's demands after going open source. For example, the ability to write directly to the worker to reduce server responsibilities and facilitate horizontal scalability. Other improvements include optimizing I/O and enhancing cold read performance. The second part consists of functionalities ported from ByConity's old code baseline, such as projection. The third part encompasses functionality requirements planned by the research and development team based on identified gaps, data warehouse positioning and future trend analysis, such as enhanced data lake capabilities and support for ELT (Extract, Load, Transform).

# Requirement from ByConity community
External Table/Data Lake
    Hive Usability - Q2-Q3
    Hive query execution improvement - Q3-Q4
Transaction
    Direct insert values in worker - Q2
Performance improvement
    IO scheduler - Q3
    Query result cache - Q3
Stability
    Metrics enhancement for better observability - Q3

# Code baseline merge
Runtime
    Projection support - Q2
    Grace hash join - Q3
    Adaptive query scheduling - Q3
    Common table expression (CTE) reuse - Q3
Transaction
    Atomic attach - Q3
    Iterative transaction support - Q3
Enterprise feature
    HA with keeper - Q3
    Multi-tenant support - Q3
Performance improvement
    Part cache lockless scan - Q3
    Hybrid part allocation - Q3
Stability
    Server isolation - Q3
    
# RD planning
External Table/Data Lake
    Hudi MOR support - Q3
Performance improvement
    Query result cache - Q3
Extract, Load, Transform (ELT)
    Asynchronous query execution, query queue, join spill - Q3
    Exchange spill, colocated scheduling, batch execution - Q4
LLM-DB
    LLM vector store support - Q4
    Integrate with OpenAI, LangChain and LlamaIndex - Q4

With the addition of the aforementioned high-priority features, we have also removed or adjusted some old functionalities. These involve features with unclear requirements and code refactoring. The detailed list is as follows, and we will allocate time to support the removed functionalities.

Storage
    Hudi COW support - (to Q3)
    Delta lake support - Q2 (replan)
    Iceberg support - (to Q4)
Index
    Space-filling curves - Q1 (replan)
    Index auto recommendation - Q2 (replan)
Performance
    Column statistics for part pruning - (to Q4)
    Hybrid part allocation - (to Q3)
    Query result cache - Q3
Stability
    Server isolation - (to Q3)
    Metrics enhancement for better observability - (to Q3)
Enterprise feature
    Support RBAC - (to Q3)
    Fine grained access control - (to Q4)
    Backup and recover - Q2 (replan)
Transaction
    Direct write in worker - (to Q3)
    Iterative transaction support - (to Q3)
    Atomic attach - (to Q3)
    Code refactoring - (replan)

Towards the end of each quarter, we conduct a review and fine-tune the content for the following quarter. We synchronize these adjustments with the community and welcome discussions, comments, and new feature requests. The finalized roadmap will be updated after the first week of each quarter and any newly proposed feature requests will be considered for the subsequent quarter.

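===Chinese version===
We plan to update ByConity's roadmap for Q3 and Q4. The update has two parts: the first adds some new features, and the second removes or adjusts some old ones. The additions come from three sources: first, feature and performance requirements in high demand from the community; second, features ported from ByConity's old code baseline to the new one (19.x to 21.x); third, feature requirements planned by ByConity R&D based on functional gaps, data warehouse positioning, and expected future trends. The updated roadmap is shown below. We use GitHub Projects to manage subtasks and track progress at https://github.com/orgs/ByConity/projects, and projects have already been created for some features.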

Storage
    Object store (S3) support - Q2
External Table/Data Lake (project https://github.com/orgs/ByConity/projects/2)
    Hive Usability - Q2-Q3
    Hudi COW and MOR support - Q3
    Multi-catalog (Glue/Hive) support - Q3
    Hive query execution improvement - Q3-Q4
    Iceberg support - Q4
Runtime
    Projection support - Q2
    Grace hash join - Q3
    Adaptive query scheduling - Q3
    Common table expression (CTE) reuse - Q3
    Extract, Load, Transform (ELT)
        Asynchronous query execution, query queue, join spill - Q3
        Exchange spill, colocated scheduling, batch execution - Q4
Optimizer
    CBO statistics auto collection - Q3
    SQL Plan Management (manually creating binding) - Q3
Transaction
    Direct insert values in worker - Q2
    Atomic attach - Q3
    Iterative transaction support - Q3
Enterprise feature
    HA with keeper - Q3
    Multi-tenant support - Q3
    Support RBAC - Q3 (project https://github.com/orgs/ByConity/projects/4)
    Fine grained access control - Q4
LLM-DB
    LLM vector store support - Q4
    Integrate with OpenAI, LangChain and LlamaIndex - Q4
Performance improvement
    Part cache lockless scan - Q3
    Hybrid part allocation - Q3
    IO scheduler - Q3
    Query result cache - Q3 (project https://github.com/orgs/ByConity/projects/3)
    Column statistics for part pruning - Q4
Stability
    Server isolation - Q3
    Metrics enhancement for better observability - Q3

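As mentioned above, the roadmap update has two parts: new features, and old features that were removed or adjusted. Both parts are broken down here.
The additions consist of three parts. The first comes mainly from community requirements after open-sourcing, for example writing directly to workers, which reduces server responsibilities and makes writes easy to scale horizontally, or optimizing IO to improve cold read performance. The second part is features ported from ByConity's old code baseline, such as projection. The third part is feature requirements planned by R&D based on data warehouse positioning and expected future trends, such as data lake enhancements and ELT support.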

# Requirements from the ByConity open-source community
External Table/Data Lake
    Hive Usability - Q2-Q3
    Hive query execution improvement - Q3-Q4
Transaction
    Direct insert values in worker - Q2
Performance improvement
    IO scheduler - Q3
    Query result cache - Q3
Stability
    Metrics enhancement for better observability - Q3

# Code baseline merge
Runtime
    Projection support - Q2
    Grace hash join - Q3
    Adaptive query scheduling - Q3
    Common table expression (CTE) reuse - Q3
Transaction
    Atomic attach - Q3
    Iterative transaction support - Q3
Enterprise feature
    HA with keeper - Q3
    Multi-tenant support - Q3
Performance improvement
    Part cache lockless scan - Q3
    Hybrid part allocation - Q3
Stability
    Server isolation - Q3
    
# R&D planning
External Table/Data Lake
    Hudi MOR support - Q3
Performance improvement
    Query result cache - Q3
Extract, Load, Transform (ELT)
    Asynchronous query execution, query queue, join spill - Q3
    Exchange spill, colocated scheduling, batch execution - Q4
LLM-DB
    LLM vector store support - Q4
    Integrate with OpenAI, LangChain and LlamaIndex - Q4

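Because of the high-priority features added above, we have also removed or adjusted some old features. These include features with unclear requirements and code refactoring. The detailed list is shown below; removed features will be rescheduled.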

Storage
    Hudi COW support - (to Q3)
    Delta lake support - Q2 (replan)
    Iceberg support - (to Q4)
Index
    Space-filling curves - Q1 (replan)
    Index auto recommendation - Q2 (replan)
Performance
    Column statistics for part pruning - (to Q4)
    Hybrid part allocation - (to Q3)
    Query result cache - Q3
Stability
    Server isolation - (to Q3)
    Metrics enhancement for better observability - (to Q3)
Enterprise feature
    Support RBAC - (to Q3)
    Fine grained access control - (to Q4)
    Backup and recover - Q2 (replan)
Transaction
    Direct write in worker - (to Q3)
    Iterative transaction support - (to Q3)
    Atomic attach - (to Q3)
    Code refactoring - (replan)

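Near the end of each quarter we conduct a review and fine-tune the content of the following quarters, then sync the adjustments to the community. Everyone is welcome to discuss, comment, and propose new feature requests. The roadmap is finalized after the first week of each quarter, and the community roadmap is then updated. New feature requests raised after finalization are deferred to the next quarter.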

zhaojintaozhao · 2023-07-04T03:51:22Z

Metadata backup and recovery

If the metadata KV store FoundationDB is broken (for example, by a disk failure or a logic fault) and metadata is lost, is there any way to restore the data? I therefore suggest adding a metadata backup and recovery function, and I hope it can be moved into the Q3 plan.

zhaojintaozhao · 2023-07-04T04:26:17Z

Extract, Load, Transform (ELT)
    Asynchronous query execution, query queue, join spill - Q3
    Exchange spill, colocated scheduling, batch execution - Q4

ELT support in ByConity is an exciting and valuable feature.
Once it is implemented, ByConity can support more complex Hive SQL statements and SQL queries over larger data volumes.
The shuffle for Cnch Hive SQL will be a complex feature that requires a lot of effort.

juppylm · 2023-07-04T09:34:14Z

Is an inverted index also supported? The current primary key index performs well, but query performance on non-primary-key fields needs to be improved.
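To make the request concrete, here is a minimal sketch of what an inverted index does for a lookup on a non-primary-key text column. It is a toy in-memory illustration, not ByConity's design; the function names `build_inverted_index` and `lookup` and the whitespace tokenizer are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(rows: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercase token to the set of row ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, text in enumerate(rows):
        # Hypothetical tokenizer: split on whitespace only.
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def lookup(index: Dict[str, Set[int]], token: str) -> List[int]:
    """Return the sorted row ids containing the token, without scanning rows."""
    return sorted(index.get(token.lower(), set()))


rows = ["error disk full", "login ok", "disk replaced"]
index = build_inverted_index(rows)
print(lookup(index, "disk"))
```

The lookup touches only the posting set for the token instead of scanning every row, which is where the speed-up for non-primary-key predicates would come from.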

juppylm · 2023-07-05T01:22:37Z

CnchMergeTree should support materialized views.
Materialized views are an important feature of ClickHouse, and I think CnchMergeTree should also support them.

FourSpaces · 2023-07-05T02:25:16Z

I hope ByConity can support a joint query engine across multiple tables and multiple data sources, pushing queries down to their respective data sources. The joint query engine then consolidates the data from each data source.

A Merge table engine similar to ClickHouse's

zhaojintaozhao · 2023-07-06T12:33:26Z

Build Objectives

In the data warehouse scenario, ByConity should connect to the general data warehouse ecosystem, fill in missing features, and improve performance.
In the OLAP scenario, we will build ByConity into a comprehensive cloud-native database for multi-dimensional analysis. Wide-table performance should be close to that of ClickHouse. This cloud-native big data database will support elastic scaling, resource isolation, and high reliability. ByConity will support large-scale commercial deployment.

Requirements

1. Data warehouse scenario

In data warehouse scenarios, we focus on improving scenario coverage and supporting complex queries on large warehouse tables.

Key Features

1.1. Multi-stage execution and ETL capabilities at the execution layer, with support for batch processing and exchange shuffle, so that complex SQL statements querying large data warehouse tables are supported. Key Feature
1.2. Support special Hive functions and Hive UDFs. Key Feature
1.3. Performance improvement (e.g., ORC/Parquet native reader, block cache, min-max index, etc.)

General Features

1.4. Support multiple external catalogs.
1.5. Automatically infer column types of external Hive tables.
1.6. ORC/Parquet file min-max index.
1.7. Read ORC/Parquet data files in distributed mode with a thread pool.
1.8. Schedule workloads across multiple workers/worker groups to make full use of resources.
1.9. Support Iceberg/Hudi.

2. OLAP Scenario

We focus on reliability and performance improvement in the OLAP scenario.
2.1. Projection
2.2. Part detach/attach
2.3. Automatic re-collection of full CBO statistics
2.4. Automatic collection of CBO statistics for incremental data
2.5. Cache of per-part min and max information, used for query pruning.
2.6. Multi-disk local data cache
2.7. Separate primary index cache, mark cache, and data cache for CnchMergeTree.
2.8. Cluster management in containerized and non-containerized scenarios (adding and deleting workers/worker groups/virtual warehouses).
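
As an illustration of item 2.5, a minimal sketch of pruning parts by cached min/max values might look like the following. This is only a toy model under stated assumptions: `PartStats` and `prune_parts` are hypothetical names, and a single integer column with a closed-range predicate stands in for real part metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PartStats:
    """Hypothetical cached metadata: min and max of one column in a part."""
    name: str
    min_value: int
    max_value: int


def prune_parts(parts: List[PartStats], lo: int, hi: int) -> List[PartStats]:
    """Keep only parts whose [min, max] range overlaps the query range [lo, hi]."""
    return [p for p in parts if p.max_value >= lo and p.min_value <= hi]


parts = [
    PartStats("part_0", 1, 100),
    PartStats("part_1", 101, 200),
    PartStats("part_2", 201, 300),
]

# A predicate such as `WHERE x BETWEEN 150 AND 220` cannot match rows in part_0,
# so that part is skipped without reading its data.
selected = prune_parts(parts, 150, 220)
print([p.name for p in selected])
```

Because the check uses only cached metadata, parts that cannot match are eliminated before any remote read is issued.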

3. Reliability enhancement

3.1. Metadata backup and restoration capabilities. Key Feature
3.2. Service monitoring and alerting for ByConity
3.3. Monitoring, alerting, and capacity expansion for FoundationDB
3.4. Leader election and HA for multiple servers/TSO
3.5. Multiple instances and HA for RM/DM

The features we focus on most are 1.1, 1.2, and 3.1.
@hustnn

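Objectives

Data warehouse scenario: connect to the general data warehouse ecosystem, fill feature gaps, and improve performance.
OLAP scenario: build a complete cloud-native database whose wide-table performance is close to ClickHouse, supporting elastic scaling, resource isolation, high reliability, and large-scale commercial deployment.

Requirements

1. Data warehouse scenario

The data warehouse scenario focuses on improving scenario coverage and supporting complex queries on large warehouse tables.

Key features

1.1. Multi-stage execution and ETL capability at the execution layer, with support for batch processing and exchange shuffle;
1.2. Support for special Hive functions and Hive UDFs;
1.3. Performance acceleration (ORC/Parquet native reader, block cache, index, etc.)

General features

1.4. Multiple external catalogs;
1.5. Automatic type inference for external tables;
1.6. ORC/Parquet min-max index;
1.7. Distributed data reading;
1.8. Scheduling of load across multiple workers/worker groups to make full use of resources
1.9. Support for Iceberg/Hudi

2. OLAP multi-dimensional analysis scenario

The OLAP scenario focuses on reliability and performance improvement.
2.1. Projection
2.2. Part detach/attach
2.3. Automatic re-collection of full CBO statistics
2.4. Automatic collection of CBO statistics for incremental data
2.5. Cache of per-part min/max information and query pruning
2.6. Multi-disk cache
2.7. Primary index cache, mark cache, and data cache for parts
2.8. Cluster management in containerized and non-containerized environments (adding and deleting workers/worker groups/virtual warehouses)

3. Reliability enhancement

3.1. Metadata backup and recovery capability
3.2. Service monitoring and alerting for ByConity
3.3. Monitoring, alerting, and capacity expansion for the metadata store FDB
3.4. Leader election and HA for multiple servers/TSO
3.5. Multiple instances and HA for RM/DM
The features we care about most are 1.1, 1.2, and 3.1.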

hustnn · 2023-07-11T06:33:45Z

shijiaoming · 2023-09-06T07:41:30Z

Support Apache Paimon; it's a very cool streaming data lake!

kevinthfang · 2023-11-02T09:37:07Z

Updates on roadmap:
reduced:

Atomic attach - Q4
Iterative transaction support (support multiple inserts atomic) - Q4

added:

Storage based HA support - Q4

hustnn pinned this issue Jan 11, 2023

hustnn mentioned this issue Jan 30, 2023

S3Disk support as main storage layer. #30

Closed

Adora627 added the enhancement New feature or request label Feb 17, 2023

hustnn changed the title ~~Roadmap 2023 (discussion)~~ Roadmap 2023 (Q3-Q4) Jul 11, 2023

hustnn changed the title ~~Roadmap 2023 (Q3-Q4)~~ Roadmap 2023 Jul 12, 2023

blueskygzhz mentioned this issue Oct 12, 2023

byconity server deadlock when do heavy benchmark test #787

Closed

ixnzh mentioned this issue Mar 4, 2024

Roadmap 2024 #1265

Open

31 tasks
