[Feature][CDCSOURCE] source with kafka debezium json format #3341
Comments
You can just use the Kafka connector directly; the data itself is all JSON.
Is your requirement to split the data and write it into different tables?
@aiwenmo Yes. One Kafka topic may contain multiple CDC tables, which need to be written into different tables. I also think we should consider the case of consuming multiple Kafka topics that correspond to one table.
I have actually taken a look at the Flink CDC and Hudi solutions. But it seems a bit hard for my team to implement a connector from Kafka CDC (something I call "debezium json in Kafka") to Hudi or other databases. Recently I spent some time practicing with Apache Paimon's CDC ingestion from Kafka, and afterwards I thought it might be a solution for us, as Apache Paimon became a top-level Apache project several days ago after graduating from incubation. So I wonder whether you could implement this Kafka CDC source connector, absorb their implementation of KafkaSyncDatabaseAction and KafkaSyncTableAction, or just wrap it into a CDCSOURCE task in Dinky. I know these features may require some code and structural changes, so please consider it at your convenience.
Do you have the capacity to take on this requirement?
Sorry, I do not have resources to implement this feature. |
I am willing to submit a PR. |
Search before asking
Description
Currently, our existing multi-source databases contain several thousand tables in total, a huge number. Incremental data is already collected into Kafka via CDC, and most of it is in debezium json format, but because there are so many tables, a single Kafka topic will contain a varying number of tables. We do not have permission to connect directly to the thousands of business databases, and they are not MySQL; Dinky's official connectors are all MySQLCDC, OracleCDC, and so on.
Currently, to implement whole-database synchronization by consuming from Kafka, where one topic contains multiple tables, we hope this Kafka source with debezium json format can be added as a data source.
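The core of the requested feature is demultiplexing: each record in the topic is a Debezium change event whose `source` block names the originating database and table, so a consumer can group records by `(db, table)` and dispatch each group to its target sink. A minimal sketch of that routing step, assuming the standard Debezium envelope fields (`op`, `after`, `source.db`, `source.table`); the `route` function and the sample records are hypothetical, not part of Dinky:

```python
import json

def route(records):
    """Group raw Debezium JSON strings by their source database and table.

    Each record is expected to carry the standard Debezium envelope, where
    event["source"]["db"] and event["source"]["table"] identify the origin.
    """
    tables = {}
    for raw in records:
        event = json.loads(raw)
        src = event["source"]
        key = (src["db"], src["table"])  # identifies the target table
        tables.setdefault(key, []).append(event)
    return tables

# Example: two tables interleaved in the same topic.
records = [
    '{"op": "c", "after": {"id": 1}, "source": {"db": "shop", "table": "orders"}}',
    '{"op": "u", "after": {"id": 2}, "source": {"db": "shop", "table": "users"}}',
    '{"op": "c", "after": {"id": 3}, "source": {"db": "shop", "table": "orders"}}',
]
grouped = route(records)
print(sorted(grouped))                    # [('shop', 'orders'), ('shop', 'users')]
print(len(grouped[("shop", "orders")]))   # 2
```

A real CDCSOURCE implementation would replace the in-memory dict with per-table side outputs or sink writers, but the grouping key is the same.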
Use case
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct