[Feature][CDCSOURCE] source with kafka debezium json format #3341
Comments
You can just use the Kafka connector directly; the payloads are already JSON.
Is your requirement to split the data and write it to different tables?
@aiwenmo Yes. One Kafka topic may contain multiple CDC tables, and the records need to be written into different tables. I also think we should consider the case of consuming multiple Kafka topics that correspond to one table.
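The one-topic-many-tables demultiplexing described above can be sketched in plain Python, assuming Debezium's standard change-event envelope where `payload.source.db` and `payload.source.table` identify the origin table (the `route_by_table` helper is illustrative, not part of Dinky or Paimon):

```python
import json
from collections import defaultdict

def route_by_table(raw_records):
    """Group raw debezium-json change events by their source table.

    Debezium's standard envelope carries the origin database and table
    in payload.source, so one Kafka topic mixing several tables can be
    split without an external schema registry.
    """
    buckets = defaultdict(list)
    for raw in raw_records:
        event = json.loads(raw)
        source = event["payload"]["source"]
        key = (source["db"], source["table"])
        buckets[key].append(event["payload"].get("after"))
    return dict(buckets)

# Two events from different tables arriving on the same topic.
records = [
    json.dumps({"payload": {"op": "c",
                            "source": {"db": "shop", "table": "orders"},
                            "after": {"id": 1, "amount": 9.5}}}),
    json.dumps({"payload": {"op": "c",
                            "source": {"db": "shop", "table": "users"},
                            "after": {"id": 7, "name": "wu"}}}),
]
routed = route_by_table(records)
```

In a real CDCSOURCE task the per-table buckets would become side outputs or separate sinks rather than an in-memory dict.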
I have actually taken a look at the Flink CDC and Hudi solutions, but it seems a bit hard for my team to implement a connector from Kafka CDC (what I call debezium json in Kafka) to Hudi or other databases. Recently I took some time to try Apache Paimon's CDC ingestion from Kafka CDC, and afterwards I thought it might be a solution for us, as Apache Paimon graduated from incubation to become an Apache Top-Level Project several days ago. So I wonder whether you could implement this Kafka CDC source connector, or absorb their implementation of KafkaSyncDatabaseAction and KafkaSyncTableAction, or just wrap it into a CDCSOURCE task in Dinky. I know these features may require some code and structural changes, so please consider it at your convenience.
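For reference, Paimon's documented Kafka CDC ingestion is driven by the flink action JAR rather than a dedicated source connector. A sketch of the `kafka_sync_database` invocation follows; the JAR path, warehouse location, database name, broker address, and topic are placeholders to adapt:

```shell
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-<version>.jar \
    kafka_sync_database \
    --warehouse hdfs:///paimon/warehouse \
    --database ods \
    --kafka_conf properties.bootstrap.servers=broker:9092 \
    --kafka_conf properties.group.id=dinky-cdc \
    --kafka_conf topic=cdc_topic \
    --kafka_conf value.format=debezium-json \
    --table_conf bucket=4
```

Wrapping a CDCSOURCE task around an invocation like this is one way Dinky could reuse Paimon's table discovery and schema evolution instead of reimplementing them.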
Do you have the energy to fulfill this requirement? |
Sorry, I do not have resources to implement this feature. |
I am willing to submit a PR. |
You only need to use Dinky's Flink JAR task to invoke the Paimon action for Kafka CDC; there is no need to build a separate source. The current problem is that Paimon 0.8 still has not solved automatic primary-key discovery for debezium json: the debezium json value format carries no primary-key information, so the primary key cannot be identified and Paimon cannot auto-create tables. From Paimon's issues, someone has implemented extracting the primary key from the key generated by Kafka Connect, but it has not been merged into the master branch yet, so we need to wait for version 0.9. Also, Dinky's datastream-kafka currently writes only the value to Kafka, with no key, so it would not be recognized by Paimon CDC either.
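The primary-key gap described above can be seen in miniature below: the debezium-json value envelope does not name the key columns, while the Kafka record key produced by Kafka Connect does, which is why a value-only producer leaves Paimon unable to infer a primary key. The field shapes follow Debezium's standard key/value envelopes; the `primary_key_fields` helper is illustrative:

```python
import json

# A Kafka record as Kafka Connect emits it: the key envelope names the
# primary-key column(s); the value envelope does not repeat them.
record_key = json.dumps({
    "schema": {"fields": [{"field": "id", "type": "int64"}]},
    "payload": {"id": 42},
})
record_value = json.dumps({
    "payload": {"op": "u",
                "source": {"db": "shop", "table": "orders"},
                "after": {"id": 42, "amount": 19.9}},
})

def primary_key_fields(key_json):
    """Recover primary-key column names from the Kafka Connect key schema.

    Returns an empty list when the record has no key, which is the
    situation a value-only producer creates: no key, no inferable
    primary key, no automatic table creation.
    """
    if key_json is None:
        return []
    key = json.loads(key_json)
    return [f["field"] for f in key.get("schema", {}).get("fields", [])]

pk_with_key = primary_key_fields(record_key)
pk_without_key = primary_key_fields(None)
```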
Search before asking
Description
Our existing multi-source databases contain several thousand tables in total, a very large number. Incremental data is already collected into Kafka via CDC, mostly in debezium json format, but because of the large table count a single Kafka topic may carry events from a varying number of tables. We have no permission to connect directly to the thousands of business databases, and they are not all MySQL; Dinky's official documentation only covers MySQLCDC, OracleCDC, and similar sources.
To implement whole-database synchronization by consuming from Kafka, where one topic contains multiple tables, we hope this Kafka source with debezium json format can be added as a data source.
Use case
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct