Postgres-cdc: the size of WAL for a replication slot accumulated for a long time #16697
Comments
Isn't it already solved by the heartbeat message?
This sounds like the most likely cause. I suspect the customer may have made a mistake somewhere.
I am trying to reproduce the issue. It seems AWS RDS generates WAL behind the scenes.
I have reproduced the issue and confirmed it is caused by AWS generating writes into its hidden database. More details can be found in this blog. Solution: the user can configure the
@StrikeW according to this post, it looks like recent versions (v42.7.0+) of the PgJDBC driver may have resolved this issue (pgjdbc/pgjdbc#2941). Is it possible the version RW depends on needs to be updated?
Upgrading the JDBC driver may introduce other issues, as pointed out in pgjdbc/pgjdbc#3138. I suggest we leverage the
Ohhh good to know! @StrikeW is that something we should do in our own source configuration, or do you plan to do it globally across all PG CDC source connectors?
I think we can address the problem in the PG CDC connector internally (via the
Fantastic! Will patiently await a fix upstream. ❤️
Describe the bug
A customer reported that the replication slot on the upstream PG gradually grows in size even though there has been no activity on the source tables for a long time.
We can see that the difference between current_wal_lsn and the restart_lsn is about 19GB.
The strange thing is that the emitted offset in the Debezium heartbeat message doesn't increase at all. We rely on the heartbeat message to acknowledge the source offset to the upstream PG so that it can reclaim the WAL (#16058).
P.S. The offset 6781533580088 corresponds to the LSN 62A/F2E66B38.
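For anyone checking the numbers by hand, here is a minimal sketch of the conversion between the integer offset and PostgreSQL's textual LSN form, plus a byte-lag calculation between two LSNs. The function names are made up for illustration and are not part of RisingWave or Debezium. The sketch only assumes that the textual LSN is the high and low 32 bits of a 64-bit position written in hex.

```python
def lsn_int_to_text(lsn: int) -> str:
    """Format a 64-bit LSN as PostgreSQL's 'HI/LO' hexadecimal notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def lsn_text_to_int(text: str) -> int:
    """Parse PostgreSQL's 'HI/LO' hexadecimal notation into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_lag_bytes(current_wal_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current WAL position."""
    return lsn_text_to_int(current_wal_lsn) - lsn_text_to_int(restart_lsn)


if __name__ == "__main__":
    # The offset quoted in this issue.
    print(lsn_int_to_text(6781533580088))
```

The same helpers can be used to compare the offset in a heartbeat message with the slot's restart_lsn when diagnosing whether the acknowledged position is advancing.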
The Debezium documentation has some guidelines on WAL disk consumption. The following case seems the most likely:
But the customer has confirmed that there is only one database in their Postgres instance and that all tables are located in it, so there is no high-traffic database.
Fortunately, once there is activity on the source tables, the WAL can be reclaimed, since we acknowledge a newer offset to PG.
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response