Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect binary data format in Postgres 14.x #430

Open
antogon opened this issue Jul 12, 2023 · 0 comments
Open

Incorrect binary data format in Postgres 14.x #430

antogon opened this issue Jul 12, 2023 · 0 comments

Comments

@antogon
Copy link

antogon commented Jul 12, 2023

Hi! I am writing in the hopes of getting some advice on how I can better debug the behaviour I'm encountering. We're using pglogical to replicate a production database into a separate instance we can use for reporting/analytics (all in GCP). Around once a day, sometimes far more frequently, we get the following logs in our replica instance:

2023-07-12 14:11:47.223 UTC [3036301]: [1-1] db=analytics_prod,user=[unknown] LOG:  starting apply for subscription prod_subscription
2023-07-12 14:11:47.317 UTC [3036301]: [2-1] db=analytics_prod,user=[unknown] ERROR:  incorrect binary data format
2023-07-12 14:11:47.317 UTC [3036301]: [3-1] db=analytics_prod,user=[unknown] CONTEXT:  apply UPDATE in commit before 3472/6727B900, xid 418939469 committed at 2023-07-12 13:59:30.061784+00 (action #2) from node replorigin 1
2023-07-12 14:11:47.320 UTC [3036301]: [4-1] db=analytics_prod,user=[unknown] LOG:  apply worker [3036301] at slot 1 generation 3752 exiting with error
2023-07-12 14:11:47.323 UTC [7]: [16482-1] db=,user= LOG:  background worker \"pglogical apply 25026:3244215688\" (PID 3036301) exited with exit code 1

When this happens, replication goes down. I am able to work around it by reviewing/removing the topmost information in the slot on the publisher/production database side of things:

prod=> SELECT * FROM pg_logical_slot_peek_changes('pgl_analytics_prod_publisher_prod_subc15ecd8', NULL, 100, 'include-xids', '0', 'min_proto_version', '1', 'max_proto_version', '1',
          'startup_params_format', '1', 'proto_format', 'json');
      lsn      |    xid    |                                                                                                                                                                                                                                                                                                                                                      data
---------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 3472/6709C950 | 418939469 | {"action":"S", "params": {"max_proto_version":"1","min_proto_version":"1","coltypes":"f","pg_version_num":"140005","pg_version":"14.5","pg_catversion":"202107181","database_encoding":"UTF8","encoding":"SQL_ASCII","forward_changeset_origins":"t","walsender_pid":"424243","pglogical_version":"2.4.2","pglogical_version_num":"20402","binary.internal_basetypes":"f","binary.binary_basetypes":"f","binary.basetypes_major_version":"1400","binary.sizeof_int":"4","binary.sizeof_long":"8","binary.sizeof_datum":"8","binary.maxalign":"8","binary.bigendian":"f","binary.float4_byval":"f","binary.float8_byval":"t","binary.integer_datetimes":"f","binary.binary_pg_version":"1400","no_txinfo":"f"}}
 3472/6709C950 | 418939469 | {"action":"B", "has_catalog_changes":"f", "xid":"418939469", "first_lsn":"3472/6709C950", "commit_time":"2023-07-12 13:59:30.061784+00"}
 3472/6727B930 | 418939469 | {"action":"C", "final_lsn":"3472/6727B900", "end_lsn":"3472/6727B930"}
 3472/6727B930 | 418939503 | {"action":"B", "has_catalog_changes":"f", "xid":"418939503", "first_lsn":"3472/6727B930", "commit_time":"2023-07-12 13:59:30.068726+00"}
 3472/6727CD78 | 418939503 | {"action":"C", "final_lsn":"3472/6727CD48", "end_lsn":"3472/6727CD78"}
 3472/67281018 | 418939507 | {"action":"B", "has_catalog_changes":"f", "xid":"418939507", "first_lsn":"3472/67281018", "commit_time":"2023-07-12 13:59:30.286045+00"}
 3472/672824B0 | 418939507 | {"action":"C", "final_lsn":"3472/67282480", "end_lsn":"3472/672824B0"}
 3472/672824B0 | 418939508 | {"action":"B", "has_catalog_changes":"f", "xid":"418939508", "first_lsn":"3472/672824B0", "commit_time":"2023-07-12 13:59:30.298939+00"}
prod=> SELECT * FROM pg_logical_slot_get_changes('pgl_analytics_prod_publisher_prod_subc15ecd8', '3472/6727B930', NULL, 'include-xids', '0', 'min_proto_version', '1', 'max_proto_version', '1',
          'startup_params_format', '1', 'proto_format', 'json');
      lsn      |    xid    |                                                                                                                                                                                                                                                                                                                                                      data
---------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 3472/6709C950 | 418939469 | {"action":"S", "params": {"max_proto_version":"1","min_proto_version":"1","coltypes":"f","pg_version_num":"140005","pg_version":"14.5","pg_catversion":"202107181","database_encoding":"UTF8","encoding":"SQL_ASCII","forward_changeset_origins":"t","walsender_pid":"424243","pglogical_version":"2.4.2","pglogical_version_num":"20402","binary.internal_basetypes":"f","binary.binary_basetypes":"f","binary.basetypes_major_version":"1400","binary.sizeof_int":"4","binary.sizeof_long":"8","binary.sizeof_datum":"8","binary.maxalign":"8","binary.bigendian":"f","binary.float4_byval":"f","binary.float8_byval":"t","binary.integer_datetimes":"f","binary.binary_pg_version":"1400","no_txinfo":"f"}}
 3472/6709C950 | 418939469 | {"action":"B", "has_catalog_changes":"f", "xid":"418939469", "first_lsn":"3472/6709C950", "commit_time":"2023-07-12 13:59:30.061784+00"}
 3472/6727B930 | 418939469 | {"action":"C", "final_lsn":"3472/6727B900", "end_lsn":"3472/6727B930"}
(3 rows)

Then, I can re-enable the subscription on the replica side and it seems to be "fixed" although I'm sure some data was lost by my manual changes to the replication slot.

I've attempted to reproduce this on my local laptop a few times by restoring a production dump and setting up replication there, but my attempts are still unpredictable as to whether they will trigger the error in question.

My current working theory is that our organization's relatively recent usage of uuid as a primary key in our tables (we're slowly moving towards this and away from bigint PKs) could be causing this -- but this theory is only based on intuition, not proof just yet.

Is it possible that using uuid columns in our tables could cause this? Is there some form of supplementary logging I can turn up to determine the exact cause of this error? Is there some other/better means of debugging you would suggest?

Looking for any advice I can get that might help me get closer to solving this. Thanks in advance! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant