

Heartbeat table updating. Positions table not. Logs show successfully processing 0 records. #2010

Open
eihli opened this issue Jun 1, 2023 · 7 comments

Comments

@eihli

eihli commented Jun 1, 2023

We've encountered this at least twice. Both times after a database restart. We're on RDS. The first time was a resize. The second time was a change to a parameter group (or something like that).

We're running in the zendesk/maxwell:v1.37.6 docker image.

docker logs --follow shows this, and it updates in real time:

17:22:52,538 INFO  LogInputStreamReader - [2023-06-01 17:22:52.538664] [0x00000043][0x00007f2efdc38700] [info] [processing_statistics_logger.cc:114] Stage 2 Triggers: { stream: 'users-db', manual: 0, count: 0, size: 0, matches: 0, timed: 0, KinesisRecords: 0, PutRecords: 0 }
17:22:52,538 INFO  LogInputStreamReader - [2023-06-01 17:22:52.538672] [0x00000043][0x00007f2efdc38700] [info] [processing_statistics_logger.cc:129] (users-db) Average Processing Time: -nan ms
17:23:07,538 INFO  LogInputStreamReader - [2023-06-01 17:23:07.538751] [0x00000043][0x00007f2efdc38700] [info] [processing_statistics_logger.cc:111] Stage 1 Triggers: { stream: 'users-db', manual: 0, count: 0, size: 0, matches: 0, timed: 0, UserRecords: 0, KinesisRecords: 0 }
17:23:07,538 INFO  LogInputStreamReader - [2023-06-01 17:23:07.538791] [0x00000043][0x00007f2efdc38700] [info] [processing_statistics_logger.cc:114] Stage 2 Triggers: { stream: 'users-db', manual: 0, count: 0, size: 0, matches: 0, timed: 0, KinesisRecords: 0, PutRecords: 0 }
17:23:07,538 INFO  LogInputStreamReader - [2023-06-01 17:23:07.538799] [0x00000043][0x00007f2efdc38700] [info] [processing_statistics_logger.cc:129] (users-db) Average Processing Time: -nan ms

Running SELECT * FROM maxwell.heartbeats; a few times shows the heartbeat updating.

Running SELECT * FROM maxwell.positions; repeatedly never shows any update. It's stuck at a last_heartbeat_read and a binlog_file from hours ago.
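For reference, these are the two checks described above, run against Maxwell's default schema (table names per the stock maxwell schema; adjust if yours differ):

```sql
-- Healthy case: the heartbeat value advances on repeated runs.
SELECT * FROM maxwell.heartbeats;

-- In this failure mode: last_heartbeat_read and binlog_file stay frozen
-- at values from hours ago, even while the heartbeats table keeps updating.
SELECT * FROM maxwell.positions;
```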

docker restart fixes the issue.

Any ideas? Threads to pull on? More info I can provide?

Thanks for any help.

@osheroff
Collaborator

Is data flowing during this time? I.e., is the whole system stuck, or just heartbeats?

@liuxiaomingg

We encountered the same problem. The data flows normally.

@eihli
Author

eihli commented Jun 13, 2023

Is data flowing during this time? I.e., is the whole system stuck, or just heartbeats?

In my case, the logs always show 0 UserRecords and 0 KinesisRecords even though rows are being updated in the database.

{ stream: 'users-db', manual: 0, count: 0, size: 0, matches: 0, timed: 0, UserRecords: 0, KinesisRecords: 0 }

Also, Maxwell's positions table doesn't update. The row in that table that tracks the latest binary log file and position never changes.

But the heartbeats table does update.

@osheroff
Collaborator

Definitely sounds like Maxwell's binlog replication thread is stuck or dead; the fact that the heartbeats table is updating doesn't necessarily mean it's getting any rows.

Can you capture complete logs from the system? The Kinesis logs you're showing don't necessarily mean that much, but maybe there are errors, warnings, or info messages before that?

Curious when you say "after a database restart" -- does maxwell not die when the database first restarts? or is this when maxwell restarts after the db restarts?

If all else fails you might want to try running with --binlog_heartbeat, which is good at detecting connections that go dead / unresponsive for miscellaneous reasons.
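For anyone else hitting this: the flag can also be set in config.properties. The property spelling below is an assumption based on Maxwell's usual flag-to-property mapping; check `maxwell --help` for your version:

```properties
# config.properties (sketch, assumed key name)
# Enable heartbeats on the binlog connection itself, so a silently dead
# replication connection is detected and re-established.
binlog_heartbeat=true
```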

@liuxiaomingg

Nothing looks wrong in the console output. When we restart Maxwell, the service goes back to normal.

@liuxiaomingg

There's no obvious error in the console logs, either.

@eihli
Author

eihli commented Jun 13, 2023

Can you capture complete logs from the system? The Kinesis logs you're showing don't necessarily mean that much, but maybe there are errors, warnings, or info messages before that?

I don't have the full logs but I did read them and nothing jumped out.

Curious when you say "after a database restart" -- does maxwell not die when the database first restarts? or is this when maxwell restarts after the db restarts?

We have it set to unlimited reconnection retries. I think I saw it retry a few thousand times over a minute or so (those numbers are from memory; don't count on them being accurate), and then when the database came back up it reconnected and started logging as if everything was A-OK.

If all else fails you might want to try running with --binlog_heartbeat, which is good at detecting connections that go dead / unresponsive for miscellaneous reasons.

I'll give this a shot. Thanks!

One other note: this isn't 100% reproducible for us. Sometimes after Maxwell can't connect for a ~minute it will reconnect and continue processing data normally. Rough estimate is that we've seen it break ~3 times out of ~20 database restarts. Perhaps of note is that usually when we do a database restart it's because we're resizing an RDS instance. I didn't expect that to be an issue because we're not upgrading MySQL, changing IP address, or doing anything else that I'd expect to affect the connection/binlog.

If nothing jumps out as an "ah-hah" from this description, I can try to narrow in on making it reproducible.

3 participants