mydumper partition dumping chunks should maybe order on partition file size #1302

kwenieroalt · 2023-09-15T22:19:21Z

kwenieroalt
Sep 15, 2023

Hey there,

First of all thanks for this brilliant piece of software. And I am trying to make backups of pretty big databases (20Tb and 40TB).
Some of the tables are partitioned ( some 40+ partition files) and vary in size depending on the number of records per day.
So with the first backup test on 20Tb using 4 threads the backup did take quite some time (4 days), which I expected, but then running it a second time using 8 threads instead, it took even longer. And I expected it to be faster.

From my understanding when checking the network use between the DB and the backup server, I think that a chunk with 8 threads is started and then only if all threads are finished then a new set of 8 threads is started. This is okay usually. But in my case if I have one partition file which is for example 100Gb and 7 others are 50Gb or less.. then 7 threads are done quickly, but the last thread is taking much longer while 7 threads are doing nothing. (Hope that makes sense).

So two solutions for that:

Make the threading so that when a thread is done, it picks up a new partition to dump.
Sort the partition_name list based on the size of the partition. That way big tables are dumped at the same time and the waiting time before all threads are done is at its smallest.

About 1. I can imagine that with the row chunking that doesn't make much sense as row based chunks are mostly dumped within the same time frame.

About 2. In mydumper_chunks.c in the get_partitions_for_table I see a select query used to fetch the partition_names , maybe we can add some sorting based on the number of entries in there and the average row size. I will have a look at the database to see if any viable info is stored in the information schema, which can be used as a hint. Usually the information there is not super accurate.

Hope this makes sense, maybe an option for the command to sort the partitions based on their sizes is an option.

kwenieroalt · 2023-09-16T14:40:57Z

kwenieroalt
Sep 16, 2023
Author

Just did some testing on MySQL 5.7 server and you can sort on the DATA_LENGTH column.
So a query like this should be the best option:
select PARTITION_NAME from information_schema.PARTITIONS where PARTITION_NAME is not null and TABLE_SCHEMA='' and TABLE_NAME='

' ORDER BY DATA_LENGTH DESC

I will see if I can build the mydumper binary myself with that change.. to see if it helps at all.

0 replies

kwenieroalt · 2023-12-05T13:46:27Z

kwenieroalt
Dec 5, 2023
Author

Actually I was able to build a custom mydumper with this 'ORDER BY DATALENGTH DESC', but it did not really make a huge difference. The main problem is that the threading starts for example 8 threads and waits until the last one is done. Then it starts the next 8 threads.

Tbh... as long as there are partitions available to dump, new dumps should be started on all threads instead of waiting foor al threads to finish. Would this be possible to implement?

So probably in the code, a different thread handling is required to make better use of (large) partitioned tables.
I am not a good coder and will need to take some time to look into it, so any suggestions would be appreciated.

5 replies

davidducos Dec 5, 2023
Maintainer

Can you test with the latest prerelease? I think that this has been already fixed

kwenieroalt Dec 5, 2023
Author

Do you mean that I checkout the master branch and build that one to test?
I think I can do that, yes.

kwenieroalt Dec 8, 2023
Author

Well I tried tag [v0.15.2-6] and master.. they both segfault on my Debian 11 machine. While stracing it looks like that it crashes when cloning childs.

The v0.15.2-5 did do something but required me to install jemalloc2 before it would compile and run... (version v0.15.2-6 did not require that). It also threw me this:
** (mydumper-master:599712): CRITICAL **: 11:37:49.308: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08.dbname/submittedrange_stats.00000.sql (17) ** (mydumper-master:599712): CRITICAL **: 11:37:49.316: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08.dbname/receivedrange_stats.00000.sql (17) ** (mydumper-master:599712): CRITICAL **: 11:37:49.323: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08.dbname/message_return.00000.sql (17) ** (mydumper-master:599712): CRITICAL **: 11:37:49.328: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08.dbname/message_tag.00000.sql (17) ** (mydumper-master:599712): CRITICAL **: 11:37:49.400: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08.dbname/message_tag.00001.sql (17) ** (mydumper-master:599712): CRITICAL **: 11:37:49.400: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08.dbname/message_tag.00002.sql (17) ** (mydumper-master:599712): CRITICAL **: 11:37:49.437: cannot create named pipe /mnt/backup//db1-mysql/2023-12-08/message_tag.00003.sql (17)
I ran this all as root user (so permission denied is not an issue)

The v0.15.2-4 did also start, but I assigned 12 threads and it did not use them all at the start.

davidducos Jan 12, 2024
Maintainer

Are you taking backup with the same version that you are trying to restore? we don't support backup/restore with mixed versions.

Solmea Jan 12, 2024

well to be honest I did not get to restores yet with the latest versions. Currenbtly I only do backups with the Debian 11 default version. So restoring it on a Same versioned Debian should be okay. I did test that as this setup uses gtid, which I never restored before with myloader. but it worked and replication was good afterwards.

But for testing I wanted to try the improved version which should speed up partition backups. And that segmentation faults with the 0.15.2-6

kwenieroalt · 2024-04-04T12:41:33Z

kwenieroalt
Apr 4, 2024
Author

Just in case anybody searches for this and wonders if the issue is now fixed. I have tested with the -0.16.1-3 version on Debian Bullseye and it now works and the threading is really better then before with partitioned tables. The backup time went from 2 hours previously to 1 hour (on the same backup hardware and also the same DB hardware!).

So kudos to @davidducos for the changes in the 0.16.1-3 release.. they work and improve backup speed too. Will now start to test a restore, but was too exited with this result to not mention it here.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mydumper partition dumping chunks should maybe order on partition file size #1302

{{title}}

Replies: 3 comments 5 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

mydumper partition dumping chunks should maybe order on partition file size #1302

kwenieroalt Sep 15, 2023

Replies: 3 comments · 5 replies

kwenieroalt Sep 16, 2023 Author

kwenieroalt Dec 5, 2023 Author

davidducos Dec 5, 2023 Maintainer

kwenieroalt Dec 5, 2023 Author

kwenieroalt Dec 8, 2023 Author

davidducos Jan 12, 2024 Maintainer

Solmea Jan 12, 2024

kwenieroalt Apr 4, 2024 Author

kwenieroalt
Sep 15, 2023

Replies: 3 comments 5 replies

kwenieroalt
Sep 16, 2023
Author

kwenieroalt
Dec 5, 2023
Author

davidducos Dec 5, 2023
Maintainer

kwenieroalt Dec 5, 2023
Author

kwenieroalt Dec 8, 2023
Author

davidducos Jan 12, 2024
Maintainer

kwenieroalt
Apr 4, 2024
Author