mydumper partition dumping chunks should maybe order on partition file size #1302
Replies: 3 comments 5 replies
-
Just did some testing on a MySQL 5.7 server and you can sort on the DATA_LENGTH column. I will see if I can build the mydumper binary myself with that change, to see if it helps at all.
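For reference, the per-partition size estimates live in information_schema.PARTITIONS, so a sort could look something like this (a sketch only, not mydumper's actual query; `mydb` and `mytable` are placeholder names, and DATA_LENGTH is a statistics-based estimate rather than an exact byte count):

```sql
-- Approximate on-disk size per partition, largest first.
SELECT PARTITION_NAME, DATA_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'mydb'       -- placeholder schema name
  AND TABLE_NAME = 'mytable'      -- placeholder table name
  AND PARTITION_NAME IS NOT NULL  -- skip non-partitioned tables
ORDER BY DATA_LENGTH DESC;
```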
-
Actually I was able to build a custom mydumper with 'ORDER BY DATA_LENGTH DESC', but it did not really make a huge difference. The main problem is the threading: it starts, for example, 8 threads and waits until the last one is done, then starts the next 8. As long as there are partitions left to dump, new dumps should be started on idle threads instead of waiting for all threads to finish. Would this be possible to implement? A different thread-handling scheme in the code is probably required to make better use of (large) partitioned tables.
-
Just in case anybody searches for this and wonders whether the issue is now fixed: I have tested the 0.16.1-3 version on Debian Bullseye and it now works, and the threading is really better than before with partitioned tables. The backup time went from 2 hours previously to 1 hour (on the same backup hardware and the same DB hardware!). So kudos to @davidducos for the changes in the 0.16.1-3 release; they work and improve backup speed too. I will now start testing a restore, but I was too excited about this result not to mention it here.
-
Hey there,
First of all, thanks for this brilliant piece of software. I am trying to make backups of some pretty big databases (20 TB and 40 TB).
Some of the tables are partitioned (some with 40+ partitions), and the partitions vary in size depending on the number of records per day.
The first backup test of the 20 TB database with 4 threads took quite some time (4 days), which I expected. But a second run with 8 threads took even longer, although I expected it to be faster.
From what I can see in the network usage between the DB and the backup server, a set of 8 threads is started, and a new set of 8 only starts once all of them have finished. That is usually fine, but in my case, if one partition is, say, 100 GB and the 7 others are 50 GB or less, then 7 threads finish quickly and sit idle while the last thread keeps working for much longer. (Hope that makes sense.)
So two possible solutions for that:
1. Chunk the partitions further by rows, so a large partition is split across several threads.
2. Sort the partitions by size, largest first, so the biggest ones are started earliest.
About 1: I can imagine that with row chunking this doesn't make much sense, as row-based chunks are mostly dumped within the same time frame.
About 2: in mydumper_chunks.c, in get_partitions_for_table, I see a SELECT query used to fetch the partition names; maybe we can add some sorting there based on the number of entries and the average row size. I will have a look at the database to see whether any usable info is stored in the information schema that could serve as a hint; usually the information there is not super accurate.
Hope this makes sense; maybe a command-line option to sort the partitions by their sizes would be worth adding.