MySQL: Add --fifo-streams support to xtrabackup-push [PoC] #1650

Draft
wants to merge 5 commits into
base: master

ostinru (Contributor) commented Mar 1, 2024

History and rationale

MySQL backup tools share the same limitation: they all produce single-stream backups. That bounds us by the performance of a single pipe, a single compressor, and a single encryptor. When I benchmarked wal-g in 2021, the slowest part was encryption. So we introduced SplitStream backups (#1131), which split a single backup stream into multiple streams and, as a result, let us use multiple CPU cores for encryption and compression. This sped up backups from 70 Mbps to 270 Mbps. At that point we reached the limits of pipe (stdin) performance.

In 2023, Percona introduced [1] a new feature, xtrabackup --fifo-streams. Basically, they pushed stream separation closer to MySQL. In theory it should work even faster than our previous approach.

From Percona's announcement:

The STDOUT method takes 01:25:24 to push 1TB of data using 239 MBps (1.8 Gbps).
The FIFO method, using 8 streams, takes 00:16:01 to push 1TB of data using 1.15 GBps (9.2 Gbps).

[1] https://docs.percona.com/percona-xtrabackup/innovation-release/xbcloud-binary-fifo-datasink.html

What is this diff about?

Here I am parking my efforts to pass --fifo-streams to xtrabackup and to store data in a backward-compatible way.

TODO:

  • benchmarks, benchmarks!
  • cleanup
  • check interference with incremental backups & split stream

ostinru added the mysql (MySQL issue) label Mar 1, 2024
* pass `--fifo-streams` flag during backup-fetch
* fix paths

ostinru (Contributor, Author) commented Mar 30, 2024

Benchmarks

Test stand:
CPU: 16 hyper-threading cores Intel Xeon Processor (Icelake)
RAM: 64Gb
Disk: 1Tb non-replicated drive (with theoretical bandwidth 1Gbps read)
Data: 362 Gb of data on disk (not counting binlogs)

SplitStream in 16 parts

WALG_STREAM_SPLITTER_PARTITIONS: 16
WALG_STREAM_SPLITTER_BLOCK_SIZE: 1048576
time wal-g-mysql --config /etc/wal-g/wal-g.yaml backup-push --turbo
real    22m26.408s
user    256m19.039s
sys     7m55.108s

Restore

time /tmp/wal-g backup-fetch --config /etc/wal-g/wal-g.yaml --turbo stream_20240330T172542Z

real    16m57.928s
user    76m30.811s
sys     18m44.768s

FIFO-streams (x16)

< no SplitStream enabled >

time /tmp/wal-g --config /etc/wal-g/wal-g-fifo.yaml xtrabackup-push --fifo-streams=16 --turbo
real    23m30.876s
user    223m45.787s
sys     7m9.855s

In these tests FIFO streams work faster (1); however, they do not seem to be good at load balancing (2):

type size        last modified                     name
obj  22422546327 2024-03-30 17:14:25.309 +0000 UTC thread_0/stream.br  <--
obj  11211079247 2024-03-30 17:14:25.507 +0000 UTC thread_1/stream.br
obj  11211017569 2024-03-30 17:14:25.483 +0000 UTC thread_2/stream.br
obj  11211017443 2024-03-30 17:14:25.494 +0000 UTC thread_3/stream.br
obj  11211507376 2024-03-30 17:14:25.491 +0000 UTC thread_4/stream.br
obj  11211055249 2024-03-30 17:14:25.514 +0000 UTC thread_5/stream.br
obj      1611955 2024-03-30 17:14:25.246 +0000 UTC thread_6/stream.br
obj  22422383389 2024-03-30 17:14:25.337 +0000 UTC thread_7/stream.br <--
obj         6445 2024-03-30 17:14:25.169 +0000 UTC thread_8/stream.br <-- ??
obj  11211030187 2024-03-30 17:14:25.478 +0000 UTC thread_9/stream.br
obj       536806 2024-03-30 17:14:25.188 +0000 UTC thread_10/stream.br
obj  11211019045 2024-03-30 17:14:25.477 +0000 UTC thread_11/stream.br
obj  11211011785 2024-03-30 17:14:25.504 +0000 UTC thread_12/stream.br
obj  22422423840 2024-03-30 17:14:25.330 +0000 UTC thread_13/stream.br <--
obj  11211857592 2024-03-30 17:14:25.536 +0000 UTC thread_14/stream.br
obj  11211013628 2024-03-30 17:14:25.463 +0000 UTC thread_15/stream.br
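
The imbalance can be quantified directly from the listing. The following is a small sketch that takes the 16 compressed sizes above and computes the ratio of the largest stream to the mean stream size; the function name `imbalance` is made up for illustration. A ratio of 1.0 would mean perfectly even streams.

```go
package main

import "fmt"

// Compressed sizes in bytes, copied from the listing above
// (thread_0 through thread_15, in that order).
var streamSizes = []int64{
	22422546327, 11211079247, 11211017569, 11211017443,
	11211507376, 11211055249, 1611955, 22422383389,
	6445, 11211030187, 536806, 11211019045,
	11211011785, 22422423840, 11211857592, 11211013628,
}

// imbalance returns the total size, the largest stream, and the ratio of the
// largest stream to the mean stream size. Hypothetical helper for illustration.
func imbalance(sizes []int64) (total, largest int64, ratio float64) {
	for _, s := range sizes {
		total += s
		if s > largest {
			largest = s
		}
	}
	mean := float64(total) / float64(len(sizes))
	return total, largest, float64(largest) / mean
}

func main() {
	total, largest, ratio := imbalance(streamSizes)
	fmt.Printf("total=%d bytes, largest=%d bytes, largest/mean=%.2f\n", total, largest, ratio)
}
```

Since the backup cannot finish before its largest stream does, this ratio is a rough indicator of how much wall-clock time is lost to uneven distribution, assuming all streams are processed at a similar rate.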

Restore

time /tmp/wal-g backup-fetch --config /etc/wal-g/wal-g.yaml --turbo stream_20240330T165057Z

real    14m11.283s
user    68m52.421s
sys     20m56.684s

Next steps:

Test 4 FIFO streams, each fed into x4 SplitStream. I would expect something like 18 minutes for this cluster.

ostinru (Contributor, Author) commented Mar 31, 2024

FIFO streams (4) + SplitStream (4)

WALG_STREAM_SPLITTER_PARTITIONS: 4
WALG_STREAM_SPLITTER_BLOCK_SIZE: 1048576
time /tmp/wal-g --config /etc/wal-g/wal-g-fifo.yaml xtrabackup-push --fifo-streams=4 --turbo
real    21m32.630s
user    208m32.490s
sys     6m46.799s

Still uneven distribution: 41Gb, 62Gb, 31Gb, 31Gb

/tmp/wal-g --config /etc/wal-g/wal-g.yaml  st ls basebackups_005/stream_20240331T091703Z/thread_0
type size        last modified                     name
obj  11210971154 2024-03-31 09:38:34.237 +0000 UTC part_0000.br
obj  11214707222 2024-03-31 09:38:34.143 +0000 UTC part_0001.br
obj  11213004393 2024-03-31 09:38:34.059 +0000 UTC part_0002.br
obj  11213324514 2024-03-31 09:38:34.104 +0000 UTC part_0003.br
obj  91          2024-03-31 09:38:34.285 +0000 UTC stream_metadata.json


/etc/wal-g # /tmp/wal-g --config /etc/wal-g/wal-g.yaml  st ls
type size        last modified                     name
obj  16818159792 2024-03-31 09:38:34.155 +0000 UTC part_0000.br
obj  16805603051 2024-03-31 09:38:33.886 +0000 UTC part_0001.br
obj  16817676337 2024-03-31 09:38:34.166 +0000 UTC part_0002.br
obj  16835311017 2024-03-31 09:38:34.056 +0000 UTC part_0003.br
obj  91          2024-03-31 09:38:34.226 +0000 UTC stream_metadata.json
/tmp/wal-g --config /etc/wal-g/wal-g.yaml  st ls basebackups_005/stream_20240331T091703Z/thread_2
type size       last modified                     name
obj  8408471401 2024-03-31 09:38:34.289 +0000 UTC part_0000.br
obj  8410872192 2024-03-31 09:38:33.837 +0000 UTC part_0001.br
obj  8410018940 2024-03-31 09:38:34.282 +0000 UTC part_0002.br
obj  8410925448 2024-03-31 09:38:33.847 +0000 UTC part_0003.br
obj  91         2024-03-31 09:38:34.394 +0000 UTC stream_metadata.json
/tmp/wal-g --config /etc/wal-g/wal-g.yaml  st ls basebackups_005/stream_20240331T091703Z/thread_3
type size       last modified                     name
obj  8408596475 2024-03-31 09:38:34.144 +0000 UTC part_0000.br
obj  8410349817 2024-03-31 09:38:33.745 +0000 UTC part_0001.br
obj  8409959895 2024-03-31 09:38:33.776 +0000 UTC part_0002.br
obj  8410996273 2024-03-31 09:38:33.799 +0000 UTC part_0003.br
obj  91         2024-03-31 09:38:34.186 +0000 UTC stream_metadata.json
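
The per-stream totals quoted above (41Gb, 62Gb, 31Gb, 31Gb) can be reproduced by summing the part sizes of each listing. This sketch assumes the "Gb" figures are GiB (2^30 bytes) rounded down; the `sumGiB` helper and the `listings` variable are illustrative names, and the second listing is labeled by position because its path is cut off in the output above.

```go
package main

import "fmt"

// listing holds the part_000N.br sizes (bytes) of one FIFO stream directory.
type listing struct {
	name  string
	parts []int64
}

// Sizes copied from the four listings above, in the order they appear.
var listings = []listing{
	{"thread_0", []int64{11210971154, 11214707222, 11213004393, 11213324514}},
	{"(second listing)", []int64{16818159792, 16805603051, 16817676337, 16835311017}},
	{"thread_2", []int64{8408471401, 8410872192, 8410018940, 8410925448}},
	{"thread_3", []int64{8408596475, 8410349817, 8409959895, 8410996273}},
}

// sumGiB adds the part sizes and converts the total to GiB (2^30 bytes).
func sumGiB(parts []int64) float64 {
	var total int64
	for _, p := range parts {
		total += p
	}
	return float64(total) / (1 << 30)
}

func main() {
	for _, l := range listings {
		fmt.Printf("%-17s %.2f GiB\n", l.name, sumGiB(l.parts))
	}
}
```

Within each stream the four SplitStream parts are almost identical in size, so the unevenness comes from how data is distributed across the FIFO streams, not from the splitter.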

Restore

time /tmp/wal-g backup-fetch --config /etc/wal-g/wal-g.yaml --turbo stream_20240331T091703Z
ERROR: 2024/03/31 12:48:01.691634 MergeWriter error on sink close: close /tmp/wal-g2209983218/thread_3: file already closed
ERROR: 2024/03/31 12:48:02.477056 MergeWriter error on sink close: close /tmp/wal-g2209983218/thread_2: file already closed
ERROR: 2024/03/31 12:49:41.802327 MergeWriter error on sink close: close /tmp/wal-g2209983218/thread_0: file already closed
ERROR: 2024/03/31 12:53:03.046190 MergeWriter error on sink close: close /tmp/wal-g2209983218/thread_1: file already closed
INFO: 2024/03/31 12:53:03.052706 Restored stream_20240331T091703Z

real    12m39.420s (failed with error!)
user    74m24.371s
sys     18m4.079s

ostinru (Contributor, Author) commented Mar 31, 2024

FIFO streams (2) + SplitStream (8)

WALG_STREAM_SPLITTER_PARTITIONS: 8
WALG_STREAM_SPLITTER_BLOCK_SIZE: 1048576
time /tmp/wal-g --config /etc/wal-g/wal-g-fifo.yaml xtrabackup-push --fifo-streams=2 --turbo
real    20m7.015s
user    222m56.375s
sys     6m55.950s

Still observing imbalance: 73Gb and 93Gb

/tmp/wal-g --config /etc/wal-g/wal-g.yaml  st ls basebackups_005/stream_20240331T115348Z/thread_0
type size       last modified                     name
obj  9811333891 2024-03-31 12:13:52.374 +0000 UTC part_0000.br
obj  9815841093 2024-03-31 12:13:52.053 +0000 UTC part_0001.br
obj  9813269288 2024-03-31 12:13:52.4 +0000 UTC   part_0002.br
obj  9806735307 2024-03-31 12:13:52.252 +0000 UTC part_0003.br
obj  9806387619 2024-03-31 12:13:52.483 +0000 UTC part_0004.br
obj  9815281384 2024-03-31 12:13:52.002 +0000 UTC part_0005.br
obj  9814059100 2024-03-31 12:13:52.481 +0000 UTC part_0006.br
obj  9809999851 2024-03-31 12:13:52.391 +0000 UTC part_0007.br
obj  91         2024-03-31 12:13:52.542 +0000 UTC stream_metadata.json
/tmp/wal-g --config /etc/wal-g/wal-g.yaml  st ls basebackups_005/stream_20240331T115348Z/thread_1
type size        last modified                     name
obj  12616992971 2024-03-31 12:13:52.283 +0000 UTC part_0000.br
obj  12613033022 2024-03-31 12:13:52.224 +0000 UTC part_0001.br
obj  12608291167 2024-03-31 12:13:52.133 +0000 UTC part_0002.br
obj  12607930410 2024-03-31 12:13:52.15 +0000 UTC  part_0003.br
obj  12615290123 2024-03-31 12:13:52.44 +0000 UTC  part_0004.br
obj  12615411921 2024-03-31 12:13:52.269 +0000 UTC part_0005.br
obj  12617655239 2024-03-31 12:13:52.302 +0000 UTC part_0006.br
obj  12621001749 2024-03-31 12:13:52.39 +0000 UTC  part_0007.br
obj  91          2024-03-31 12:13:52.481 +0000 UTC stream_metadata.json

Restore:

time /tmp/wal-g backup-fetch --config /etc/wal-g/wal-g.yaml --turbo stream_20240331T115348Z
real    16m38.257s
user    70m26.904s
sys     18m26.238s

ostinru (Contributor, Author) commented Mar 31, 2024

| approach | backup | backup time | restore time |
|---|---|---|---|
| SplitMerge x16 | stream_20240330T172542Z | 22m26.408s | 16m57.928s |
| FIFO x16 | stream_20240330T165057Z | 23m30.876s | 14m11.283s |
| FIFO x4 + SplitMerge x4 | stream_20240331T091703Z | 21m32.630s | 12min then error ❌ |
| FIFO x2 + SplitMerge x8 | stream_20240331T115348Z | 20m7.015s | 16m38.257s |
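
To compare these timings with the throughput figures quoted from Percona, the wall-clock times can be converted to an effective rate. This is a rough sketch that assumes the whole 362 Gb on-disk data set from the stand description is processed in each run; `throughput` is a hypothetical helper, and the rate is relative to on-disk size, not to the compressed bytes actually uploaded or downloaded.

```go
package main

import (
	"fmt"
	"time"
)

// On-disk data size from the stand description ("362 Gb of data on disk").
// Assumption: every run processes this full amount.
const dataOnDiskGb = 362.0

// throughput returns on-disk Gb per second for a wall-clock time written as
// in the `time` output above, e.g. "22m26.408s".
func throughput(wall string) (float64, error) {
	d, err := time.ParseDuration(wall)
	if err != nil {
		return 0, err
	}
	return dataOnDiskGb / d.Seconds(), nil
}

func main() {
	// Backup times taken from the summary of the completed runs.
	runs := []struct{ approach, wall string }{
		{"SplitMerge x16", "22m26.408s"},
		{"FIFO x16", "23m30.876s"},
		{"FIFO x4 + SplitMerge x4", "21m32.630s"},
		{"FIFO x2 + SplitMerge x8", "20m7.015s"},
	}
	for _, r := range runs {
		rate, err := throughput(r.wall)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-24s %.3f Gb/s\n", r.approach, rate)
	}
}
```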
