backup-fetch has issues with specific parts of backups #1601

Open
caseyandgina opened this issue Nov 22, 2023 · 1 comment

@caseyandgina

Database name

PostgreSQL

Issue description

Describe your problem

When I restore a backup with backup-fetch, a few of the part files consistently fail with extraction errors. Each error is logged immediately after an apparent success message for the same file. For example:

INFO: 2023/11/21 22:58:28.639654 Finished extraction of part_001.tar.br 
ERROR: 2023/11/21 22:58:28.639680 Extraction error in part_001.tar.br: extractOne: Interpret failed: Interpret: copy failed: unexpected EOF

At the end of the run, wal-g v2.0.1 reports (in poorly worded output) that these files failed to download, and they then succeed on retry:

WARNING: 2023/11/21 23:41:17.957282 4 files failed to download: [part_001.tar.br part_1382.tar.br part_1576.tar.br part_1666.tar.br]. Going to sleep and retry downloading them.
INFO: 2023/11/21 23:42:23.623221 Finished extraction of part_1382.tar.br
INFO: 2023/11/21 23:42:26.988360 Finished extraction of part_1666.tar.br
INFO: 2023/11/21 23:42:28.495593 Finished extraction of part_1576.tar.br
INFO: 2023/11/21 23:42:31.879127 Finished extraction of part_001.tar.br
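
To make the pattern above easier to check across a long log, here is a minimal sketch that lists every part with an "Extraction error" line and records whether a later "Finished extraction" line followed it. The regular expressions are modelled only on the log lines quoted in this report, and the function name `summarize_retries` is hypothetical; this is not part of wal-g.

```python
import re

# Patterns modelled on the wal-g log lines quoted in this issue (an assumption;
# other wal-g versions may format these messages differently).
FINISHED = re.compile(r"^INFO: \S+ \S+ Finished extraction of (\S+)")
FAILED = re.compile(r"^ERROR: \S+ \S+ Extraction error in ([^:\s]+):")


def summarize_retries(log_lines):
    """Map each part that logged an extraction error to True if a
    'Finished extraction' line for it appears after its last error."""
    status = {}
    for raw in log_lines:
        line = raw.strip()
        failed = FAILED.match(line)
        if failed:
            # A new error resets the part to "not yet recovered".
            status[failed.group(1)] = False
            continue
        finished = FINISHED.match(line)
        if finished and finished.group(1) in status:
            status[finished.group(1)] = True
    return status


if __name__ == "__main__":
    sample = [
        "INFO: 2023/11/21 22:58:28.639654 Finished extraction of part_001.tar.br",
        "ERROR: 2023/11/21 22:58:28.639680 Extraction error in part_001.tar.br: "
        "extractOne: Interpret failed: Interpret: copy failed: unexpected EOF",
        "INFO: 2023/11/21 23:42:31.879127 Finished extraction of part_001.tar.br",
    ]
    print(summarize_retries(sample))  # {'part_001.tar.br': True}
```

A "Finished extraction" line that comes before the error for the same part is deliberately ignored, since in this log it does not indicate success.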

I'm not sure what is special about these files: they are not especially large, and I can fetch them from S3 and untar them manually without problems. Our S3 is Ceph storage in the same location as the database nodes. Uncompressed sizes:

-rw-r----- 1 root root 1.6G Nov 19 08:00 part_001.tar
-rw-r----- 1 root root 1.1G Nov 19 11:05 part_1382.tar
-rw-r----- 1 root root 1.7G Nov 19 11:35 part_1576.tar
-rw-r----- 1 root root 1.1G Nov 19 11:45 part_1666.tar
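
For anyone who wants to repeat the manual check, the following is a rough sketch that reads every member of an already-decompressed `.tar` file to the end and reports the first read error. It uses only Python's standard `tarfile` module; the function name `check_tar` is hypothetical. It tests only the stored archive, not wal-g's streaming download and decompression path, so a clean result here does not rule out a truncated stream during backup-fetch.

```python
import tarfile


def check_tar(path):
    """Read every regular file in an uncompressed tar archive to the end.

    Returns None if all members could be read, otherwise a short
    description of the first error encountered.
    """
    try:
        with tarfile.open(path, mode="r:") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read in 1 MiB chunks so large members are not held in memory.
                while stream.read(1 << 20):
                    pass
    except (tarfile.TarError, EOFError, OSError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    import sys

    for tar_path in sys.argv[1:]:
        problem = check_tar(tar_path)
        print(tar_path, "OK" if problem is None else problem)
```

It could be run against the four files listed above, for example as `python check_tar.py part_001.tar part_1382.tar` (the script name is an assumption).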

Please provide steps to reproduce

This is repeatable for me on every one of our backups, but I have no idea what the cause might be.

Please add config and wal-g stdout/stderr logs for debug purpose

INFO: 2023/11/21 22:56:13.694128 Selecting the backup with name base_000000180000A52700000075...                                           
INFO: 2023/11/21 22:56:29.688481 Finished extraction of part_005.tar.br                                                                    
INFO: 2023/11/21 22:56:29.997573 Finished extraction of part_008.tar.br                                                                    
INFO: 2023/11/21 22:56:30.079675 Finished extraction of part_007.tar.br                                                                    
...
INFO: 2023/11/21 22:58:21.215664 Finished extraction of part_066.tar.br                                                                    
INFO: 2023/11/21 22:58:28.639654 Finished extraction of part_001.tar.br                                               
ERROR: 2023/11/21 22:58:28.639680 Extraction error in part_001.tar.br: extractOne: Interpret failed: Interpret: copy failed: unexpected EOF                                                                                                                                           
INFO: 2023/11/21 22:58:29.393794 Finished extraction of part_070.tar.br                                                                    
INFO: 2023/11/21 22:58:29.878953 Finished extraction of part_069.tar.br                                                                    
...
INFO: 2023/11/21 23:10:51.928280 Finished extraction of part_1405.tar.br        
INFO: 2023/11/21 23:10:53.967860 Finished extraction of part_1382.tar.br                                                                                                                                                                                                              
ERROR: 2023/11/21 23:10:53.967885 Extraction error in part_1382.tar.br: extractOne: Interpret failed: Interpret: copy failed: unexpected EOF                                                                                                                                          
INFO: 2023/11/21 23:10:54.228336 Finished extraction of part_1404.tar.br                                                                                                                                                                                                              
INFO: 2023/11/21 23:10:55.245058 Finished extraction of part_1406.tar.br                                                                                                                                                                                                              
...
INFO: 2023/11/21 23:19:07.008045 Finished extraction of part_1631.tar.br                                                                                                                                                                                                              
INFO: 2023/11/21 23:19:08.410853 Finished extraction of part_1576.tar.br                                                                                                                                                                                                              
ERROR: 2023/11/21 23:19:08.410886 Extraction error in part_1576.tar.br: extractOne: Interpret failed: Interpret: copy failed: unexpected EOF                                                                                                                                          
INFO: 2023/11/21 23:19:12.949301 Finished extraction of part_1635.tar.br                                                                   
INFO: 2023/11/21 23:19:13.695947 Finished extraction of part_1634.tar.br                                                                   
...
INFO: 2023/11/21 23:22:44.857690 Finished extraction of part_248.tar.br    
INFO: 2023/11/21 23:22:45.558064 Finished extraction of part_1666.tar.br      
ERROR: 2023/11/21 23:22:45.558086 Extraction error in part_1666.tar.br: extractOne: Interpret failed: Interpret: copy failed: unexpected EOF                                                                                                                                          
INFO: 2023/11/21 23:22:55.733022 Finished extraction of part_249.tar.br                                                                    
...
INFO: 2023/11/21 23:41:17.956567 Finished extraction of part_999.tar.br
WARNING: 2023/11/21 23:41:17.957282 4 files failed to download: [part_001.tar.br part_1382.tar.br part_1576.tar.br part_1666.tar.br]. Going to sleep and retry downloading them.
INFO: 2023/11/21 23:42:23.623221 Finished extraction of part_1382.tar.br
INFO: 2023/11/21 23:42:26.988360 Finished extraction of part_1666.tar.br
INFO: 2023/11/21 23:42:28.495593 Finished extraction of part_1576.tar.br
INFO: 2023/11/21 23:42:31.879127 Finished extraction of part_001.tar.br
INFO: 2023/11/21 23:42:31.906664 Finished extraction of pg_control.tar.br
INFO: 2023/11/21 23:42:31.906685                                                                                                           
Backup extraction complete.
@caseyandgina

It's interesting that two of the parts (001 and 1666) appear at very different places in the output than their numbers would suggest.

It's also curious that the files seem to be processed in mostly alphabetical order, even though the part numbers are zero-padded to only 3 digits. I don't think it matters at all, but it is a bit surprising.
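
As a small illustration of the ordering remark, plain string sorting of names padded to 3 digits places the four-digit parts between the `0xx` and `2xx` names. The file names below are taken from the log above; whether wal-g actually sorts the part names this way is an assumption, not something confirmed here.

```python
import re

names = [
    "part_999.tar.br",
    "part_248.tar.br",
    "part_1666.tar.br",
    "part_070.tar.br",
    "part_1382.tar.br",
    "part_001.tar.br",
]

# Plain string comparison: "1382" sorts before "248" because "1" < "2".
lexicographic = sorted(names)
print(lexicographic)
# ['part_001.tar.br', 'part_070.tar.br', 'part_1382.tar.br',
#  'part_1666.tar.br', 'part_248.tar.br', 'part_999.tar.br']


def part_number(name):
    """Extract the integer part number from a name like 'part_1382.tar.br'."""
    return int(re.search(r"part_(\d+)", name).group(1))


# Numeric comparison gives the order the part numbers would suggest.
numeric = sorted(names, key=part_number)
print(numeric)
# ['part_001.tar.br', 'part_070.tar.br', 'part_248.tar.br',
#  'part_999.tar.br', 'part_1382.tar.br', 'part_1666.tar.br']
```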
