Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGINT / Ctrl-C handling #8155

Closed
halfgaar opened this issue Mar 17, 2024 · 19 comments
Closed

SIGINT / Ctrl-C handling #8155

halfgaar opened this issue Mar 17, 2024 · 19 comments
Assignees
Milestone

Comments

@halfgaar
Copy link

I've had this with Borg 1.2.6 and now with 1.2.7 again. Interrupting it (on the initial, non-finished backup) results in a crash and even a crash in a crash. I did ctrl-c once, not multiple times.

Ubuntu 20.04 with Borg 1.2.6:

Enter passphrase for key ssh://halfgaar@redacted.com:/redacted/halfgaar/borg: 
^CLocal Exception7 MB C 129.69 MB D 6519 N home/halfgaar/.config/chromium/Profile 1/IndexedDB/redacted/1/00/2                                                                                                                                     
Traceback (most recent call last):
  File "borg/archiver.py", line 5324, in main
  File "borg/archiver.py", line 5255, in run
  File "borg/archiver.py", line 183, in wrapper
  File "borg/archiver.py", line 664, in do_create
  File "borg/archiver.py", line 597, in create_inner
  File "borg/archiver.py", line 812, in _rec_walk
  File "borg/archiver.py", line 812, in _rec_walk
  File "borg/archiver.py", line 812, in _rec_walk
  [Previous line repeated 5 more times]
  File "borg/archiver.py", line 775, in _rec_walk
  File "borg/archiver.py", line 677, in _process_any
  File "borg/archive.py", line 1455, in process_file
  File "borg/archive.py", line 1277, in process_file_chunks
  File "borg/archive.py", line 1265, in chunk_processor
  File "borg/cache.py", line 952, in add_chunk
  File "borg/remote.py", line 476, in do_rpc
  File "borg/remote.py", line 711, in call
  File "borg/remote.py", line 821, in call_many
  File "borg/helpers/process.py", line 212, in handler
  File "borg/helpers/process.py", line 222, in __exit__
  File "contextlib.py", line 126, in __exit__
  File "borg/helpers/process.py", line 171, in signal_handler
  File "signal.py", line 56, in signal
  File "borg/helpers/process.py", line 212, in handler
  File "borg/helpers/process.py", line 222, in __exit__
  File "contextlib.py", line 126, in __exit__
ValueError: generator already executing

Platform: Linux melk 5.4.0-166-generic #183-Ubuntu SMP Mon Oct 2 11:28:33 UTC 2023 x86_64
Linux: Unknown Linux  
Borg: 1.2.6  Python: CPython 3.9.18 msgpack: 1.0.5 fuse: llfuse 1.5.0 [pyfuse3,llfuse]
PID: 19623  CWD: /home/halfgaar/bin
sys.argv: ['borg', 'create', '--stats', '--progress', '--exclude-caches', '--exclude-if-present', 'NOTBACKEDUP.TXT', '--keep-exclude-tags', 'halfgaar@redacted.com:/redacted/halfgaar/borg::redacted-{now}', '/home/halfgaar/', '/mnt/redacted']
SSH_ORIGINAL_COMMAND: None

This is Ubuntu 22.04 with Borg 1.2.7:

Creating...
Warning: Attempting to access a previously unknown unencrypted repository!
Do you want to continue? [yN] y
^CLocal Exception C 7.62 GB D 26970 N home/wiebe/Videos/redacted.mkv
Traceback (most recent call last):
  File "signal.py", line 39, in _enum_to_int
TypeError: int() argument must be a string, a bytes-like object or a number, not 'function'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "borg/archiver.py", line 5343, in main
  File "borg/archiver.py", line 5263, in run
  File "borg/archiver.py", line 189, in wrapper
  File "borg/archiver.py", line 672, in do_create
  File "borg/archiver.py", line 605, in create_inner
  File "borg/archiver.py", line 820, in _rec_walk
  File "borg/archiver.py", line 820, in _rec_walk
  File "borg/archiver.py", line 820, in _rec_walk
  File "borg/archiver.py", line 783, in _rec_walk
  File "borg/archiver.py", line 685, in _process_any
  File "borg/archive.py", line 1455, in process_file
  File "borg/archive.py", line 1277, in process_file_chunks
  File "borg/archive.py", line 1265, in chunk_processor
  File "borg/cache.py", line 952, in add_chunk
  File "borg/remote.py", line 476, in do_rpc
  File "borg/remote.py", line 711, in call
  File "borg/remote.py", line 821, in call_many
  File "borg/helpers/process.py", line 212, in handler
  File "borg/helpers/process.py", line 222, in __exit__
  File "contextlib.py", line 126, in __exit__
  File "borg/helpers/process.py", line 171, in signal_handler
  File "signal.py", line 56, in signal
  File "signal.py", line 39, in _enum_to_int
  File "borg/helpers/process.py", line 212, in handler
  File "borg/helpers/process.py", line 222, in __exit__
  File "contextlib.py", line 126, in __exit__
ValueError: generator already executing

Platform: Linux WiebeFlaptop 5.15.0-100-generic #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024 x86_64
Linux: Unknown Linux  
Borg: 1.2.7  Python: CPython 3.9.18 msgpack: 1.0.7 fuse: llfuse 1.5.0 [pyfuse3,llfuse]
PID: 17713  CWD: /home/wiebe/bin
sys.argv: ['borg', 'create', '--stats', '--progress', '--exclude-caches', '--exclude-if-present', 'NOTBACKEDUP.TXT', '--keep-exclude-tags', 'halfgaar@redacted.com:/redacted/borg-laptop::laptop-userfiles-{now}', '/home/wiebe/']
SSH_ORIGINAL_COMMAND: None

And another variant on Ubuntu 22.04 with Borg 1.2.7:

/home/wiebe/.borg-crash.txt.swp: stat: [Errno 2] No such file or directory: '.borg-crash.txt.swp'df                                                                                                                                         
^CLocal ExceptionB C 3.01 GB D 151572 N /redactedfile
Traceback (most recent call last):
  File "borg/remote.py", line 721, in send_buffer
  File "borg/remote.py", line 417, in write
BlockingIOError: [Errno 11] Resource temporarily unavailable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "borg/archiver.py", line 5343, in main
  File "borg/archiver.py", line 5263, in run
  File "borg/archiver.py", line 189, in wrapper
  File "borg/archiver.py", line 672, in do_create
  File "borg/archiver.py", line 605, in create_inner
  File "borg/archiver.py", line 820, in _rec_walk
  File "borg/archiver.py", line 820, in _rec_walk
  File "borg/archiver.py", line 820, in _rec_walk
  [Previous line repeated 7 more times]
  File "borg/archiver.py", line 783, in _rec_walk
  File "borg/archiver.py", line 685, in _process_any
  File "borg/archive.py", line 1455, in process_file
  File "borg/archive.py", line 1277, in process_file_chunks
  File "borg/archive.py", line 1266, in chunk_processor
  File "borg/remote.py", line 984, in async_response
  File "borg/remote.py", line 781, in call_many
  File "borg/remote.py", line 724, in send_buffer
  File "borg/helpers/process.py", line 212, in handler
  File "borg/helpers/process.py", line 222, in __exit__
  File "contextlib.py", line 126, in __exit__
  File "borg/helpers/process.py", line 171, in signal_handler
  File "signal.py", line 56, in signal
  File "borg/helpers/process.py", line 212, in handler
  File "borg/helpers/process.py", line 222, in __exit__
  File "contextlib.py", line 126, in __exit__
ValueError: generator already executing

Systems are plain ext4, backing up over SSH to encrypted and unencrypted repo.

@infectormp
Copy link
Contributor

What result do you expect by using CTRL-C while the process is running?

@halfgaar
Copy link
Author

Ctrl-C is a request to finish nicely. In the Borg 1.2 release notes:

create: First ctrl-c (SIGINT) triggers checkpoint creation and then aborts.

This commit message adds support for it. Also to carefully wrap up the current deletion, if that's what it's doing.

This normally happens well, but there are occasions, mostly seen with new archives, that crash like the above.

For reference, this is the normal interrupt output of a small test (replaced 'clear line' char with 'newline', so it's readable):

0 B O 0 B C 0 B D 0 N src                        
Initializing cache transaction: Reading config
Initializing cache transaction: Reading chunks
Initializing cache transaction: Reading files

715.15 kB O 715.15 kB C 0 B D 0 N src/big        
17.74 MB O 17.74 MB C 0 B D 0 N src/big          
32.51 MB O 32.51 MB C 0 B D 0 N src/big          
46.96 MB O 46.96 MB C 0 B D 0 N src/big          
59.21 MB O 59.21 MB C 0 B D 0 N src/big          
76.95 MB O 76.95 MB C 0 B D 0 N src/big          
97.65 MB O 97.65 MB C 0 B D 0 N src/big          
123.05 MB O 123.05 MB C 0 B D 0 N src/big        
137.59 MB O 137.59 MB C 0 B D 0 N src/big        
147.16 MB O 147.16 MB C 0 B D 0 N src/big        
168.24 MB O 168.24 MB C 0 B D 0 N src/big        
194.63 MB O 194.63 MB C 0 B D 0 N src/big        
Saving files cache
Saving chunks cache
Saving cache config

Initializing cache transaction: Reading config
Initializing cache transaction: Reading chunks
Initializing cache transaction: Reading files

195.97 MB O 195.97 MB C 0 B D 0 N src/big

(ctrl-c)
        
Saving files cache
Saving chunks cache
Saving cache config

Initializing cache transaction: Reading config
Initializing cache transaction: Reading chunks
Initializing cache transaction: Reading files

209.72 MB O 209.72 MB C 0 B D 1 N src/big
                                                 
Got Ctrl-C / SIGINT.

This process can take a few dozen seconds sometimes.

@ThomasWaldmann ThomasWaldmann added this to the 1.2.8 milestone Mar 18, 2024
@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

That TypeError in Python stdlib signal:39 (_enum_to_int) seems to be a cosmetic thing, considering there is an exception handler dealing with that TypeError:

https://github.com/python/cpython/blob/3.9/Lib/signal.py#L39

@ThomasWaldmann
Copy link
Member

Wondering why we see handler twice in the tracebacks - that looks like SIGINT was fired twice and the 2nd firing was while the first one was still being processed (was still inside __exit__).

@halfgaar
Copy link
Author

I can absolutely guarantee that's not the case. Here's my history:

I normally ctrl-c like a maniac. That's because often I abort shell scripts or loops with subprocesses. So, the first time I used Borg I did that too. This triggered a problem like this, and then I started investigating how Borg deals with that. Then I found it was documented behavior that one ctrl-c aborts gracefully, and two is harder. I've been using that method regularly.

But, it seems that under certain conditions, and it looks like it has to do with it being a new archive or no checkpoint being present yet, this crash happens?

@ThomasWaldmann
Copy link
Member

I tried to reproduce with borg 1.2.7 (built from git tag) on macOS. I used a user@localhost:/tmp/repo12 repo, so it used the remote.py code and ssh, but it always behaved as it should (about 10 attempts), no tracebacks.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

@halfgaar Sure, I believe you. Maybe there is a bug somewhere. The generator it complains about is in the remote.py code part, so maybe that only happens with "remote" (ssh) repos.

I am trying to reproduce now on a ubuntu system, let's see if that behaves differently.

I doubt that it has to do with the presence of checkpoints. borg always starts a fresh archive when using borg create. It does not really "continue from where it was", it only feels like that due to the deduplication and the chunks already present in the repo.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

OK, tried on Ubuntu Mantic (23.10) using Python 3.11.6 and borg 1.2.7 (from git tag) with a ssh: repo (with and without prior stuff in it): it also behaved as it should, not tracebacks.

I sshed into the ubuntu machine, so my ctrl-c keypress was transmitted via ssh to the borg client (not on a local text console or a gui terminal).

I see you always used python 3.9.18 on 2 different ubuntu versions - can you retry with some more recent python? Or just use Ubuntu's system python3?

Also, what's the borg version on the repo side? In my case it was both the same 1.2.7 version (although the traceback does not look like the remote borg version should have an influence on that).

@ThomasWaldmann
Copy link
Member

Another try:

On Ubuntu Mantic (23.10) using Python 3.11.6 and borg 1.2.7 - this time installed using pip install borgbackup into a fresh venv. Also with a ssh: repo (with and without prior stuff in it): it also behaved as it should, not tracebacks.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

@halfgaar Do you use a borg fat binary downloaded from github releases page? If so, please give me the link to the binary you use and the sha256sum of the binary on your system.

@ThomasWaldmann
Copy link
Member

https://github.com/borgbackup/borg/releases/download/1.2.7/borg-linux64 I tried to reproduce using that file and interestingly I sometimes got "Keyboard interrupt" (looks like the default python handler) and sometimes "Got Ctrl-C" (borg's checkpoint-saving handler), never a traceback though.

@ThomasWaldmann
Copy link
Member

OK, so I guess this happens when one uses a pyinstaller binary (which creates a bootloader process and a borg process, both being in same process group):

  • Ctrl-C sends SIGINT to both processes
  • the bootloader process "forwards" the SIGINT (again) to the borg process

So the borg process ends up getting SIGINT twice.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

About adding --bootloader-ignore-signals to the pyinstaller commandline:

  • guess that would solve the "duplicate SIGINT" for the borg process
  • but not sure what bad effects that might have, like for other signals?

@halfgaar
Copy link
Author

I the fat binary, sha256 hash d53081df3926ecb631610a99218766505b13ccf9cd5cbeae10da5ada826f5925. Version 1.2.7.

I completely forgot about the fact that my target server is Debian 12, with Python 3.11.2. My source machines are stock Ubuntu, with Python Python 3.8.16 and Python 3.10.12.

Does the 3.9.18 come from the fat binary?

Here's another different result:

$ borg create --stats --progress 'halfgaar@localhost:/tmp/borgtest/myrepo::bla-{now}' src/
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:xTw9vxM1KjAAJ88sOHuoDHKMmJuSddNB9enepjI5mLI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Remote: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Warning: The repository at location ssh://halfgaar@localhost/tmp/borgtest/myrepo was previously located at /tmp/borgtest/myrepo
Do you want to continue? [yN] y
^CRemote: Borg 1.2.7: Got unexpected RPC data format from client.                                                                                                                                                                                                              
Connection closed by remote host

I've been trying to reproduce again, but now I can't. I don't know what I'm doing differently.

But I see you may have a culprit.

@ThomasWaldmann ThomasWaldmann changed the title Error 'generator already executing' on interrupt SIGINT / Ctrl-C handling Mar 28, 2024
@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

Yes, the py39 comes bundled within the fat binary (that's how I noticed you likely use that). In my most recent experiment, I used the same binary as you use on your machine.

Not sure yet how a good solution for this issue would look like. It could be avoided by not using that binary, because the cause is the binary's "bootloader" process.

@ThomasWaldmann
Copy link
Member

Guess a solution would involve not calling self.__exit__ from the handler, because by doing that, we would revert the SIGINT handler to python's default one and thus lose control over it.

If then a duplicate SIGINT comes in a microsecond later, our handler won't be called any more and it'll terminate the process like if we pressed Ctrl-C more than once.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Mar 28, 2024

See #8162 - I tested it with a fresh pyi-made binary and it "debounced" successfully!

@halfgaar
Copy link
Author

Sounds great. I'll be busy/gone for a while the coming days so won't be able to test more, but it looks like it's all done practically.

ThomasWaldmann added a commit that referenced this issue Mar 29, 2024
fix Ctrl-C / SIGINT behaviour for pyinstaller-made binaries, fixes #8155 (1.2-maint)
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.2.8, 1.4.0b2 Mar 29, 2024
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.4.0b2, 2.0.0rc1 Mar 30, 2024
ThomasWaldmann added a commit that referenced this issue Mar 30, 2024
fix Ctrl-C / SIGINT behaviour for pyinstaller-made binaries, fixes #8155
ThomasWaldmann added a commit that referenced this issue Apr 1, 2024
fix Ctrl-C / SIGINT behaviour for pyinstaller-made binaries, fixes #8155 (master)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants