Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploads are unintentionally closed by the __del__ destructor #685

Open
Limess opened this issue Jan 17, 2022 · 1 comment
Open

Uploads are unintentionally closed by the __del__ destructor #685

Limess opened this issue Jan 17, 2022 · 1 comment

Comments

@Limess
Copy link

Limess commented Jan 17, 2022

Problem description

  • Scenario: Writing to S3 using multipart uploads
  • Expected: Uploads are not committed to S3 when the process exits
  • Actual behaviour: Uploads are committed when the process exits

Steps/code to reproduce the problem

It seems that the __del__ finalizer is being called on both:

  • io.BufferedIOBase which MultipartWriter extends from, calling close
  • GzipFile when the process exits, which in turn calls close MultipartWriter due to the close method being monkeypatched
    when these objects are garbage collected, e.g. when the system exits

This does not seem to be intended behaviour when using multipart uploads, it's not desirable to complete multipart uploads in the middle of the upload when the process exits elsewhere, e.g. on EC2 when we get a spot instance termination, and has led to us seeing incomplete data, and corrupted gzip uploads, being written to S3 when other issues cause the system to exit.

For other uploads, this behaviour is OK as it means the files are closed as expected- writes aren't intended to be atomic.

Ideally close should never be called on MultipartWriter, unless a context manager exits without an exception, or if close is explicitly called by the user code.

Versions

>>> import platform, sys, smart_open
m.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)>>> print(platform.platform())
Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid
>>> print("Python", sys.version)
Python 3.7.11 (default, Jan 13 2022, 14:48:06)
[GCC 9.3.0]
>>> print("smart_open", smart_open.__version__)
smart_open 5.2.1
@Limess
Copy link
Author

Limess commented Jan 17, 2022

We were trying to get around this by monkeypatching the file object to remove the __del__ method from both the gzip and smart open s3 file objects in our user code, but didn't have any success without monkeypatching the whole classmethod on gzip.GzipFile which is undesirable as it seems there's special behaviour with python dundermethods, there may be a way around this though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant