Update Azure Blob Storage API #378
base: master
This also has a side effect of fixing a bug where any backups past the first 14 were inaccessible due to the list page size. The new default …
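Here is a minimal sketch of why a fixed list page size can hide later backups. If a caller reads one page and ignores the continuation marker, nothing beyond that page is visible. The in-memory lister, the page size of 14, and the backup names are hypothetical illustrations, not WAL-E or SDK code. According to the `list_blobs` reference linked in the PR description, the 1.x SDK's generator follows continuation tokens on its own.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results plus the continuation marker (None once the listing is done).
Page = Tuple[List[str], Optional[str]]
Lister = Callable[[Optional[str]], Page]


def make_fake_lister(names: List[str], page_size: int) -> Lister:
    """Hypothetical in-memory stand-in for a paged blob listing call."""
    def list_page(marker: Optional[str] = None) -> Page:
        start = int(marker) if marker else 0
        end = start + page_size
        next_marker = str(end) if end < len(names) else None
        return names[start:end], next_marker
    return list_page


def first_page_only(list_page: Lister) -> List[str]:
    """Buggy pattern: read a single page and drop the continuation marker."""
    names, _ = list_page(None)
    return names


def list_all(list_page: Lister) -> Iterator[str]:
    """Follow continuation markers until the service reports no more pages."""
    marker: Optional[str] = None
    while True:
        names, marker = list_page(marker)
        yield from names
        if marker is None:
            return


backups = ["base_%02d" % i for i in range(20)]  # hypothetical backup names
lister = make_fake_lister(backups, page_size=14)
print(len(first_page_only(lister)))  # 14
print(len(list(list_all(lister))))   # 20
```

The listing only becomes independent of the page size when the caller keeps following the marker until it comes back empty.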
This branch appears to leak file descriptors during a … This is the list of fds with a pool size of 10 after about 20 segments: https://gist.github.com/titanous/3808052e8c7e0cefa4009adebe4d6a4c The number of open file descriptors grew with each new segment that started downloading.
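The sketch below shows one way to confirm this kind of leak and the pattern that prevents it. It does not reproduce WAL-E's download code. `count_open_fds`, `write_segments`, and `leaky_write_segments` are hypothetical helpers, and the fd count assumes a Linux `/proc/self/fd` or a `/dev/fd` directory.

```python
import os
import tempfile


def count_open_fds() -> int:
    """Count this process's open file descriptors (Linux /proc, else /dev/fd)."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


def write_segments(directory, segments):
    """Hypothetical helper: write each segment to its own file and close it."""
    for i, data in enumerate(segments):
        # The with-block closes the descriptor even if write() raises.
        with open(os.path.join(directory, "segment_%04d" % i), "wb") as f:
            f.write(data)


def leaky_write_segments(directory, segments, keep):
    """Hypothetical anti-pattern: handles stay referenced in `keep`, never closed."""
    for i, data in enumerate(segments):
        f = open(os.path.join(directory, "leak_%04d" % i), "wb")
        f.write(data)
        keep.append(f)


with tempfile.TemporaryDirectory() as d:
    baseline = count_open_fds()
    write_segments(d, [b"x" * 1024] * 20)
    print(count_open_fds() - baseline)  # 0: no growth per segment
    kept = []
    leaky_write_segments(d, [b"y"] * 20, kept)
    print(count_open_fds() - baseline)  # 20: one leaked fd per segment
    for handle in kept:
        handle.close()
```

Sampling the count before and after a batch of segments distinguishes a bounded pool, where the count stays flat, from a per-segment leak, where the count grows linearly.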
Does a specific test show those leaks?
It happens consistently when downloading a large base backup. You could make a large dataset with pgtune to test.
I am looking at making further changes to make sure this works reliably.
Using a package built from 5ae3deb, I was able to backup-push a PostgreSQL database server with 20 GB of databases to Azure Blob Storage, and then backup-fetch that base backup to restore it on a new server. I advocate for this pull request to be reviewed for acceptance. Any feedback is appreciated.
@titanous Is there anything else required for this?
Hey! I assume you have tested this in production, so the change is good for Azure and works there? I really like how the newer libraries allow for some code deletion. Two things come to mind right now:
This required moving the monkey patches under if __name__ == '__main__' so that pytest can run.
Branch updated from 4ea9731 to 5e0a415.
Verified that integration tests for GCS work even after the threads are patched.
@fdr Merging this probably means we need a version bump. Any thoughts on this? We should probably release a new version anyway now that the Python 3.6 errors are fixed.
I have used @nkiraly's branch with Python 3.6 on CentOS 7. It is working without any issues (yet!). I still need to test the restore part. Can you please merge this and create a new version?
Let's go ahead and release.
On Wed, Jul 11, 2018 at 4:09 AM vijayrajah wrote:
> I have used @nkiraly with python 3.6 on Centos7. It is working without any issues. (yet!). I still need to test the restore part.
> Can you please merge this and create a new version.
There have been several breaking changes and improvements to the azure-storage APIs across Azure SDK releases 1.0.3, 2.0.0, and 3.0.0.
Notable items affecting WAL-E:
Changes
* Update WAL-E's minimum Azure SDK version to 3.0.0 to pick up fixes for blob listing paging issues and to use SDK-provided get/put segmentation
* Update extras to explicitly specify azure 3.0.0
* Update the azure-storage-blob minimum version to 1.1.0
* Migrate uri_put_file from put_blob / put_block / put_block_list to create_blob_from_bytes, which has built-in segmenting and parallelization
* Migrate uri_get_file from get_blob with chunking to get_blob_to_stream, which has built-in segmenting
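The following is a pure-Python, in-memory sketch of the block segmentation that `create_blob_from_bytes` and `get_blob_to_stream` now handle in place of WAL-E. The payload is cut into fixed-size blocks with equal-length base64 IDs, which corresponds to the put_block step. The blocks are then committed in order, which corresponds to the put_block_list step. Finally, the payload is read back in byte ranges. The helper names and the block size are hypothetical. The real SDK also moves blocks in parallel, controlled by its `max_connections` parameter, and this sketch does not model that.

```python
import base64
import io
from typing import BinaryIO, Dict, List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative block size, not the SDK's setting


def make_block_id(index: int) -> str:
    """Base64-encoded block ID; every ID within one blob must have the same length."""
    return base64.b64encode(("block-%08d" % index).encode("ascii")).decode("ascii")


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[Tuple[str, bytes]]:
    """Split a payload into (block_id, chunk) pairs, one per put_block-style upload."""
    return [
        (make_block_id(i), data[offset:offset + block_size])
        for i, offset in enumerate(range(0, len(data), block_size))
    ]


def commit_block_list(staged: Dict[str, bytes], block_ids: List[str]) -> bytes:
    """Assemble the final blob from staged blocks in list order (put_block_list-style)."""
    return b"".join(staged[block_id] for block_id in block_ids)


def read_in_ranges(blob: bytes, stream: BinaryIO, chunk_size: int) -> int:
    """Copy a blob into a stream one byte range at a time, like a segmented GET."""
    written = 0
    for start in range(0, len(blob), chunk_size):
        written += stream.write(blob[start:start + chunk_size])
    return written


payload = bytes(range(256)) * 100  # 25,600 bytes of sample data
blocks = split_into_blocks(payload, block_size=1000)
staged = dict(blocks)  # stands in for blocks already uploaded to the service
assert commit_block_list(staged, [block_id for block_id, _ in blocks]) == payload

out = io.BytesIO()
assert read_in_ranges(payload, out, chunk_size=4096) == len(payload)
assert out.getvalue() == payload
print(len(blocks))  # 26
```

Because the SDK owns this loop, uri_put_file and uri_get_file no longer need their own block bookkeeping or chunked get_blob code.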
$ tox -e py35
137 passed, 115 skipped in 22.41 seconds
References:
https://github.com/Azure/azure-storage-python/blob/master/BreakingChanges.md#version-0300
https://azure-sdk-for-python.readthedocs.io/en/v0.11.1/ref/azure.storage.blobservice.html
https://docs.microsoft.com/en-us/python/api/azure.storage.blob.blockblobservice.blockblobservice?view=azure-python#azure_storage_blob_blockblobservice_BlockBlobService_create_blob_from_bytes
https://docs.microsoft.com/en-us/python/api/azure.storage.blob.baseblobservice.baseblobservice?view=azure-python#azure_storage_blob_baseblobservice_BaseBlobService_list_blobs
https://github.com/Azure/azure-storage-python/blob/master/samples/blob/block_blob_usage.py