
generation creation failed due to S3 upload multipart failed #447

Open
kakarukeys opened this issue Dec 11, 2022 · 6 comments

@kakarukeys commented Dec 11, 2022

When starting litestream, I saw this message in the log:

```
litestream v0.3.8
initialized db: /data/db.sqlite3
replicating to: name="s3" type="s3" bucket="xxx" path="lb-pipeline-prod/db.sqlite3" region="fra1" endpoint="https://fra1.digitaloceanspaces.com" sync-interval=1s
/data/db.sqlite3: init: cannot determine last wal position, clearing generation; primary wal header: EOF
/data/db.sqlite3: sync: new generation "c51b0ab65d5a9c1f", no generation exists

/data/db.sqlite3(s3): monitor error: MultipartUpload: upload multipart failed
        upload id: 2~AVDf9oLvoUjwWcYb5So7CZmoZnpUguF
caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit
```

litestream snapshots / litestream generations do not reveal anything under the new generation. Apparently the new generation creation has failed.

Is there any config I could set to tune the multipart upload?

my config is:

```yaml
access-key-id: xxx
secret-access-key: xxx

dbs:
  - path: /data/db.sqlite3
    replicas:
      - url: s3://xxx.fra1.digitaloceanspaces.com/lb-pipeline-prod/db.sqlite3
        retention: 1h
        retention-check-interval: 20m
```
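
The arithmetic behind the error: the AWS SDK for Go refuses to split an upload into more than MaxUploadParts (10,000) parts, and s3manager's default PartSize is 5 MiB, so the largest object it can upload by default is about 48.8 GiB; a snapshot of a database bigger than that trips TotalPartsExceeded. A minimal sketch of the underlying fix, using aws-sdk-go's s3manager directly (litestream wires this up internally, so this is illustrative only; bucket, key, and endpoint are taken from the config above, and credentials are assumed to come from the environment):

```go
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region:   aws.String("fra1"),
		Endpoint: aws.String("https://fra1.digitaloceanspaces.com"),
	}))

	// A ~350 GB object needs at least ~36 MiB per part to fit in
	// 10,000 parts; 64 MiB leaves comfortable headroom.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 64 * 1024 * 1024 // bytes per part (default is 5 MiB)
		// u.MaxUploadParts defaults to 10,000, which is S3's hard limit,
		// so raising it doesn't help; PartSize is the knob to turn.
	})

	f, err := os.Open("/data/db.sqlite3")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("xxx"),
		Key:    aws.String("lb-pipeline-prod/db.sqlite3"),
		Body:   f,
	}); err != nil {
		log.Fatal(err)
	}
}
```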

@benbjohnson (Owner)

@kakarukeys There's not currently a config option for this. @anacrolix created a PR for this a while back, but the change should be exposed as a configuration option. I'm open to a PR if you want to add the config fields.
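
For illustration, such fields might look like the following in the YAML config; the part-size key here is hypothetical and does not exist in litestream as of this thread:

```yaml
dbs:
  - path: /data/db.sqlite3
    replicas:
      - url: s3://xxx.fra1.digitaloceanspaces.com/lb-pipeline-prod/db.sqlite3
        part-size: 64MB   # hypothetical key: size of each multipart-upload part
```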

@anacrolix (Contributor)

@kakarukeys #284

@kakarukeys (Author)

I'd love to. Let me see if I can follow the code and the previous PR; my Go skills have gotten very rusty.

FYI, another note: the failure above (in the OP) does not crash the container and does not raise any alarm. This, together with the advice here to set PRAGMA wal_autocheckpoint to 0, caused the WAL file to grow huge on my production server.
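
Since the failure is silent, one stopgap is to watch the -wal file size externally and alert when it grows past a threshold. A minimal sketch, assuming the database path from the config above and an arbitrary 1 GiB threshold:

```go
package main

import (
	"log"
	"os"
	"time"
)

const (
	walPath   = "/data/db.sqlite3-wal"
	threshold = 1 << 30 // 1 GiB; tune to whatever is alarming for your workload
)

func main() {
	for range time.Tick(time.Minute) {
		fi, err := os.Stat(walPath)
		if err != nil {
			log.Printf("stat %s: %v", walPath, err)
			continue
		}
		if fi.Size() > threshold {
			// Replace with a real alert (pager, healthcheck ping, etc.).
			log.Printf("ALERT: WAL is %d bytes; replication may have stalled", fi.Size())
		}
	}
}
```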

@hifi (Collaborator) commented Dec 20, 2022

@kakarukeys We have a downstream patch that prevents the WAL from growing in some cases: beeper@cb44be6

Does that work for you? I've only seen this in some rare error conditions, and indeed got WALs that were gigabytes in size. We haven't upstreamed it yet, as we're running on a patched 0.3.9 that conflicts with the current git head.

@kakarukeys (Author)

It might work, but I won't bet on it, because I am operating SQLite at crazy scale: a 350 GB+ file with several heavy writers and frequent readers. Even after turning off litestream and re-enabling the default checkpointing, I sometimes see a 200 GB WAL file.

I read somewhere that if there is never a moment when the db is not locked for reads or writes, SQLite gets no chance to run a checkpoint. I'm placing my hope on the coming wal2 changes in SQLite (though I think they might break litestream).
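
That matches how WAL checkpointing works: a checkpoint can only copy frames up to the oldest active reader's snapshot, and resetting the WAL requires a moment with no active readers at all. A sketch of forcing a checkpoint and inspecting the result, assuming the mattn/go-sqlite3 driver:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "/data/db.sqlite3")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// TRUNCATE tries to checkpoint every frame and reset the WAL to zero
	// bytes; the returned row reports whether it was blocked (busy = 1)
	// and how many WAL frames were checkpointed out of the total.
	var busy, logFrames, checkpointed int
	if err := db.QueryRow("PRAGMA wal_checkpoint(TRUNCATE)").
		Scan(&busy, &logFrames, &checkpointed); err != nil {
		log.Fatal(err)
	}
	if busy == 1 {
		log.Printf("checkpoint blocked by concurrent use: %d/%d frames done",
			checkpointed, logFrames)
	} else {
		log.Println("WAL fully checkpointed and truncated")
	}
}
```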
