Implement wildduck's storage architecture for efficiency and scalability #291

Open
1 task done
figassis opened this issue Nov 22, 2020 · 5 comments
Labels
new feature New feature.

Comments

@figassis

Use case

What problem are you trying to solve?
Maildir is less space efficient and less scalable than a clustered database as a mail store.

Note alternatives you considered and why they are not useful.
I've tried using Maildir over an S3 backend, but performance can be an issue.

Your idea for a solution

Compress messages, deduplicate attachments, and store both in a clustered database like MongoDB.

How would your solution work in general?
Wildduck stores messages and attachments in MongoDB. It compresses data and deduplicates attachments, which greatly reduces storage requirements and makes deployments easy to scale. I currently use it in production and it works great.

  • I'm willing to help with the implementation
@figassis figassis added the new feature New feature. label Nov 22, 2020
@foxcpp
Owner

foxcpp commented Nov 22, 2020

My current idea for a distributed/scalable deployment is to put go-imap-sql on top of CockroachDB, with message blobs stored in some block storage (e.g. S3).
This is all tracked in #279.

Attachment deduplication may be worth exploring though.

@figassis
Author

I agree. WD probably gains more from attachment deduplication than from its specific storage backend. Deduplication can be done by storing attachment hashes, and it may even improve performance, since an attachment whose hash is already known would not need to be sent to storage again. Deleting a message with attachments would delete the file and its hash only if that message is the last one pointing to it.
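As a rough illustration of the hash-plus-reference-count idea described above, here is a minimal in-memory sketch. The names `DedupStore`, `Put`, `Release`, and `Len` are hypothetical and do not come from Wildduck or maddy; a real implementation would keep the hashes and counts in the database and the bytes in blob storage.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

// ErrNotFound is returned when a key has no stored attachment.
var ErrNotFound = errors.New("attachment not found")

// blobEntry holds one attachment and the number of messages pointing to it.
type blobEntry struct {
	data []byte
	refs int
}

// DedupStore is a hypothetical in-memory attachment store keyed by SHA-256.
type DedupStore struct {
	mu    sync.Mutex
	blobs map[string]*blobEntry
}

// NewDedupStore returns an empty store.
func NewDedupStore() *DedupStore {
	return &DedupStore{blobs: make(map[string]*blobEntry)}
}

// Put records one reference to data. The bytes are stored only when the
// hash has not been seen before; stored reports whether that happened.
func (s *DedupStore) Put(data []byte) (key string, stored bool) {
	sum := sha256.Sum256(data)
	key = hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()
	if e, ok := s.blobs[key]; ok {
		e.refs++ // known attachment: nothing is sent to storage
		return key, false
	}
	// Copy the bytes so later changes to the caller's slice do not affect the store.
	s.blobs[key] = &blobEntry{data: append([]byte(nil), data...), refs: 1}
	return key, true
}

// Release drops one reference. The attachment and its hash are removed
// only when the last message pointing to it is deleted.
func (s *DedupStore) Release(key string) (deleted bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.blobs[key]
	if !ok {
		return false, ErrNotFound
	}
	e.refs--
	if e.refs == 0 {
		delete(s.blobs, key)
		return true, nil
	}
	return false, nil
}

// Len returns the number of distinct attachments currently stored.
func (s *DedupStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.blobs)
}
```

Calling `Put` twice with the same bytes returns the same key and stores the data once; the first `Release` only decrements the count, and the second removes the blob.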

I'm not very familiar with the codebase, but I do have Go experience, so I can help as soon as I find some bandwidth.

@theduke
Contributor

theduke commented Nov 27, 2020

I second the S3 backend.

It would also allow any S3-compatible storage, which can easily be self-hosted with MinIO.

@Avamander
Contributor

Avamander commented Nov 28, 2020

I do not want the maintenance burden of a separate server/machine/etc., whether it is wildduck, maildir, S3 or CockroachDB.

I would appreciate the ability to store my mail in the same database as the metadata (e.g. PostgreSQL), though maybe not in the same table. This would make consistent backups trivial and advanced search, filtering and analysis much easier. The same applies to attachments: it would make things like deduplication trivial.

@foxcpp
Owner

foxcpp commented Nov 28, 2020

Early versions of the imapsql backend stored message contents as a blob in the same table as the metadata. That turned out to be a performance problem. Message contents are now stored in an abstracted "external storage", and the only implementation currently available is a filesystem directory. It is definitely possible to add an implementation that just stores blobs in table rows. This should not cause performance problems if the table is separate from the metadata.
