Remove need for couchdb db per user #9038

Open
garethbowen opened this issue Apr 23, 2024 · 4 comments
Labels
Type: Performance Make something faster

Comments

@garethbowen
Member

garethbowen commented Apr 23, 2024

Describe the performance issue

CHT creates one db per user in PouchDB, replicated to CouchDB, to store metadata such as telemetry, feedback, and read status. This is problematic for national-scale projects because it creates on the order of 100k databases. Telemetry and feedback docs are replicated, then periodically copied into the medic-users-meta db and wiped. The read docs are replicated and kept as a record of which docs the user has seen across devices.

Describe the improvement you'd like

Remove the requirement for one db per user to reduce server load.

For telemetry and feedback, this could mean changing how the docs are replicated: instead of native replication, a bespoke API would write directly into the medic-users-meta db, eliminating the need for the periodic cleanup.
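
A rough sketch of what the server side of such an API might do before writing to the shared db, in Python for brevity. Everything here is hypothetical: the function name, the `<type>:<username>:<original id>` id scheme, and the `metadata.user` field are assumptions for illustration, not existing CHT code. It also assumes the docs carry a `type` of `telemetry` or `feedback`.

```python
from copy import deepcopy

# Assumption: only these two doc types would go through the bespoke API.
ALLOWED_TYPES = {"telemetry", "feedback"}


def to_shared_meta_doc(username: str, doc: dict) -> dict:
    """Prepare a device doc for a direct write into the shared meta db."""
    doc_type = doc.get("type")
    if doc_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported doc type: {doc_type!r}")

    shared = deepcopy(doc)  # leave the caller's doc untouched
    # The device-side revision means nothing in the shared db, so drop it.
    shared.pop("_rev", None)
    # Hypothetical id scheme: prefix with type and user so ids from
    # different users cannot collide in the shared db.
    shared["_id"] = f"{doc_type}:{username}:{doc['_id']}"
    # Hypothetical field recording which user the doc came from.
    shared.setdefault("metadata", {})["user"] = username
    return shared
```

Because the doc lands in medic-users-meta in a single write, there would be no per-user db to replicate to and nothing to copy and wipe afterwards.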

For read docs this is more difficult but we should investigate storing these in a shared db. Alternatively we could just store them on the local device which means if you got a new device the data would be lost but I don't think anyone would even notice. We could also consider removing read docs altogether as they only apply to reports and messages which aren't widely displayed but this would need wider consultation.

Describe alternatives you've considered

It might be possible to do something clever with db partitions, especially for the read docs, where there is one db organised into per-user partitions.
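
To make the partition idea concrete: in a partitioned CouchDB database every doc id takes the form `<partition>:<doc id>`, so one shared db could hold each user's read docs under a partition named after that user. Below is a minimal Python sketch of the id mapping. It assumes read doc ids look like `read:report:<uuid>` and that usernames contain no colon and are otherwise valid partition names; neither is verified here, and the helper names are hypothetical.

```python
from urllib.parse import quote


def to_partitioned_id(username: str, read_doc_id: str) -> str:
    """Build '<partition>:<doc id>' with the username as the partition."""
    if not username or ":" in username:
        raise ValueError("username must be non-empty and must not contain ':'")
    return f"{username}:{read_doc_id}"


def from_partitioned_id(partitioned_id: str) -> tuple[str, str]:
    """Split a partitioned id back into (username, original doc id)."""
    # Split on the first colon only: the original id may itself contain colons.
    partition, sep, doc_id = partitioned_id.partition(":")
    if not sep or not partition or not doc_id:
        raise ValueError(f"not a partitioned id: {partitioned_id!r}")
    return partition, doc_id


def partition_all_docs_path(db: str, username: str) -> str:
    """Path of CouchDB's per-partition _all_docs endpoint for one user."""
    return f"/{quote(db, safe='')}/_partition/{quote(username, safe='')}/_all_docs"
```

For example, `to_partitioned_id("alice", "read:report:123")` gives `alice:read:report:123`, and a single partition-scoped request could then list one user's read docs without touching anyone else's.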

@garethbowen garethbowen added the Type: Performance Make something faster label Apr 23, 2024
@garethbowen
Member Author

When asked if this will be an issue, the official update from CouchDB is...

> No concern generally. The main issue is how many of these are being accessed at any one time. Which is presumably way lower than the total. That is to say, make max_dbs_open larger than your max.
> You also want to keep things that multiply per-db and cost resources to a minimum. In particular unless a single DB is bigger than 1-10Gb, set q=1. And also keep only one design doc per db (modulo a second one for transparent index updates).

So this may not be the limit I worried it was.

We need to keep these settings in mind: https://docs.couchdb.org/en/stable/maintenance/performance.html#system-resource-limits

@garethbowen
Member Author

garethbowen commented Apr 24, 2024

We probably need to update this:

max_dbs_open = 5000

And also document setting the ulimit on the host system.
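
As a rough sanity check for the host, a Python sketch like the one below compares the process's open-file limit with the configured `max_dbs_open`. It assumes each open database needs at least one file descriptor and that extra headroom is needed for sockets and view indexes. The headroom of 1000 is an arbitrary placeholder, not a CouchDB recommendation, and the function name is hypothetical.

```python
import resource  # Unix only


def fd_limit_sufficient(max_dbs_open: int, soft_limit: int, headroom: int = 1000) -> bool:
    """Return True if the open-file limit leaves room for max_dbs_open databases."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit configured
    return soft_limit >= max_dbs_open + headroom


if __name__ == "__main__":
    # Soft limit for the current process; the hard limit is not used here.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ok = fd_limit_sufficient(max_dbs_open=5000, soft_limit=soft)
    print(f"open-file soft limit: {soft}, sufficient for max_dbs_open=5000: {ok}")
```

To be meaningful, this has to run as the same user and in the same environment (container, service unit) as CouchDB, because the limit is per process.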

@dianabarsan
Member

I'm doubtful anyone realistically uses read docs. I suppose we could check training material to see if we ever indicate the red bubbles are important.

> Alternatively we could just store them on the local device which means if you got a new device the data would be lost but I don't think anyone would even notice.

I think this is the best choice. I think when the user logs in for the first time, we should assume that all docs are read (there's no point in marking them as unread!), and only start counting new documents.
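
A minimal Python sketch of that rule, to show how little state the device would need: a first-login timestamp plus the set of ids read since then. The class and method names are hypothetical, and it assumes each doc carries a `reported_date` in epoch milliseconds that can be compared with the first-login time.

```python
from dataclasses import dataclass, field


@dataclass
class LocalReadTracker:
    """Device-local read state; nothing here is replicated to the server."""

    first_login_at: int  # epoch millis of the first login on this device
    read_ids: set = field(default_factory=set)

    def mark_read(self, doc_id: str) -> None:
        self.read_ids.add(doc_id)

    def is_unread(self, doc: dict) -> bool:
        # Anything dated at or before the first login is treated as read.
        if doc["reported_date"] <= self.first_login_at:
            return False
        return doc["_id"] not in self.read_ids

    def unread_count(self, docs: list) -> int:
        return sum(1 for doc in docs if self.is_unread(doc))
```

On a new device the tracker starts again with a fresh `first_login_at`, so older docs simply show as read rather than flooding the user with unread markers.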

@michaelkohn
Member

> I'm doubtful anyone realistically uses read docs.

There are a couple of projects that I know of (CHV-NEO and a few I-TECH projects) that probably use the "Read" state of Messages on the Messages Page. Also, NSSD might be using them on the Reports Page (they have a role that is supposed to review submitted reports from the Reports Page and the "Read" state was the main way of knowing which ones they reviewed already).
