Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres schema updates #3860

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mzealey
Copy link
Contributor

@mzealey mzealey commented Jul 8, 2022

Please see the individual commit messages. I have not attempted to apply this to the 'new' schema and there are a couple of TODOs where I think something should be NOT NULL but I'm not 100% sure.

Combined with some server-side postgres performance optimizations this schema is supporting a cluster containing millions of simultaneous active users on relatively small hardware.

- Conversion of UNIQUE KEY constraints to PRIMARY KEY - in pg a PRIMARY
  KEY is a UNIQUE KEY where all columns are non-null, so this patch
  makes it more explicit.
- Removal of useless indexes - a btree index of (A,B,C) implies fast
  indexing of (A,B) and (A) but directly specifying these indexes
  causes much IO overhead on record modification
- Allow more than 2b items in spool/pubsub by switching SERIAL to
  BIGSERIAL
- Add some TODOs where it looks like columns should be marked NOT NULL
  but are not actually. As I'm not particularly familiar with the
  ejabberd I don't know if this should change or not.
This reduces 3 IO's per update to 1 IO per update in a typical case.
@p1bot
Copy link
Collaborator

p1bot commented Jul 8, 2022

Hi @mzealey, many thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@p1bot p1bot added the cla-missing Contributor needs to sign Contribution License Agreement label Jul 8, 2022
@licaon-kter
Copy link
Contributor

licaon-kter commented Jul 8, 2022

@mzealey

some server-side postgres performance optimizations

Do tell more. 😉

LE: also, why remove so many indexes?

@p1bot
Copy link
Collaborator

p1bot commented Jul 8, 2022

You did it @mzealey!

Thank you for signing the ProcessOne Contribution License Agreement.

We will have a look at your contribution!

@p1bot p1bot removed the cla-missing Contributor needs to sign Contribution License Agreement label Jul 8, 2022
@mremond mremond self-assigned this Jul 8, 2022
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 33.579% when pulling 02eab53 on binuadmin:mz/pg-schema-updates into 99d9e31 on processone:master.

@mzealey
Copy link
Contributor Author

mzealey commented Jul 19, 2022

@mzealey

some server-side postgres performance optimizations

Do tell more. wink

The key ones are commit_delay=10000 and synchronous_commit=off which accept a little bit of potential data loss for a massive reduction in latency/iops.

LE: also, why remove so many indexes?

Sorry only saw this now - see commit message for 22d49ad about the rationale behind this; let me know if you need more detail?

@nosnilmot
Copy link
Contributor

The most obviously beneficial part of this PR is the elimination of redundant indexes. Those changes are applicable to ALL schemas - old and new, and for all DB types. Commit 06ffe99 in PR #3980 addressed those.

It could be argued either way which of UNIQUE INDEX and PRIMARY KEY constraints is more explicit/logical/understandable, but again any change should be made across all schemas for consistency. I'd argue that there are limited benefits and sticking with the status-quo is preferred here. Any change would also have knock-on effects on the schema upgrade scripts and ongoing maintenance for existing installations.

Changing columns from SERIAL to BIGSERIAL is another reasonable suggestion (and again needs to apply to both old and 'new' schemas). Commit 4f0e426 in PR #3980 addressed this.

I'm not convinced the fillfactor performance changes are useful to include in a general purpose schema - premature optimization can be less than helpful, and I feel these might be workload dependent.

@mzealey
Copy link
Contributor Author

mzealey commented Jan 21, 2023

Looks reasonable thanks. I don't think the fillfactor is particularly bad - Pg sets it (I believe to 90%) on btree indexes anyway, so setting it at 90% on a table which has high volume of UPDATEs will just allow much less disk IO (via HOT mechanism) at a slight cost of disk space. I don't see how it's ever going to be detrimental.

@Neustradamus
Copy link
Contributor

To follow

@mremond mremond added this to the ejabberd 23.xx milestone Jul 20, 2023
@badlop badlop modified the milestones: ejabberd 23.10, ejabberd 23.xx Oct 17, 2023
@mremond mremond assigned jsautret and unassigned mremond Oct 26, 2023
@Neustradamus
Copy link
Contributor

@mzealey: Can you look for pg.new.sql too?

There is a second PR too:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants