Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: 0.19.4-beta.6 performance issues #4681

Closed
5 tasks done
dessalines opened this issue Apr 30, 2024 · 13 comments
Closed
5 tasks done

[Bug]: 0.19.4-beta.6 performance issues #4681

dessalines opened this issue Apr 30, 2024 · 13 comments
Labels
bug Something isn't working
Milestone

Comments

@dessalines
Copy link
Member

dessalines commented Apr 30, 2024

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Do you agree to follow the rules in our Code of Conduct?
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

We recently upgraded lemmy.ml to 0.19.4-beta.6, as well as upgraded to postgres 16, and some things are going much slower. I'm not sure of the culprit, but here are a few slow ones.

comment insert:

duration: 35229.395 ms  execute s86834: INSERT INTO "comment" ("creator_id", "post_id", "content", "removed", "published", "updated", "deleted", "ap_id", "local", "distinguished", "language_id") VALUES ($1, $2, $3, DEFAULT, $4, DEFAULT, $5, $6, $7, DEFAULT, DEFAULT) ON CONFLICT ("ap_id") WHERE (coalesce("comment"."updated", "comment"."published") < $8) DO UPDATE SET "creator_id" = $9, "post_id" = $10, "content" = $11, "published" = $12, "deleted" = $13, "ap_id" = $14, "local" = $15 RETURNING "comment"."id", "comment"."creator_id", "comment"."post_id", "comment"."content", "comment"."removed", "comment"."published", "comment"."updated", "comment"."deleted", "comment"."ap_id", "comment"."local", "comment"."path", "comment"."distinguished", "comment"."language_id"

person insert:

duration: 41885.124 ms  execute s84589: INSERT INTO "person" ("name", "public_key", "instance_id", "display_name", "avatar", "banned", "published", "updated", "actor_id", "bio", "local", "private_key", "last_refreshed_at", "banner", "deleted", "inbox_url", "shared_inbox_url", "matrix_user_id", "bot_account", "ban_expires") VALUES ($1, $2, $3, DEFAULT, DEFAULT, DEFAULT, $4, DEFAULT, $5, DEFAULT, $6, DEFAULT, $7, DEFAULT, $8, $9, $10, DEFAULT, $11, DEFAULT) ON CONFLICT ("actor_id") DO UPDATE SET "name" = $12, "public_key" = $13, "instance_id" = $14, "published" = $15, "actor_id" = $16, "local" = $17, "last_refreshed_at" = $18, "deleted" = $19, "inbox_url" = $20, "shared_inbox_url" = $21, "bot_account" = $22 RETURNING "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."bot_account", "person"."ban_expires", "person"."instance_id"

The standard post query is also slower.

duration: 3515.528 ms  execute s51460:  postgres-1 
-- PostQuery::list
postgres-1  |   SELECT * FROM (SELECT "post"."id", "post"."name,....

Also getting a lot of these sharelocks:

Process 87 waits for ShareLock on transaction 340080; blocked by process 53.

I'm also seeing the lemmy-federate process having a very high CPU usage, >400%

Steps to Reproduce

See above

Technical Details

Same

Version

0.19.4-beta.6

Lemmy Instance URL

main

@dessalines dessalines added the bug Something isn't working label Apr 30, 2024
@dessalines dessalines changed the title [Bug]: 0.19.4-beta.6 DB performance issues [Bug]: 0.19.4-beta.6 performance issues Apr 30, 2024
@dessalines
Copy link
Member Author

K here's an explain analyze. The "Trigger Name": "update_statement" seems to be the culprit.

cc @dullbananas @phiresky

QUERY PLAN                        
----------------------------------------------------------
 [                                                       +
   {                                                     +
     "Plan": {                                           +
       "Node Type": "ModifyTable",                       +
       "Operation": "Insert",                            +
       "Parallel Aware": false,                          +
       "Async Capable": false,                           +
       "Relation Name": "comment",                       +
       "Alias": "comment",                               +
       "Startup Cost": 0.00,                             +
       "Total Cost": 0.01,                               +
       "Plan Rows": 0,                                   +
       "Plan Width": 0,                                  +
       "Actual Startup Time": 1.711,                     +
       "Actual Total Time": 1.711,                       +
       "Actual Rows": 0,                                 +
       "Actual Loops": 1,                                +
       "Conflict Resolution": "UPDATE",                  +
       "Conflict Arbiter Indexes": ["idx_comment_ap_id"],+
       "Tuples Inserted": 0,                             +
       "Conflicting Tuples": 1,                          +
       "Shared Hit Blocks": 71,                          +
       "Shared Read Blocks": 63,                         +
       "Shared Dirtied Blocks": 15,                      +
       "Shared Written Blocks": 0,                       +
       "Local Hit Blocks": 0,                            +
       "Local Read Blocks": 0,                           +
       "Local Dirtied Blocks": 0,                        +
       "Local Written Blocks": 0,                        +
       "Temp Read Blocks": 0,                            +
       "Temp Written Blocks": 0,                         +
       "Plans": [                                        +
         {                                               +
           "Node Type": "Result",                        +
           "Parent Relationship": "Outer",               +
           "Parallel Aware": false,                      +
           "Async Capable": false,                       +
           "Startup Cost": 0.00,                         +
           "Total Cost": 0.01,                           +
           "Plan Rows": 1,                               +
           "Plan Width": 616,                            +
           "Actual Startup Time": 0.152,                 +
           "Actual Total Time": 0.153,                   +
           "Actual Rows": 1,                             +
           "Actual Loops": 1,                            +
           "Shared Hit Blocks": 8,                       +
           "Shared Read Blocks": 5,                      +
           "Shared Dirtied Blocks": 1,                   +
           "Shared Written Blocks": 0,                   +
           "Local Hit Blocks": 0,                        +
           "Local Read Blocks": 0,                       +
           "Local Dirtied Blocks": 0,                    +
           "Local Written Blocks": 0,                    +
           "Temp Read Blocks": 0,                        +
           "Temp Written Blocks": 0                      +
         }                                               +
       ]                                                 +
     },                                                  +
     "Planning": {                                       +
       "Shared Hit Blocks": 126,                         +
       "Shared Read Blocks": 27,                         +
       "Shared Dirtied Blocks": 0,                       +
       "Shared Written Blocks": 0,                       +
       "Local Hit Blocks": 0,                            +
       "Local Read Blocks": 0,                           +
       "Local Dirtied Blocks": 0,                        +
       "Local Written Blocks": 0,                        +
       "Temp Read Blocks": 0,                            +
       "Temp Written Blocks": 0                          +
     },                                                  +
     "Planning Time": 3.052,                             +
     "Triggers": [                                       +
       {                                                 +
         "Trigger Name": "aggregates",                   +
         "Relation": "comment",                          +
         "Time": 0.410,                                  +
         "Calls": 1                                      +
       },                                                +
       {                                                 +
         "Trigger Name": "insert_statement",             +
         "Relation": "comment",                          +
         "Time": 3.627,                                  +
         "Calls": 1                                      +
       },                                                +
       {                                                 +
         "Trigger Name": "update_statement",             +
         "Relation": "comment",                          +
         "Time": 15363.898,                              +
         "Calls": 1                                      +
       }                                                 +
     ],                                                  +
     "Execution Time": 15369.733                         +
   }                                                     +
 ]
(1 row)

@phiresky
Copy link
Collaborator

can you post the query you ran?

@dullbananas
Copy link
Collaborator

dullbananas commented Apr 30, 2024

Here's the slow trigger

CALL r.create_triggers ('comment', $$

@dessalines
Copy link
Member Author

dessalines commented Apr 30, 2024

The query:

explain (FORMAT JSON, analyze, buffers) INSERT INTO comment (creator_id, post_id, content, removed, published, updated, deleted, ap_id, local, distinguished, language_id) VALUES (701945, '15075325', 'does anyone have a brief summary of why it caused concern? ', DEFAULT, '2024-04-30 14:55:55.455344+00', DEFAULT, 'f', 'https://lemmy.world/comment/9760415', 'f', 'f', '37') ON CONFLICT (ap_id) WHERE (coalesce(comment.updated, comment.published) < '2024-04-30 14:55:55.455344+00') DO UPDATE SET creator_id = 701945, post_id = 15075325, content = 'does anyone have a brief summary of why it caused concern? ', published = '2024-04-30 14:55:55.455344+00', deleted = 'f', ap_id = 'https://lemmy.world/comment/9760415', local = 'f', distinguished = 'f', language_id = '37';

I'm downgrading lemmy.ml for now tho, we need to fix this locally and spend some more time on this.

@dullbananas lemmy.ml is back on 0.19.3, so no rush on checking this one out. lmk if you need me to test with lemmy.ml's prod database on my dev machine.

@dullbananas
Copy link
Collaborator

@dessalines Could you check the execution time of each statement executed by the trigger? It might be possible by using auto_explain with log_triggers and log_analyze enabled.

@Nutomic
Copy link
Member

Nutomic commented May 1, 2024

Have you tried to reproduce this with the db_perf crate?

@dessalines
Copy link
Member Author

@dullbananas https://paste.centos.org/view/e746389e

There's a few comment inserts that take >30s, with trigger logging enabled, but it doesn't seem to show what's specifically slow.

Also noteworthy that this affects not just comment inserts, but every insert, person, post, etc.

@dullbananas
Copy link
Collaborator

dullbananas commented May 4, 2024

       {                                                 +
         "Trigger Name": "insert_statement",             +
         "Relation": "comment",                          +
         "Time": 3.627,                                  +
         "Calls": 1                                      +
       },                                                +
       {                                                 +
         "Trigger Name": "update_statement",             +
         "Relation": "comment",                          +
         "Time": 15363.898,                              +
         "Calls": 1                                      +
       }                                                 +

Looks like this is a problem specifically for update or upsert. insert_statement uses the same function definition as update_statement, except select_old_rows is replaced with (SELECT * FROM select_new_rows WHERE FALSE).

Edit: this is probably wrong because the insert_statement trigger would run with 0 rows as input in this case

@dullbananas
Copy link
Collaborator

The duration of the transaction done when inserting a comment is probably a big contributing factor. ShareLock is used to wait for another transaction to finish.

@dullbananas
Copy link
Collaborator

I think this is the main problem: in the new triggers, the update to site_aggregates is not the last statement, but it probably was previously the last trigger to run because "site" is far down in alphabetical order. This would extend the duration from the start of the site_aggregates update to the end of the whole transaction, which is the period in which concurrent writers are blocked. Avoiding this problem is far more crucial for site_aggregates than for other tables because all local insertions update the same row in site_aggregates.

@dessalines dessalines added this to the 0.19.4 milestone May 7, 2024
@dessalines
Copy link
Member Author

So just moving that statement to the bottom of the trigger would fix it?

We already have some site_aggregates and community_aggregates updates in scheduled tasks run every hour, so to me a lot of these non-crucial "site stat" type ones it makes sense to move out of triggers.

@dullbananas
Copy link
Collaborator

So just moving that statement to the bottom of the trigger would fix it?

Yes, and that's one of the things done in #4696

@dessalines
Copy link
Member Author

Closing this, as #4696 is merged, and it fixed the issues 🤞 . Can re-open if more pop up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants