Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplementation of previous repair, but without significant performance hit #130

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

HEM42
Copy link

@HEM42 HEM42 commented May 7, 2024

Summary

Reimplementation of previous repair, but without significant performance hit on large amount of jobs

Motivation

Restore repair functionality removed with the release of v10.27, but where this time it works on large scale installations, see referenced issue for numbers

References

#129

kraih
kraih previously requested changes May 7, 2024
SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished'
)
UNION
SELECT id FROM minion_jobs WHERE state = 'inactive' AND expires <= NOW()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're deleting expired jobs twice now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an oversight, fixed in recent commit (including failed perltidy check)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure combining the two queries is more efficient here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging alone two queries would seem much better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure combining the two queries is more efficient here?

I would think its always more efficient to combine it into a single query. But I would have to have a chat with my dba.

For debugging alone two queries would seem much better.

I agree here it would be better for debugging

The original cleanup of expired jobs, is fast by it self

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also concerned about the time locks are being kept on the table. Even if the large transaction is theoretically a tiny bit more efficient, multiple smaller deletes would be preferable if they allow for faster updates from other transactions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is reflected in the push from earlier today

@mergify mergify bot dismissed kraih’s stale review May 7, 2024 20:11

Pull request has been modified.

@kraih
Copy link
Member

kraih commented May 7, 2024

Please squash commits.

@HEM42 HEM42 force-pushed the restore_previous_repair_functionality branch from 27aefba to 0ab04c2 Compare May 8, 2024 06:20
@jberger
Copy link
Member

jberger commented May 8, 2024

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies
to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A
that's my reading of that query anyway, can you confirm?

@jberger
Copy link
Member

jberger commented May 8, 2024

Oh, it seems I misremembered the previous functionality. I had thought it would preserve the whole chain but the query does not appear to be so cb9182a

@christopherraa
Copy link
Member

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

I had the exact same question, so made a little example for my own sake, just to map it out. Job 1 creates 2, 3, 4 and sets itself as parent for those. It also creates job 5 which is to be run when 2, 3 and 4 completes. Something like this:

   +---> 2 ---+
1 -|---> 3 ---|-> 5
   +---> 4 ---+

After running this is the state:

1:
    parents = []
    state   = finished
2:
    parents = [1]
    state   = finished
3:
    parents = [1]
    state   = failed
4:
    parents = [1]
    state   = finished
5:
    parents = [2,3,4]
    state   = inactive

Then how I read the query you'd for

SELECT id FROM minion_jobs WHERE state = 'finished' AND finished <= NOW() - INTERVAL '1 second' * ?

get 1, 2, 4 since those are the finished jobs, and for

EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished'

you'd get 1 from 3.parents and 2, 3, 4 from 5.parents. This should result in zero rows being deleted. At least how I understand it. Super if anyone else could sanity check this in case I have a bit of the dumb today.

What I have not verified is what happens when the chain is "deeper".

Copy link
Member

@christopherraa christopherraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this looks good and quite a nice feature to have. I'll leave it to someone else to make a decision to merge though.
👍

@HEM42
Copy link
Author

HEM42 commented May 8, 2024

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

This is correct, A would be removed in this case. But this is how the previous query worked.

Main focus was to restore previous functionality.

@HEM42
Copy link
Author

HEM42 commented May 8, 2024

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

I had the exact same question, so made a little example for my own sake, just to map it out. Job 1 creates 2, 3, 4 and sets itself as parent for those. It also creates job 5 which is to be run when 2, 3 and 4 completes. Something like this:

   +---> 2 ---+
1 -|---> 3 ---|-> 5
   +---> 4 ---+

You would actually need to have something like

   +---> 2 ---+
1 -|---> 3 ---|-> 5
   |---> 4 ---|
   +----------+

In order to preserve the flow, atleast until 5 has finished

@christopherraa
Copy link
Member

@HEM42 Ah! Good thing you clarified then, as I had misunderstood / misread the logic. Thank you. I think I'll have to consider myself neutral then, at least until I've pondered on the whole "delete parts of the tree" thing here. If 1 is deleted I think at least this should be documented so that it is clear to the users how this deletion works and how relationships needs to be set up to avoid partial deletes. It might be documented somewhere already if this is the previous functionality, but I cannot remember seeing it.

@christopherraa
Copy link
Member

@HEM42 Another follow up on this, as I couldn't get your explaination to make sense after reading the SQL again. So I basically made this minimal example:

create table minion_jobs(id integer, state text, parents integer[]);
insert into minion_jobs(id, state, parents) values 
     (1, 'finished', null),
     (2, 'finished', '{1}'),
     (3, 'failed', '{1}'),
     (4, 'finished', '{1}'),
     (5, 'inactive', '{2,3,4}');

when running a slightly modified query (skipping the timestamp for convenience here) gives this result:

SELECT id FROM minion_jobs WHERE state = 'finished' EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished';
┌────┐
│ id │
├────┤
└────┘
(0 rows)

What am I missing?

@HEM42
Copy link
Author

HEM42 commented May 8, 2024

@HEM42 Ah! Good thing you clarified then, as I had misunderstood / misread the logic. Thank you. I think I'll have to consider myself neutral then, at least until I've pondered on the whole "delete parts of the tree" thing here. If 1 is deleted I think at least this should be documented so that it is clear to the users how this deletion works and how relationships needs to be set up to avoid partial deletes. It might be documented somewhere already if this is the previous functionality, but I cannot remember seeing it.

The docs for remove_after states

Amount of time in seconds after which jobs that have reached the state finished and have no unresolved dependencies will be removed automatically by "repair", defaults to 172800 (2 days). It is not recommended to set this value below 2 days.

Which was the reason for #129 in the first place, and we think we have good solution to bring it back.

@HEM42
Copy link
Author

HEM42 commented May 9, 2024

@HEM42 Another follow up on this, as I couldn't get your explaination to make sense after reading the SQL again. So I basically made this minimal example:

create table minion_jobs(id integer, state text, parents integer[]);
insert into minion_jobs(id, state, parents) values 
     (1, 'finished', null),
     (2, 'finished', '{1}'),
     (3, 'failed', '{1}'),
     (4, 'finished', '{1}'),
     (5, 'inactive', '{2,3,4}');

when running a slightly modified query (skipping the timestamp for convenience here) gives this result:

SELECT id FROM minion_jobs WHERE state = 'finished' EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished';
┌────┐
│ id │
├────┤
└────┘
(0 rows)

What am I missing?

Not missing anything, but if you where to not fail job with id 3. You would then see

create table minion_jobs(id integer, state text, parents integer[]);
insert into minion_jobs(id, state, parents) values
     (1, 'finished', null),
     (2, 'finished', '{1}'),
     (3, 'finished', '{1}'),
     (4, 'finished', '{1}'),
     (5, 'inactive', '{2,3,4}');

SELECT id FROM minion_jobs WHERE state = 'finished' EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished';
 id
----
  1
(1 row)

@HEM42
Copy link
Author

HEM42 commented May 16, 2024

any updates on this ?

is there need for more information in order to take a decision

@kraih kraih requested review from a team, jhthorsen and jberger May 16, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants