Add data integrity check for cron task #3769
f5674b6
to
a6307db
Compare
Overall this looks good, but it would be really good to explore getting the cron job frequency dynamically, so that the checks are less likely to get out of sync if someone adjusts a task's frequency.
Also, is there a better word for a task that was supposed to run but didn't? You used "stopped", but that can also suggest someone stopped it intentionally. I'd prefer maybe "lagging".
@jurgenwerk this was perhaps more involved than I wanted. I had to parse the way the graphile worker encoded
This exercise also made me realise that I was perceiving
    return r;
  }
  isType(item: ParsedCronItem): item is MinuteIntervalLessThanHour {
    if (item.hours.length !== 24 || item.dows.length !== 7 || item.months.length !== 12 || item.dates.length !== 31) {
All the logic in this function is a bit hard for me to follow. I think it went a bit too far in complexity.
Have you considered parsing the cron string, getting the interval for each entry, and simply checking whether it has been executed within the last interval plus some padding? Or am I missing something?
Looks like ChatGPT can help spit out a function for that:
function calculateCronInterval(expression) {
  const fields = expression.split(' ');
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;

  function processField(field, min, max) {
    if (field === '*') {
      return max - min + 1;
    }
    const values = field.split(',').map(value => parseInt(value));
    return Math.min(...values.map(value => max - min + 1));
  }

  const minutesInterval = processField(minute, 0, 59);
  const hoursInterval = processField(hour, 0, 23);
  const daysInterval = processField(dayOfMonth, 1, 31);
  const monthsInterval = processField(month, 1, 12);
  const weekdaysInterval = processField(dayOfWeek, 0, 6);

  // Calculate the overall interval in minutes
  const overallInterval = minutesInterval * hoursInterval * daysInterval * monthsInterval * weekdaysInterval;
  return overallInterval;
}

const cronExpression = "*/15 * * * *"; // Example cron expression
const intervalInMinutes = calculateCronInterval(cronExpression);
console.log(`Interval in minutes: ${intervalInMinutes}`);
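A note for readers who want to try the suggestion: tracing it by hand, `processField` appears to return `max - min + 1` for every field regardless of the field's content, so any five-field expression yields the same product. The sketch below is a TypeScript copy of the function with only type annotations added (no behaviour change is intended), and it is here purely to make that trace checkable.

```typescript
// Typed copy of the suggested function, kept behaviourally identical
// so its output can be inspected.
function calculateCronInterval(expression: string): number {
  const fields = expression.split(' ');
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;

  function processField(field: string, min: number, max: number): number {
    if (field === '*') {
      return max - min + 1;
    }
    // The parsed values are never actually used: every element is mapped
    // to the full range size, so the result is always max - min + 1.
    const values = field.split(',').map((value) => parseInt(value, 10));
    return Math.min(...values.map(() => max - min + 1));
  }

  return (
    processField(minute, 0, 59) *
    processField(hour, 0, 23) *
    processField(dayOfMonth, 1, 31) *
    processField(month, 1, 12) *
    processField(dayOfWeek, 0, 6)
  );
}

// 60 * 24 * 31 * 12 * 7 = 3749760 for both inputs, not 15 and 1440.
console.log(calculateCronInterval('*/15 * * * *')); // 3749760
console.log(calculateCronInterval('0 5 * * *')); // 3749760
```

So the function cannot tell "every 15 minutes" apart from "daily at 5am", which is the distinction the integrity check needs.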
Noted. I over-abstracted at the expense of clarity. I can just parse the crontab string and simplify.
I have simplified everything. The function above is wrong.
I think we probably don't want to model graphile worker's ENTIRE crontab syntax.
The main case we consider is just the difference between */15 * * * * and 0 5 * * *. The former is every 15 minutes. The latter is every day at 5am UTC. For our purposes, we just treat these two cases as every 15 minutes and every 24*60 minutes (1 day) from the last execution.
That means I wrote the code CONSERVATIVELY: if you write something like */15 15 * * *, which means every 15 minutes within the 15th hour of the day, I expect it to simply not parse and to error out. This keeps our code exact but incomplete.
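For readers following along, here is a minimal sketch of what such a conservative parser could look like. It is not the code from this PR, which is not shown in the thread. The function name, the `minuteInterval` field, and the error message are taken from the test in this conversation; the `CronInterval` type, the `taskIdentifier` field, and the exact validation rules are assumptions made for illustration.

```typescript
// Hypothetical result shape; only `minuteInterval` is confirmed by the test.
interface CronInterval {
  taskIdentifier: string;
  minuteInterval: number;
}

function calculateMinuteInterval(crontabLine: string): CronInterval {
  const fail = (): never => {
    throw new Error(`Cannot parse the provided cron expression: ${crontabLine}`);
  };

  // Five cron fields, then the task identifier, then optional options.
  const parts = crontabLine.trim().split(/\s+/);
  if (parts.length < 6) return fail();
  const [minute, hour, dayOfMonth, month, dayOfWeek, taskIdentifier] = parts;

  // Conservative: anything restricting day-of-month, month, or day-of-week is rejected.
  if (dayOfMonth !== '*' || month !== '*' || dayOfWeek !== '*') return fail();

  // Case 1: "*/N * * * *" means every N minutes.
  const everyN = /^\*\/(\d+)$/.exec(minute);
  if (everyN && hour === '*') {
    const n = Number(everyN[1]);
    if (n < 1 || n > 59) return fail();
    return { taskIdentifier, minuteInterval: n };
  }

  // Case 2: "M H * * *" means once a day at H:M, i.e. every 24 * 60 minutes.
  if (/^\d+$/.test(minute) && /^\d+$/.test(hour)) {
    if (Number(minute) > 59 || Number(hour) > 23) return fail();
    return { taskIdentifier, minuteInterval: 24 * 60 };
  }

  // Everything else (e.g. "*/15 15 * * *") is deliberately unsupported.
  return fail();
}
```

Rejecting unknown shapes rather than guessing is the "exact but incomplete" trade-off described above: a new crontab pattern makes the check fail loudly instead of silently computing a wrong interval.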
  it('calculateMinuteInterval', async function () {
    expect(calculateMinuteInterval('0 5 * * * remove-old-sent-notifications ?max=5').minuteInterval).to.equal(
      60 * 24
    );
    expect(calculateMinuteInterval('*/5 * * * * print-queued-jobs').minuteInterval).to.equal(5);
    expect(() => calculateMinuteInterval('0 5 2 * * remove-old-sent-notifications ?max=5')).to.throw(
      'Cannot parse the provided cron expression: 0 5 2 * * remove-old-sent-notifications ?max=5'
    );
    expect(() => calculateMinuteInterval('*/5 */3 * * * print-queued-jobs')).to.throw(
      'Cannot parse the provided cron expression: */5 */3 * * * print-queued-jobs'
    );
  });
});
I wrote a test here.
This adds a data integrity check that looks at when each graphile worker task last fired. If a task runs every 5 minutes, we give it an allowance of 5*3 = 15 minutes. Right now every task uses a 3x multiplier, but this tolerance can be adjusted for a specific task if needed.
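The allowance rule can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `isLagging`, `DEFAULT_TOLERANCE_MULTIPLIER`, and the parameter names are hypothetical, and "lagging" is borrowed from the wording suggested in review.

```typescript
// Hypothetical default; the description says every task currently uses 3x.
const DEFAULT_TOLERANCE_MULTIPLIER = 3;

// Returns true when the task has not fired within interval * multiplier minutes.
function isLagging(
  lastExecution: Date,
  minuteInterval: number,
  now: Date = new Date(),
  multiplier: number = DEFAULT_TOLERANCE_MULTIPLIER
): boolean {
  const allowanceMs = minuteInterval * multiplier * 60 * 1000;
  return now.getTime() - lastExecution.getTime() > allowanceMs;
}

// A task scheduled every 5 minutes gets a 15 minute allowance.
const now = new Date('2023-01-01T12:00:00Z');
const ranFourteenMinutesAgo = new Date('2023-01-01T11:46:00Z');
const ranSixteenMinutesAgo = new Date('2023-01-01T11:44:00Z');

console.log(isLagging(ranFourteenMinutesAgo, 5, now)); // false
console.log(isLagging(ranSixteenMinutesAgo, 5, now)); // true
```

Passing `multiplier` per call is one way to allow a task-specific tolerance; a lookup keyed by task name would work equally well.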
I didn't use Prisma, because the graphile_worker tables live in the graphile_worker namespace, and a Prisma schema file typically allows only one namespace to be specified.