Monitor latest task runs per CLI #146
Merged
e33728d Create draft PR for #119 (nhoening)
03fb769 fix type mismatch when queueing forecasting jobs from CLI (nhoening)
ba6301e Merge branch 'main' into issue-119-Start_using_Sentry_in_our_general_… (nhoening)
2394c1d add sentr SDK as dependency and use it if SENTRY_DSN is set (nhoening)
1de6ab2 montitor latest task runs per CLI and send any alerts to Sentry and e… (nhoening)
4037b39 simpler alert titles (for better grouping in Sentry), adding latest r… (nhoening)
c7f5f98 Merge branch 'main' into montitor-tasks-per-CLI (nhoening)
c9b184f make monitoring setting a list and give it a better name (nhoening)
55bfd1f add changelog entry (nhoening)
46977d4 simplify recency check (nhoening)
@@ -0,0 +1,91 @@
from datetime import timedelta
from typing import Optional

import click
from flask import current_app as app
from flask.cli import with_appcontext
from flask_mail import Message
from sentry_sdk import (
    capture_message as capture_message_for_sentry,
    set_context as set_sentry_context,
)

from flexmeasures.data.models.task_runs import LatestTaskRun
from flexmeasures.utils.time_utils import server_now


@click.group("monitor")
def fm_monitor():
    """FlexMeasures: Monitor tasks."""


def send_monitoring_alert(
    task_name: str, msg: str, latest_run: Optional[LatestTaskRun] = None
):
    """
    Send any monitoring message via Sentry and via email. Also log an error.
    """
    latest_run_txt = ""
    if latest_run:
        set_sentry_context(
            "latest_run", {"time": latest_run.datetime, "status": latest_run.status}
        )
        latest_run_txt = (
            f"Last run was at {latest_run.datetime}, status was: {latest_run.status}"
        )

    capture_message_for_sentry(msg)

    # Drop empty entries: "".split(",") returns [""], which is a non-empty list.
    email_recipients = [r for r in app.config.get("MAIL_MONITORING_RECIPIENTS", "").split(",") if r]
    if email_recipients:
        email = Message(subject=f"Problem with task {task_name}", bcc=email_recipients)
        email.body = f"{msg}\n\n{latest_run_txt}\nWe suggest checking the logs."
        app.mail.send(email)

    app.logger.error(f"{msg} {latest_run_txt}")


@fm_monitor.command("tasks")
@with_appcontext
@click.option(
    "--task",
    type=(str, int),
    multiple=True,
    required=True,
    help="The name of the task and the maximal allowed minutes between successful runs. Use multiple times if needed.",
)
def monitor_tasks(task):
    """
    Check if the given task's last successful execution happened less than the allowed time ago.
    If not, alert someone, via email or Sentry.
    """
    for t in task:
        task_name = t[0]
        app.logger.info(f"Checking latest run of task {task_name} ...")
        latest_run: LatestTaskRun = LatestTaskRun.query.get(task_name)
        if latest_run is None:
            msg = f"Task {task_name} has no last run and thus cannot be monitored. Is it configured properly?"
            send_monitoring_alert(task_name, msg)
            continue  # keep checking the remaining tasks instead of returning early
        now = server_now()
        acceptable_interval = timedelta(minutes=t[1])
        if (
            now - acceptable_interval
            <= latest_run.datetime
            <= now + acceptable_interval
        ):
            # last time is okay, let's check the status
            if latest_run.status is False:
                msg = f"A failure has been reported on task {task_name}."
                send_monitoring_alert(task_name, msg, latest_run)
        else:
            msg = (
                f"Task {task_name}'s latest run time is outside of the acceptable range "
                f"({acceptable_interval})."
            )
            send_monitoring_alert(task_name, msg, latest_run)  # also logs the error
    app.logger.info("Done checking task runs ...")


app.cli.add_command(fm_monitor)
What's the `now + acceptable_interval` supposed to do? Doesn't seem documented.
Good point. I got this from Bobby's code. It checks that the latest run time isn't too far from what the monitor considers to be now.
Well, rethinking it, I am actually not sure this makes sense. The run time can in practice lie in the future from the monitor's perspective, but only if the clocks of the monitoring server and the task-executing server are out of sync, and that should become a different kind of warning, I guess. The allowed interval isn't a good measure then, as we don't know by how much we are out of bounds.
Anyway, in our case the same server is executing and monitoring, so I won't add that extra warning. I'll simply remove the future check.
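For illustration, a minimal sketch of what the check could look like with the future bound dropped. This is an assumption about the resulting logic, not the code from the follow-up commit; the helper name `ran_recently_enough` and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ran_recently_enough(
    latest_run_time: datetime, now: datetime, acceptable_interval: timedelta
) -> bool:
    """Return True if the latest run is no older than the acceptable interval.

    Hypothetical helper: only the lower bound is checked. A run time in the
    future (e.g. caused by clock skew) passes here and would need its own,
    separate warning.
    """
    return latest_run_time >= now - acceptable_interval


# Illustrative values only
now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=60)

print(ran_recently_enough(now - timedelta(minutes=30), now, interval))  # True
print(ran_recently_enough(now - timedelta(minutes=90), now, interval))  # False
print(ran_recently_enough(now + timedelta(minutes=5), now, interval))  # True
```

With a single comparison, a run time slightly ahead of the monitor's clock no longer triggers the "outside of the acceptable range" alert, which matches the reasoning that clock skew is a different kind of problem.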