Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor users by time since last request #541

Merged
merged 17 commits into from Dec 9, 2022
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions documentation/changelog.rst
Expand Up @@ -14,6 +14,7 @@ New features
* Ability to provide your own custom scheduling function [see `PR #505 <http://www.github.com/FlexMeasures/flexmeasures/pull/505>`_]
* Visually distinguish forecasts/schedules (dashed lines) from measurements (solid lines), and expand the tooltip with timing info regarding the forecast/schedule horizon or measurement lag [see `PR #503 <http://www.github.com/FlexMeasures/flexmeasures/pull/503>`_]
* The asset page also allows to show sensor data from other assets that belong to the same account [see `PR #500 <http://www.github.com/FlexMeasures/flexmeasures/pull/500>`_]
* The CLI command ``flexmeasures monitor latest-login`` supports to check if users who are expected to contact FlexMeasures regularly (e.g. to send data) fail to do so [see `PR #519 <http://www.github.com/FlexMeasures/flexmeasures/pull/541>`_]
nhoening marked this conversation as resolved.
Show resolved Hide resolved
* The CLI command ``flexmeasures show beliefs`` supports showing beliefs data in a custom resolution and/or timezone, and also saving the shown beliefs data to a CSV file [see `PR #519 <http://www.github.com/FlexMeasures/flexmeasures/pull/519>`_]
* Improved import of time series data from CSV file: 1) drop duplicate records with warning, 2) allow configuring which column contains explicit recording times for each data point (use case: import forecasts) [see `PR #501 <http://www.github.com/FlexMeasures/flexmeasures/pull/501>`_], 3) localize timezone naive data, 4) support reading in datetime and timedelta values, 5) remove rows with NaN values, and 6) filter by values in specific columns [see `PR #521 <http://www.github.com/FlexMeasures/flexmeasures/pull/521>`_]

Expand Down
8 changes: 4 additions & 4 deletions documentation/host/error-monitoring.rst
Expand Up @@ -6,13 +6,13 @@ Error monitoring
When you run a FlexMeasures server, you want to stay on top of things going wrong. We added two ways of doing that:

- You can connect to Sentry, so that all errors will be sent to your Sentry account. Add the token you got from Sentry in the config setting :ref:`sentry_access_token` and you're up and running!
- Another source of crucial errors are things that did not even happen! For instance, a task to import prices from a day-ahead market, which you depend on later for scheduling. We added a new CLI task called ``flexmeasures monitor tasks``, so you can be alerted when tasks have not successfully run at least so-and-so many minutes ago. The alerts will also come in via Sentry, but you can also send them to email addresses with the config setting :ref:`monitoring_mail_recipients`.
- Another source of crucial errors are things that did not even happen! For instance, a task to import prices from a day-ahead market, which you depend on later for scheduling. We added a new CLI task called ``flexmeasures monitor latest-run``, so you can be alerted when tasks have not successfully run at least so-and-so many minutes ago. The alerts will also come in via Sentry, but you can also send them to email addresses with the config setting :ref:`monitoring_mail_recipients`.
nhoening marked this conversation as resolved.
Show resolved Hide resolved

For illustration of the latter monitoring, here is one example of how we monitor tasks on a server ― the below is run in a cron script every hour and checks if every listed task ran 60, 6 or 1440 minutes ago, respectively:
For illustration of the latter monitoring, here is one example of how we monitor the latest run times of tasks on a server ― the below is run in a cron script every hour and checks if every listed task ran 60, 6 or 1440 minutes ago, respectively:

.. code-block:: console

flexmeasures monitor tasks --task get_weather_forecasts 60 --task get_recent_meter_data 6  --task import_epex_prices 1440
flexmeasures monitor latest-run --task get_weather_forecasts 60 --task get_recent_meter_data 6  --task import_epex_prices 1440

The first task (get_weather_forecasts) is actually supported within FlexMeasures, while the other two sit in plugins we wrote.

Expand All @@ -24,7 +24,7 @@ This task status monitoring is enabled by decorating the functions behind these
def my_function():
...

Then, FlexMeasures will log if this task ran, and if it succeeded or failed. The result is in the table ``latest_task_runs``, and that's where the ``flexmeasures monitor tasks`` will look.
Then, FlexMeasures will log if this task ran, and if it succeeded or failed. The result is in the table ``latest_task_runs``, and that's where the ``flexmeasures monitor latest-run`` will look.

.. note:: The decorator should be placed right before the function (after all other decorators).

Expand Down
14 changes: 12 additions & 2 deletions flexmeasures/auth/__init__.py
@@ -1,6 +1,6 @@
from flask import Flask
from flask_security import Security, SQLAlchemySessionUserDatastore
from flask_login import user_logged_in
from flask_login import user_logged_in, current_user
from werkzeug.exceptions import Forbidden, Unauthorized

from flexmeasures.data import db
Expand All @@ -21,7 +21,12 @@ def register_at(app: Flask):
unauthorized_handler,
unauthorized_handler_e,
) # noqa: F401
from flexmeasures.data.models.user import User, Role, remember_login # noqa: F401
from flexmeasures.data.models.user import (
User,
Role,
remember_login,
remember_last_seen,
) # noqa: F401

# Setup Flask-Security-Too for user authentication & authorization
user_datastore = SQLAlchemySessionUserDatastore(db.session, User, Role)
Expand All @@ -39,3 +44,8 @@ def register_at(app: Flask):

# add our custom handler for a user login event
user_logged_in.connect(remember_login)

# also store when the last contact was
@app.before_request
def record_last_seen():
remember_last_seen(current_user)
186 changes: 179 additions & 7 deletions flexmeasures/cli/monitor.py
@@ -1,5 +1,5 @@
from datetime import timedelta
from typing import Optional
from typing import Optional, List

import click
from flask import current_app as app
Expand All @@ -11,6 +11,7 @@
)

from flexmeasures.data.models.task_runs import LatestTaskRun
from flexmeasures.data.models.user import User
from flexmeasures.utils.time_utils import server_now


Expand All @@ -19,7 +20,7 @@ def fm_monitor():
"""FlexMeasures: Monitor tasks."""


def send_monitoring_alert(
def send_task_monitoring_alert(
task_name: str,
msg: str,
latest_run: Optional[LatestTaskRun] = None,
Expand Down Expand Up @@ -54,7 +55,7 @@ def send_monitoring_alert(
app.logger.error(f"{msg} {latest_run_txt} NOTE: {custom_msg}")


@fm_monitor.command("tasks") # TODO: a better name would be "latest-run"
@fm_monitor.command("tasks") # TODO: deprecate, this is the old name
@with_appcontext
@click.option(
"--task",
Expand All @@ -69,18 +70,48 @@ def send_monitoring_alert(
default="",
help="Add this message to the monitoring alert (if one is sent).",
)
def monitor_tasks(task, custom_message):
@click.pass_context
def monitor_task(ctx, task, custom_message):
"""
DEPRECATED, use `latest-run`.
nhoening marked this conversation as resolved.
Show resolved Hide resolved
Check if the given task's last successful execution happened less than the allowed time ago.
If not, alert someone, via email or sentry.
"""
click.echo(
"This function has been renamed (and is now deprecated). Please use flexmeasures monitor latest-run."
)
ctx.forward(monitor_latest_run)


@fm_monitor.command("latest-run")
@with_appcontext
@click.option(
"--task",
type=(str, int),
multiple=True,
required=True,
help="The name of the task and the maximal allowed minutes between successful runs. Use multiple times if needed.",
nhoening marked this conversation as resolved.
Show resolved Hide resolved
)
@click.option(
"--custom-message",
type=str,
default="",
help="Add this message to the monitoring alert (if one is sent).",
)
def monitor_latest_run(task, custom_message):
"""
Check if the given task's last successful execution happened less than the allowed time ago.

Tasks are CLI commands with the @task_with_status_report decorator.
If not, alert someone, via email or sentry.
"""
for t in task:
task_name = t[0]
app.logger.info(f"Checking latest run of task {task_name} ...")
latest_run: LatestTaskRun = LatestTaskRun.query.get(task_name)
if latest_run is None:
msg = f"Task {task_name} has no last run and thus cannot be monitored. Is it configured properly?"
send_monitoring_alert(task_name, msg, custom_msg=custom_message)
send_task_monitoring_alert(task_name, msg, custom_msg=custom_message)
return
now = server_now()
acceptable_interval = timedelta(minutes=t[1])
Expand All @@ -89,18 +120,159 @@ def monitor_tasks(task, custom_message):
# latest run time is okay, let's check the status
if latest_run.status is False:
msg = f"A failure has been reported on task {task_name}."
send_monitoring_alert(
send_task_monitoring_alert(
task_name, msg, latest_run=latest_run, custom_msg=custom_message
)
else:
msg = (
f"Task {task_name}'s latest run time is outside of the acceptable range"
f" ({acceptable_interval})."
)
send_monitoring_alert(
send_task_monitoring_alert(
task_name, msg, latest_run=latest_run, custom_msg=custom_message
)
app.logger.info("Done checking task runs ...")


def send_lastseen_monitoring_alert(
users: List[User],
last_seen_delta: timedelta,
alerted_users: bool,
account_role: Optional[str] = None,
user_role: Optional[str] = None,
):
"""
Tell monitoring recipients and Sentry about user(s) we haven't seen in a while.
"""
user_info_list = [
f"{user.username} (last contact was {user.last_seen_at})" for user in users
]

msg = (
f"The following user(s) have not contacted this FlexMeasures server for more"
f" than {last_seen_delta}, even though we expect they would have:\n"
)
for user_info in user_info_list:
msg += f"\n- {user_info}"

# Sentry
set_sentry_context(
"last_seen_context",
{
"delta": last_seen_delta,
"alerted_users": alerted_users,
"account_role": account_role,
"user_role": user_role,
},
)
capture_message_for_sentry(msg)

# Email
msg += "\n"
if account_role:
msg += f"\nThis alert concerns users whose accounts have the role '{account_role}'."
if user_role:
msg += f"\nThis alert concerns users who have the role '{user_role}'."
if alerted_users:
msg += "\n\nThe user(s) has/have been notified as well."
else:
msg += (
"\n\nThe user(s) has/have not been notified (--alert-users was turned off)."
)
email_recipients = app.config.get("FLEXMEASURES_MONITORING_MAIL_RECIPIENTS", [])
if len(email_recipients) > 0:
email = Message(
subject="Last contact by user(s) too long ago", bcc=email_recipients
)
email.body = msg
app.mail.send(email)

app.logger.error(msg)


@fm_monitor.command("last-seen")
@with_appcontext
@click.option(
"--maximum-minutes-since-last-seen",
type=int,
required=True,
help="Maximal minutes since last request.",
nhoening marked this conversation as resolved.
Show resolved Hide resolved
)
@click.option(
"--alert-users",
type=bool,
default=False,
help="If True, also send an email to the user. Defaults to False, as these users are often bots.",
)
@click.option(
"--account-role",
type=str,
help="The name of an account role to filter for.",
)
@click.option(
"--user-role",
type=str,
help="The name of an account role to filter for.",
nhoening marked this conversation as resolved.
Show resolved Hide resolved
)
def monitor_last_seen(
maximum_minutes_since_last_seen: int,
alert_users: bool = False,
account_role: Optional[str] = None,
user_role: Optional[str] = None,
):
"""
Check if given users last contact (via a request) happened less than the allowed time ago.

Helpful for user accounts that are expected to contact FlexMeasures regularly (in an automated fashion).
If the last contact was too long ago, we send alerts via Sentry, as well as emails to monitoring mail recipients.
The user can be informed, as well.

The set of users can be narrowed down by roles.
"""
last_seen_delta = timedelta(minutes=maximum_minutes_since_last_seen)

# find users we haven't seen in the given time window
users: List[User] = User.query.filter(
User.last_seen_at < server_now() - last_seen_delta
nhoening marked this conversation as resolved.
Show resolved Hide resolved
).all()

# role filters
if account_role is not None:
users = [user for user in users if user.account.has_role(account_role)]
if user_role is not None:
users = [user for user in users if user.has_role(user_role)]

if not users:
print(
f"All good ― no users were found with relevant criteria and last_seen_at longer than {maximum_minutes_since_last_seen} minutes ago."
)
return

# inform users & monitoring recipients
if alert_users:
nhoening marked this conversation as resolved.
Show resolved Hide resolved
for user in users:
msg = (
f"We noticed that user {user.username} has not been in contact with this FlexMeasures server"
f" for at least {maximum_minutes_since_last_seen} minutes (last contact was {user.last_seen_at})."
f"\nBy our own accounting, that gives reason for concern."
"\n\nPlease check if everything is working okay."
nhoening marked this conversation as resolved.
Show resolved Hide resolved
)
email = Message(
subject=f"Last contact by user {user.username} too long ago",
recipients=[user.email],
)
email.body = msg
app.mail.send(email)
else:
click.echo("Users are not being alerted.")

send_lastseen_monitoring_alert(
users,
last_seen_delta,
alerted_users=alert_users,
account_role=account_role,
user_role=user_role,
)


app.cli.add_command(fm_monitor)
@@ -0,0 +1,27 @@
"""new field last_seen_at in user model

Revision ID: 75f53d2dbfae
Revises: 650b085c0ad3
Create Date: 2022-11-27 00:15:26.403169

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = "75f53d2dbfae"
down_revision = "650b085c0ad3"
branch_labels = None
depends_on = None


def upgrade():
op.add_column("fm_user", sa.Column("last_seen_at", sa.DateTime(), nullable=True))
op.execute(
"update fm_user set last_seen_at = last_login_at where last_seen_at is null"
)


def downgrade():
op.drop_column("fm_user", "last_seen_at")
15 changes: 14 additions & 1 deletion flexmeasures/data/models/user.py
Expand Up @@ -17,6 +17,7 @@
)
from flexmeasures.data.models.parsing_utils import parse_source_arg
from flexmeasures.auth.policy import AuthModelMixin
from flexmeasures.utils.time_utils import server_now

if TYPE_CHECKING:
from flexmeasures.data.models.data_sources import DataSource
Expand Down Expand Up @@ -187,7 +188,11 @@ class User(db.Model, UserMixin, AuthModelMixin):
email = Column(String(255), unique=True)
username = Column(String(255), unique=True)
password = Column(String(255))
# Last time the user logged in (provided credentials to get access)
last_login_at = Column(DateTime())
# Last time the user made a request (authorized by session or auth token)
last_seen_at = Column(DateTime())
# How often have they logged in?
login_count = Column(Integer)
active = Column(Boolean())
# Faster token checking
Expand Down Expand Up @@ -258,12 +263,20 @@ def has_role(self, role: Union[str, Role]) -> bool:

def remember_login(the_app, user):
"""We do not use the tracking feature of flask_security, but this basic meta data are quite handy to know"""
user.last_login_at = datetime.utcnow()
user.last_login_at = server_now()
if user.login_count is None:
user.login_count = 0
user.login_count = user.login_count + 1


def remember_last_seen(user):
"""Update the last_seen field"""
if user is not None and user.is_authenticated:
user.last_seen_at = server_now()
db.session.add(user)
db.session.commit()


def is_user(o) -> bool:
"""True if object is or proxies a User, False otherwise.

Expand Down
1 change: 1 addition & 0 deletions flexmeasures/data/schemas/users.py
Expand Up @@ -27,3 +27,4 @@ def validate_timezone(self, timezone):
timezone = ma.auto_field()
flexmeasures_roles = ma.auto_field()
last_login_at = AwareDateTimeField()
last_seen_at = AwareDateTimeField()
7 changes: 3 additions & 4 deletions flexmeasures/ui/crud/users.py
Expand Up @@ -61,10 +61,9 @@ def process_internal_api_response(
role_ids = tuple(user_data.get("flexmeasures_roles", []))
user_data["flexmeasures_roles"] = Role.query.filter(Role.id.in_(role_ids)).all()
user_data.pop("status", None) # might have come from requests.response
if "last_login_at" in user_data and user_data["last_login_at"] is not None:
user_data["last_login_at"] = datetime.fromisoformat(
user_data["last_login_at"]
)
for date_field in ("last_login_at", "last_seen_at"):
if date_field in user_data and user_data[date_field] is not None:
user_data[date_field] = datetime.fromisoformat(user_data[date_field])
if user_id:
user_data["id"] = user_id
if make_obj:
Expand Down