Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Persistent Email Reminders for Producers of Datasets #1248

Open
anushka-singh opened this issue May 3, 2024 · 8 comments
Open

Comments

@anushka-singh
Copy link
Contributor

Implement persistent email notifications to remind producers to approve or extend share requests that they dont respond to in time.

We want to be able to nudge producers of datasets to take actions if they have been pending on their ledger for a while.

Extend this feature's usage for expiration of shares.

@anushka-singh
Copy link
Contributor Author

Please review this design and help answer pending questions at the bottom:

Objective:

The objective of this feature is to implement persistent email notifications to remind dataset producers to approve/reject share requests that they have not responded to in a timely manner. The aim is to streamline the share request workflow and ensure that dataset producers remain engaged and responsive to pending requests.

Overview:

When a share request is initiated and remains pending for an extended period, dataset producers will receive automated email reminders at predefined intervals. These reminders will prompt producers to either approve or extend the share request, thereby preventing delays in accessing datasets.

Technical Implementation:

Backend Logic:
Lambda: Implement background task scheduler logic for periodic checks of pending share requests. This lambda will create a list of all requests that have been pending for the last "x" days.
RDS: Add a new table to store metadata related to share requests and email reminders, including timestamps and delivery status.
Current EMAIL workflow: Integrate current workflow to send automated email reminders to dataset producers for pending share requests.

Database Schema:
RDS TABLES: To include fields for tracking request status and timestamps.

Workflow:

  1. Share request initiated by a user for a dataset.
  2. Dataset producer notified of the pending share request.
  3. If the producer does not respond within a specified timeframe:
    • Automated email reminders are sent at predefined intervals.
  4. Producer receives email reminders prompting action on the pending share request.
  5. Producer approves or extends the share request via email or through the application interface.
  6. Share request status updated accordingly in RDS.

Pending Questions:

  1. Should we only send emails to producers when request has been pending for long or also to requesters?
  2. Should consumers be notified to resend share request if share failed? I am not sure this is necessary, but what do you think?
  3. Should consumers be notified when verify share action puts share into unhealthy state and they have not taken any action?
  4. Should the frequency of emails be configurable? 1 week vs 1 day etc in config.json?
  5. What should default frequency be?
  6. I think we should have a setting for this in config.json. Users can choose if they want persistent emails it to be enabled or not. Not everyone might want nagging emails.
  7. Is creating a new table in RDS called "persistent-email-reminders" where we store email reminders, timestamps and delivery status needed? And, should we save every single message sent out? if we do, we might have to think about a way to delete the entries after some time - like 3 or 6 months etc.

@zsaltys
Copy link
Contributor

zsaltys commented May 10, 2024

Should we only send emails to producers when request has been pending for long or also to requesters?

Only send reminders to producers. Our goal is to make sure they are reminded they need to approve.

Should consumers be notified to resend share request if share failed

Should be a separate task to notify ONCE the requester and producer team that the share failed

Should consumers be notified when verify share action puts share into unhealthy state and they have not taken any action

Separate task but I believe if share health verifier finds that a share becomes unhealthy it should start sending reminders.

Should the frequency of emails be configurable

If easy to do then yes. Default should be daily. We should probably also let users define the time of day when notifications are sent. For us it may be early EU hours but for someoene else in Asia they might want to choose different hours it also depends what Timezone the reminder service runs with. Perhaps the service should also allow to configure the timezone + time of day.

Not everyone might want nagging emails

Agree. We can allow multiple strategies: single_reminder, repeating_reminder

Do we need a new RDS table?

I think we should keep it as simple as possible and for share reminders I don't think we need a new table. We can see which shares are pending approval and can figure out if they need a reminder or not. I don't see a benefit to a new table to track reminders.

Additionally:

We should not design this service just for share reminders. There's many things we should start sending nagging reminders:

  1. failing dataset stack
  2. failing environment stack
  3. share becomes unhealthy and cannot be fixed

My thinking would be that we have some interface of reminder strategies let's call it iReminderStrategy and we have ReminderService which would kinda work like ReminderService.remind(reminders_strategies: List[iReminderStrategy]) and we can have multiple implementations of iReminderStrategy like: FailingStackReminderStrategy, UnhealthyShareReminderStategy, PendingShareReminderStrategy..

For this task I would focus just on building generic support for reminders and implementing PendingShareReminderStrategy

Also we should if possible use some sort of iNotifier interface so that reminders could be sent to multiple places like slack channels, alerting systems, email etc.. For now just having email notifier would be enough.

@TejasRGitHub
Copy link
Contributor

Hi @anushka-singh ,

Is creating a new table in RDS called "persistent-email-reminders" where we store email reminders, timestamps and delivery status needed? And, should we save every single message sent out? if we do, we might have to think about a way to delete the entries after some time - like 3 or 6 months etc.

I don't think we would need another RDS table for storing information about statuses of email notification. I think we can leverage the activity table to store when the email notification were sent. Maybe we can also log if any email notification failed / some exception occured in the activity table.

I think we should have a setting for this in config.json. Users can choose if they want persistent emails it to be enabled or not. Not everyone might want nagging emails.

I agree we should have this config. We can use the current email notification config and add this config

Regarding comment from Zilvinas,

If easy to do then yes. Default should be daily. We should probably also let users define the time of day when notifications are sent. For us it may be early EU hours but for someoene else in Asia they might want to choose different hours it also depends what Timezone the reminder service runs with. Perhaps the service should also allow to configure the timezone + time of day.

I agree on having a daily schedule to sent reminder emails ( for shares requests, stack failures, etc). Making it configurable based on timezone might involve more complex design ( for e.g. like creating an event which is triggered based on datetime which then calls lambda to execute the share request notification email) I think we can iterate on this in stages and keep it simple for now.

Currently, email notifications for share notifications are sent via AWS workers lambda. For this we could use an scheduled ECS task for trigger email notifications for various strategies like Zilvinas mentioned ( " FailingStackReminderStrategy, UnhealthyShareReminderStategy, PendingShareReminderStrategy " ) OR create a recurring event which triggers lambda to send email notification but I am not sure if the 900 sec Runtime will be enough.

One suggestion on the subject of email reminders, we should have something which notifies the user to take actions. Something like : [ACTION REQUIRED] Share Request pending for dataset -

@anushka-singh
Copy link
Contributor Author

I agree with Zi's suggestion for the long term. It will be good to have a plug and play model like: ReminderStrategy and ReminderService which would kinda work like ReminderService.remind(reminders_strategies: List[iReminderStrategy])

I have a question that maybe @noah-paige @dlpzx can best answer: For this particular design strategy, should I create a new module called persistent_notifications? Or do you just see this be a part of the current notification modules?

@noah-paige
Copy link
Contributor

Not to repeat every thing that is said above but I agree with most (if not all) of the points:

  • SharePending reminders for approvers only
  • ShareFailed or UnhealthyShare Notifications can be easy enhancements once we iron out how we want to design reminder notifications (for now focus on reminders of pending shares)
  • Daily Reminders by default
    • If easy can have reminders configurable and on a given time of day / given timezone (maybe easiest to had dataall admins specify a given time in a standard timezone - i.e. UTC)
    • Keep reminders app specific for now --> Can look into reminder notification preferences by team or by share object as an enhancement story
  • Have a config.json setting sure - I would say have single_reminder or repeating_reminder but options for share_notifications separate from email active or not
  • For RDS information I am not sure we need additional tables
    • We do track both created and updated timestamps for records in the share_object table - should we just use this existing info along with Share Object Status to identify share objects that require additional email notifications
    • We can use existing notification table to persist additional notifications (if needed) - unless there is some added benefit with creating a new table altogether in RDS?

For how to design notification system in the long run (not sure if this needs to be prioritized for this particular feature request but potentially) - the below is a WIP similar a bit to how @zsaltys is thinking but just trying to get some ideas out there...

The issues I see with Notifications currently:

  1. They are too intertwined with Share Notifications only at the moment
  2. They are too hard coded to only support default Notification creation + supported email Notification if configured

Ideally we have a Notification system that can create Notifications of any type and delivery them through any channel without having to edit the foundational Notification Module

  • For Creating/Registering Notifications

    • Different Modules can create different notifications, for example
      • Pending Share Notification for Dataset Sharing Module
      • Unhealthy Share Notification for Dataset Sharing Module
      • (Future Idea) New Dataset Metrics for Dataset Module
      • Etc.
    • Maybe each Notification Type should be a standalone class which inherits from a common interface (from foundational notificatuon module) and have similar properties, including:
      • A create/register notification method in RDS
      • A reminder configuration (enabled/disabled)
      • A subject line header and message creation specific to the notification type
    • Rather than using enum DataSharingNotificationType
  • For Delivery Mechanisms on Notifications

    • When a notification needs to be sent out we need a way to fan out the notification delivery across different registered channels (i.e. dataall UI, email, SMS (future idea), Slack (future idea) )
    • Can design similar to how it is done with EnvironmentResourceManger and EnvironmentResource in backend/dataall/core/environment/services/environment_resource_manager.py
      • Have a class NotificationDeliverManager with a _delivery_channels property of a list of DeliveryChannel class objects
      • Have each delivery channel inherit from DeliveryChannel and when sending a new notification - NotificationDeliverManager can iterate through the registered channels and send() the notification if applicable

@dlpzx
Copy link
Contributor

dlpzx commented May 15, 2024

I agree with all the points above. The most important decision points that I would remark:

  • RDS tables: if this metadata is not going to be used in the application or for audit it does not make sense to store all the info about the notification
  • Interface for Notifications and structure in the repo: I would create a generic notifications module with the interface. That's for sure. From here we have 2 options:
    • either we import the notifications module inside the top-modules such as dataset_sharing and implement the notification there
    • or we import notifications from a new module dataset_sharing_notificationsand then import dataset_sharing_notifications in dataset_sharing`

In my opinion it might be an overkill to go for the second option. But I am not strong opinionated

@anushka-singh
Copy link
Contributor Author

Design after meeting on 5/15 with @dlpzx and @noah-paige

  • The overall re-design of the notification task is out of scope for this persistent email reminders task. That will require significant redesign of the system and multiple design reviews.
  • This task will ONLY be for persistent email reminders when Producer has not taken an action on a share.

Wrapping up thoughts and comments here:

  • Send reminders only to dataset producers to ensure they are prompted to take action.
  • Notify both the requester and producer team once if a share fails, rather than sending multiple reminders.
  • There's consensus on making the frequency of reminders configurable, with the default frequency suggested to be daily.
  • Not everyone might want persistent reminders, so having a setting in config.json to enable or disable them is proposed.

Design proposal:

  • We can use existing notification table to persist additional notifications (if needed) - unless there is some added benefit with creating a new table altogether in RDS? - What additional info do we want to store in notifications table?

    • RDS design:
      • Use existing notifications table with no extra columns
      • To generate counter of how many times somebody has been notified, use concat of groupBy on shareUri, EmailRecipient, share status (PENDING). Use this counter to let producer know in email how many times they already have been reminded before.
      • Notifications table will still serve a singular purpose and will be generic. We will not have new columns for email notification vs UI notifications etc.
      • Use 2 preexisting tables for this task: share_object and notifications table
  • For time zone strategy - Not convinced we need it? We anyway have so many notifications email or otherwise when we wake up in the morning. It does not disturb us and we dont go looking for it unless we voluntarily want to.

    • Agreed that this is a nice-to-have and need not be implemented necessarily. We could possibly just give a time for this job to run in UTC in the config.json
  • How will config.json look?

Screenshot 2024-05-15 at 4 58 48 PM
{
  "share_notifications": {
    "reminder": {
      "active": true,
      "parameters": {
        "group_notifications": true,
        "reminder_frequency": "0 9 * * *", // Cron job schedule (runs every day at 9:00 AM)
      }
    }
  }
}

The cron job expression "0 9 * * 1" means the reminder will be triggered at 9:00 AM every Monday (1 represents Monday in the cron syntax). This will ensure that the reminder runs weekly on the specified day and time.

If you want the reminder to be triggered on a different day of the week, you can adjust the value after the day of the week field (1 in this case) accordingly. For example, 0 9 * * 2 for Tuesday, 0 9 * * 3 for Wednesday, and so on, up to 7 for Sunday.

  • Lambda: Implement background task scheduler logic for periodic checks of pending share requests. This lambda will create a list of all requests that have been pending for the last "x" days. We have a couple of strategies here:

Option 1. EventBridge + SQS + Trigger a Lambda (ECS Task)

  • EventBridge: Use EventBridge to schedule periodic events. Configure EventBridge rules to trigger an SQS queue at a specified interval.
  • SQS (Simple Queue Service): Create an SQS queue to store pending share requests.
  • Lambda (ECS Task): Set up a Lambda function triggered by messages in the SQS queue. This Lambda function can perform the task of fetching pending share requests and processing them. If the processing involves long-running tasks, can trigger an ECS task from the Lambda function to handle the processing.

Option 2: Reuse AWS Worker Lambda

  • We already have an AWS Worker Lambda for Email notifications currently. We can extend its functionality to include periodic checks for pending share requests and sending out email + UI notification reminders.

Option 3: Create Brand New Lambda

  • We can develop a dedicated Lambda function specifically for fetching pending share requests and sending notifications. Could use eventbridge to trigger lambda to run at desired interval.

@anmolsgandhi anmolsgandhi added this to To do in v2.6.0 via automation May 16, 2024
@anmolsgandhi anmolsgandhi moved this from To do to In progress in v2.6.0 May 16, 2024
@noah-paige
Copy link
Contributor

above design looks good - for implementation of the backend logic, encourage to re-use the code we have already in place to send email to also send the reminder emails

I think in our talk Option1 and Option2 are the same - adding an EventBridge trigger and re-using the existing SQS Queue and worker lambda to handle email notifications

But the more I think of it and with the max execution time constraints it may be easiest just to to spin up a brand new ECS Task (you can schedule task definitions to run at given times like how we've done with share-verifier or stack-updater ecs tasks)

And the code to send email notifications I think is bundled into both ECS and lambda container images so should be good to use in either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
v2.6.0
In progress
Development

No branches or pull requests

5 participants