Identify any possible performance improvements to be had with the time-based processing #910

Open
3 tasks
ccostino opened this issue Apr 9, 2024 · 1 comment
ccostino commented Apr 9, 2024

Larger time-based reports currently time out while they are being generated, so the download request fails before the report is ready. To help alleviate this, we'd like to see whether any performance improvements can be made.

This work is a shorter-term fix while we continue discussing and planning what a long-term fix and improvement looks like, but any performance gains in the processing that happens now will help that future effort as well.

Implementation Sketch and Acceptance Criteria

  • Review the current time-based report generation code
    • Walk through the flow of events and processing and see what's currently happening to understand the complete picture
    • Note any areas of the code that are retrieving data and/or looping through data - these are likely targets for the biggest performance gains
    • Is there a better way of retrieving data to process it in memory more efficiently?
  • Make adjustments to the processing where possible and test/benchmark for changes
  • Adjust tests as necessary and make sure they still pass
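
One way to approach the review and benchmark steps above is to time each stage of report generation separately and compare their shares of the total. The following is a minimal sketch, not project code: `download_job_data`, `build_rows`, and `generate_report` are hypothetical stand-ins for whatever the real retrieval and processing steps turn out to be, and the `sleep` only simulates a slow fetch.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
stage_totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed wall-clock time of the enclosed block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


def stage_shares(totals):
    """Return each stage's fraction of the total measured time."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {stage: 0.0 for stage in totals}
    return {stage: seconds / grand_total for stage, seconds in totals.items()}


# Hypothetical stand-ins for the real report steps (assumptions, not project code).
def download_job_data(job_id):
    time.sleep(0.02)  # simulates a slow remote fetch
    return [f"row-{job_id}-{i}" for i in range(3)]


def build_rows(raw):
    return [r.upper() for r in raw]


def generate_report(job_ids):
    rows = []
    for job_id in job_ids:
        with timed("download"):
            raw = download_job_data(job_id)
        with timed("process"):
            rows.extend(build_rows(raw))
    return rows


if __name__ == "__main__":
    generate_report(range(5))
    for stage, share in stage_shares(stage_totals).items():
        print(f"{stage}: {share:.0%} of measured time")
```

Wrapping the existing retrieval and looping code in `timed(...)` blocks like this would show which stage dominates before any changes are made, and the same measurement can be repeated afterwards to confirm a gain.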

Security Considerations

  • We're doing this because we've been removing PII from the system, but we still need the site to function properly and perform well for user tasks.

terrazoon commented Apr 25, 2024

I did some profiling locally after sending 50 one-off messages:

The downloads (when the job is not in the cache) take anywhere from 250 to 1400 milliseconds and average about 400 milliseconds, which is 90% of the time the report needs to generate.
So there is no optimization available in the code itself, since nearly all of the time goes to the downloads. I think the options might be:

  • increase the cache time to 7 days (hmmm)
  • increase the cache time to 1 day and make a 1-day report with phone numbers, then remove phone numbers from the 7-day report
  • just remove the phone numbers from the 7-day report
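
To illustrate what "increase the cache time" changes, here is a minimal sketch of a time-to-live cache placed in front of a slow download. It is only an in-memory illustration under stated assumptions: the thread does not say how the real cache is implemented, and `TTLCache`, `get_or_load`, and `slow_download` are hypothetical names, not the project's API.

```python
import time
from typing import Any, Callable, Dict, Tuple

ONE_DAY = 24 * 60 * 60      # seconds
SEVEN_DAYS = 7 * ONE_DAY    # seconds


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after they are loaded."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (time the value was loaded, value)
        self._entries: Dict[Any, Tuple[float, Any]] = {}

    def get_or_load(self, key: Any, loader: Callable[[Any], Any]) -> Any:
        """Return the cached value if it is still fresh, otherwise call loader(key)."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: the slow download is skipped
        value = loader(key)  # cache miss or expired entry: pay the download cost
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    def slow_download(job_id):
        # Hypothetical stand-in for the download measured in the profiling above.
        time.sleep(0.4)
        return f"data-{job_id}"

    cache = TTLCache(ttl_seconds=ONE_DAY)
    cache.get_or_load("job-1", slow_download)  # miss: takes about 0.4 s
    cache.get_or_load("job-1", slow_download)  # hit: returns immediately
```

With a longer time-to-live, more of the jobs in a 7-day report would be cache hits, which is the trade-off the first two options weigh against keeping phone numbers cached for longer.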

Projects
Status: Blocked/Waiting
Development

No branches or pull requests

2 participants