[Experiment] INBOX archival task improvement #859

Open
quantranhong1999 opened this issue Oct 24, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@quantranhong1999
Member

Why

Today's behavior: the task iterates over all of a user's INBOX messages using Cassandra, which puts a lot of pressure on Cassandra.

Querying OpenSearch by date could avoid iterating over all the messages, which could bring faster response times (in most cases?).

Following Benoit's concern: OpenSearch may not be good for searching a big INBOX, since such a search goes through a lot of shards, and OpenSearch is not a source of truth.

My proposal:

  • Solution 1: Mix of OpenSearch and Cassandra
    Rely on OpenSearch for small and average-sized INBOXes, e.g. < 100k messages; for big INBOXes, use Cassandra so as not to blow up OpenSearch.
    If OpenSearch is down or the query times out, fall back to Cassandra, so the task stays resilient to OpenSearch failures.

  • Solution 2: OpenSearch first
    Rely on OpenSearch first for all INBOXes, and only fall back to Cassandra upon OpenSearch failure.
    TODO: benchmark in unit tests/preprod to see whether OpenSearch can handle all the pressure.
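
To make the two options concrete, here is a minimal sketch of the selection and fallback logic, not an actual implementation. All names (`InboxArchivalLookup`, `OldMessageFinder`, `findOlderThan`) are hypothetical and do not come from the codebase; the two finders stand in for the real OpenSearch and Cassandra lookups, and I assume the INBOX size is known before choosing a backend.

```java
import java.time.Instant;
import java.util.List;

public class InboxArchivalLookup {
    /** Hypothetical abstraction: returns the ids of INBOX messages older than the cutoff. */
    public interface OldMessageFinder {
        List<String> findOlderThan(String user, Instant cutoff);
    }

    private final OldMessageFinder openSearch;
    private final OldMessageFinder cassandra;
    // INBOXes with at least this many messages skip OpenSearch entirely.
    // Solution 1: e.g. 100_000. Solution 2: Long.MAX_VALUE (OpenSearch first for everyone).
    private final long bigInboxThreshold;

    public InboxArchivalLookup(OldMessageFinder openSearch, OldMessageFinder cassandra, long bigInboxThreshold) {
        this.openSearch = openSearch;
        this.cassandra = cassandra;
        this.bigInboxThreshold = bigInboxThreshold;
    }

    public List<String> findOlderThan(String user, long inboxSize, Instant cutoff) {
        if (inboxSize >= bigInboxThreshold) {
            // Big INBOX: go straight to Cassandra to avoid a heavy multi-shard search.
            return cassandra.findOlderThan(user, cutoff);
        }
        try {
            return openSearch.findOlderThan(user, cutoff);
        } catch (RuntimeException e) {
            // Assumption: an OpenSearch outage or query timeout surfaces as a RuntimeException.
            // Cassandra is the source of truth, so fall back to it.
            return cassandra.findOlderThan(user, cutoff);
        }
    }
}
```

With this shape, the only difference between the two solutions is the threshold passed to the constructor, which would make it easy to benchmark both on preprod.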

I feel that solution 1 would be the safer option while still improving task speed.

DoD

  • Benchmark the INBOX archival task's speed on preprod before and after the experiment.
    If the result is good (the task is faster and OpenSearch is not blown up), we could consider adopting the improvement.
@Arsnael
Member

Arsnael commented Oct 24, 2023

Hmm... I'm wondering if it's really necessary to go that far. I would think the moment you potentially have a lot of pressure is the first time you run the task. Afterwards, when a good part of your messages are already archived, the pressure would not be so much?

@quantranhong1999
Member Author

> Hmm... I'm wondering if it's really necessary to go that far. I would think the moment you potentially have a lot of pressure is the first time you run the task. Afterwards, when a good part of your messages are already archived, the pressure would not be so much?

I agree. But it is still a potential improvement IMO; I am recording the idea here, otherwise one day I will forget it.
Open for discussion ^^. Not a priority though, for sure.

quantranhong1999 added the enhancement label on Oct 24, 2023