Implement sequence prefetching with in-memory cache #89

Open
reece opened this issue Oct 31, 2020 · 3 comments
Labels
keep alive exempt issue from staleness checks

Comments

@reece
Member

reece commented Oct 31, 2020

SeqRepo is capable of >1500 queries/second single-threaded with local data. At this rate, sequence fetching is likely to be a small component of overall execution of a typical analysis pipeline.

Optimizing significantly beyond current performance requires loading sequences in memory. However, it's not generally feasible or useful to prefetch all sequences. Current human databases are ~12GB compressed. Prefetching selected sequences on first access could be very beneficial for certain access patterns.

Prefetching might work as follows. The client would be instantiated with a prefetch cache size, which would set the maximum number of sequences held in the prefetch cache. The default would be 0 (no prefetching).

When a client requests a slice of a sequence, the entire sequence would be read speculatively, anticipating that the next queries might be on the same sequence (e.g., on a single chromosome). Subsequent sequence lookups would be entirely in-memory.

The cache would operate as a typical LRU cache, automatically evicting the least recently accessed sequence once the cache reaches its configured size.
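A minimal sketch of how such a cache might look, assuming the backing store is exposed as a callable taking `(seq_id, start, end)`. The names `PrefetchCache`, `max_sequences`, `backing_fetch`, and `cached_ids` are hypothetical and not part of SeqRepo's API; the toy `store` dict stands in for on-disk data.

```python
from collections import OrderedDict
from typing import Callable, List, Optional

# Hypothetical backing-store signature: (seq_id, start, end) -> sequence string.
Fetcher = Callable[[str, Optional[int], Optional[int]], str]


class PrefetchCache:
    """LRU cache of whole sequences keyed by sequence id (illustrative only)."""

    def __init__(self, fetcher: Fetcher, max_sequences: int = 0):
        self._fetcher = fetcher
        self._max_sequences = max_sequences  # 0 means no prefetching
        self._cache: "OrderedDict[str, str]" = OrderedDict()

    def fetch(self, seq_id: str, start: Optional[int] = None,
              end: Optional[int] = None) -> str:
        if self._max_sequences <= 0:
            # Prefetching disabled: pass the slice request straight through.
            return self._fetcher(seq_id, start, end)
        if seq_id in self._cache:
            # Hit: mark this sequence as most recently used.
            self._cache.move_to_end(seq_id)
        else:
            # Miss: speculatively read the entire sequence.
            self._cache[seq_id] = self._fetcher(seq_id, None, None)
            if len(self._cache) > self._max_sequences:
                # Evict the least recently used sequence.
                self._cache.popitem(last=False)
        return self._cache[seq_id][start:end]

    def cached_ids(self) -> List[str]:
        """Cached sequence ids, least recently used first."""
        return list(self._cache)


# Toy backing store standing in for on-disk sequence data.
store = {"chr1": "ACGTACGTAC", "chr2": "TTTTGGGGCC", "chr3": "GATTACAGAT"}
calls = []  # records every request that reaches the backing store


def backing_fetch(seq_id, start, end):
    calls.append((seq_id, start, end))
    return store[seq_id][start:end]


cache = PrefetchCache(backing_fetch, max_sequences=2)
cache.fetch("chr1", 0, 4)  # miss: reads all of chr1
cache.fetch("chr1", 4, 8)  # hit: served from memory
cache.fetch("chr2", 0, 4)  # miss: reads all of chr2
cache.fetch("chr3", 0, 4)  # miss: reads all of chr3 and evicts chr1
```

Keying on the sequence id and storing the full sequence keeps slicing a pure in-memory operation after the first access. A real implementation would likely also need to bound memory by total sequence length rather than only by count.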

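Importantly, prefetching can degrade performance if accesses are not suitably ordered: each cache miss reads an entire sequence to serve a single slice, so accesses that alternate among more sequences than the cache holds pay that cost repeatedly.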
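To illustrate the ordering concern, here is a small simulation that counts whole-sequence reads under an LRU policy for two access orders. The function `count_full_reads` and the access lists are made up for illustration; no real SeqRepo calls are involved.

```python
from collections import OrderedDict


def count_full_reads(accesses, max_sequences):
    """Count whole-sequence reads (cache misses) for an LRU cache of the given size."""
    cache = OrderedDict()
    full_reads = 0
    for seq_id in accesses:
        if seq_id in cache:
            cache.move_to_end(seq_id)  # hit: refresh recency
            continue
        full_reads += 1                # miss: the entire sequence is read
        cache[seq_id] = True
        if len(cache) > max_sequences:
            cache.popitem(last=False)  # evict the least recently used entry
    return full_reads


# Nine slice requests over three sequences, in two different orders.
grouped = ["chr1"] * 3 + ["chr2"] * 3 + ["chr3"] * 3
interleaved = ["chr1", "chr2", "chr3"] * 3

grouped_reads = count_full_reads(grouped, max_sequences=2)
interleaved_reads = count_full_reads(interleaved, max_sequences=2)
print(grouped_reads, interleaved_reads)
```

With a cache of two sequences, the grouped order triggers three full reads, while the interleaved order triggers nine: every request evicts the sequence that will be needed next. In that case each slice request costs a full-sequence read, which is worse than not prefetching at all.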

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Sep 20, 2023
@github-actions

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 27, 2023
@reece reece added stale closed Issue was closed automatically due to inactivity and removed stale closed Issue was closed automatically due to inactivity labels Nov 27, 2023
@reece reece reopened this Dec 8, 2023
@github-actions github-actions bot removed the stale Issue is stale and subject to automatic closing label Dec 9, 2023
@jsstevenson jsstevenson added the keep alive exempt issue from staleness checks label Dec 29, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Mar 29, 2024
@jsstevenson jsstevenson removed the stale Issue is stale and subject to automatic closing label Mar 29, 2024