investigate: long find lsn for timestamp operations #7729

koivunej · 2024-05-13T11:09:21Z

Follow-up from the recent incident, which stopped causing end-user problems with #7585. We still don't know why so many tenants see long times for the query. It is not limited to the many-timelines case, because single-timeline tenants show it as well.

Guesses so far:

- a branch is created but never receives writes in an Lsn area with high commit density => that area is difficult to search through every time
  - the assumption is that multiple branches make this N times harder
  - the cache should help, but it is insufficient when one search takes a long time: the cache has churned before the next similar timeline is searched
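To make the guess above concrete, here is a minimal sketch of a bisection over an Lsn range, assuming a toy in-memory commit log. `CommitLog`, `records_scanned` and this `find_lsn_for_timestamp` are hypothetical names for illustration only, not the pageserver's actual code. The probe deliberately scans every commit up to the probed Lsn, so the cost of one search grows with commit density.

```rust
/// Toy model (an assumption, not the pageserver's data structures):
/// commits are (lsn, timestamp) pairs sorted by lsn, with non-decreasing timestamps.
pub struct CommitLog {
    commits: Vec<(u64, u64)>,
    /// Number of commit records inspected so far; stands in for SLRU page reads.
    pub records_scanned: u64,
}

impl CommitLog {
    pub fn new(commits: Vec<(u64, u64)>) -> Self {
        Self { commits, records_scanned: 0 }
    }

    /// Is there a commit at or below `lsn` whose timestamp is >= `ts`?
    /// Scans every commit up to `lsn`, so a dense area is expensive to probe.
    fn has_commit_at_or_after(&mut self, lsn: u64, ts: u64) -> bool {
        let mut found = false;
        for &(commit_lsn, commit_ts) in &self.commits {
            if commit_lsn > lsn {
                break;
            }
            self.records_scanned += 1;
            if commit_ts >= ts {
                found = true;
            }
        }
        found
    }

    /// Bisect [low, high] for the highest Lsn at which no commit has a
    /// timestamp >= `ts`. Returns None if even `low` already has such a commit.
    pub fn find_lsn_for_timestamp(&mut self, low: u64, high: u64, ts: u64) -> Option<u64> {
        // Search [lo, hi) for the first Lsn where the probe turns true.
        let mut lo = low;
        let mut hi = high + 1;
        while lo < hi {
            let mid = lo + (hi - lo) / 2;
            if self.has_commit_at_or_after(mid, ts) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        if lo == low {
            None
        } else {
            Some(lo - 1)
        }
    }
}
```

In this model, running the same search once per branch over the same Lsn range multiplies `records_scanned` by the number of branches, which is the "N times harder" effect guessed above.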

koivunej · 2024-05-15T12:17:02Z

#7755 shows that a configuration change brings a particularly bad bisection down from 90s to 13s.

I think that there are still cases where we end up doing a lot more work than should reasonably be done:

- the prod project with 777 branches, assuming they had "backup alike branches", would have searched for the PITR Lsn over the same pages multiple times
  - with a high slru count this would have been prohibitively long
  - "backup alike branches" as in branches where last_record_lsn == ancestor_lsn
  - perhaps we should special case the last_record_lsn == ancestor_lsn case -- we currently do not have metrics on how many timelines have never progressed beyond their ancestor_lsn
- even if the many timelines were able to find different PITR lsns (from their branch), we could still do duplicate work if we need to reconstruct past ancestor_lsn
  - I think this is what we ultimately saw during the bug of image layering only the first partition
  - then/there the cost of reconstructing the clog pages at the parent was prohibitive
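A rough sketch of the special case and the missing metric, under stated assumptions: `TimelineInfo`, `count_never_progressed` and `lookup_all` are hypothetical names, and the `search` closure stands in for whatever expensive lookup is actually used. The idea is that a branch with last_record_lsn == ancestor_lsn has no data of its own, so its lookup could be keyed on (ancestor, ancestor_lsn) and shared with its siblings.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a timeline; field names mirror the issue text.
#[derive(Debug, Clone)]
pub struct TimelineInfo {
    pub id: u32,
    pub ancestor_id: Option<u32>,
    pub ancestor_lsn: u64,
    pub last_record_lsn: u64,
}

impl TimelineInfo {
    /// True for a "backup alike branch": it has an ancestor and never
    /// advanced past its branch point.
    pub fn never_progressed(&self) -> bool {
        self.ancestor_id.is_some() && self.last_record_lsn == self.ancestor_lsn
    }
}

/// The count that the issue says there is currently no metric for.
pub fn count_never_progressed(timelines: &[TimelineInfo]) -> usize {
    timelines.iter().filter(|t| t.never_progressed()).count()
}

/// Resolve a timestamp lookup for every timeline, calling the expensive
/// `search(timeline_id, upper_lsn, ts)` once per distinct (timeline, upper_lsn) key.
/// Never-progressed branches are keyed on their ancestor, so siblings share one search.
pub fn lookup_all<F>(
    timelines: &[TimelineInfo],
    ts: u64,
    mut search: F,
) -> HashMap<u32, Option<u64>>
where
    F: FnMut(u32, u64, u64) -> Option<u64>,
{
    let mut memo: HashMap<(u32, u64), Option<u64>> = HashMap::new();
    let mut results = HashMap::new();
    for t in timelines {
        let key = match t.ancestor_id {
            Some(ancestor) if t.never_progressed() => (ancestor, t.ancestor_lsn),
            _ => (t.id, t.last_record_lsn),
        };
        let found = *memo.entry(key).or_insert_with(|| search(key.0, key.1, ts));
        results.insert(t.id, found);
    }
    results
}
```

With one root timeline, three never-progressed branches at the same ancestor_lsn and one branch that has received writes, this runs three searches instead of five. It does not address the second bullet above, where branches that did progress still reconstruct the same pages below ancestor_lsn.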

koivunej added the c/storage/pageserver Component: storage: pageserver label May 13, 2024

koivunej self-assigned this May 13, 2024

VladLazar mentioned this issue May 14, 2024

Use Timeline::get_vectored for doing timestamp lookups in SLRUs #7755

Open
