Follow-up from the recent incident, which has stopped causing end-user problems with #7585. We still don't know why so many tenants see long times for the query. It is not limited to the many-timelines case, because single-timeline tenants show it as well.
Guesses so far:
a branch is created but never receives writes in an LSN range with high commit density => that range is expensive to search through every time
the assumption is that multiple branches make this N times harder
the cache should help, but is insufficient when one search takes a long time: the cache has churned before the next similar timeline is searched
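To make the first guess concrete, here is a minimal sketch of a timestamp-to-LSN bisection. It is a toy model, not the pageserver's code: `find_lsn_for_timestamp` here and the `latest_commit_ts_at` probe are hypothetical names, and the probe stands in for whatever page reconstruction a real lookup needs. The point is only that the number of probes is logarithmic in the LSN range, so the total time is dominated by how expensive each probe is, and every branch that repeats the search pays that cost again.

```python
from typing import Callable, Optional, Tuple


def find_lsn_for_timestamp(
    low: int,
    high: int,
    target_ts: int,
    latest_commit_ts_at: Callable[[int], int],
) -> Tuple[Optional[int], int]:
    """Bisect [low, high] for the largest LSN whose latest commit
    timestamp is <= target_ts.

    Returns (lsn or None, number of probes). Assumes the timestamp
    returned by the probe is non-decreasing in LSN (a simplification).
    """
    probes = 0
    result = None
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # each probe is the expensive part in practice
        if latest_commit_ts_at(mid) <= target_ts:
            result = mid      # mid qualifies, look for a later LSN
            low = mid + 1
        else:
            high = mid - 1    # mid is too new, look earlier
    return result, probes


# Toy history: the commit timestamp advances by one every 10 LSN units.
lsn, probes = find_lsn_for_timestamp(0, 1000, 50, lambda l: l // 10)
```

In this model, N branches searching the same range issue roughly N times the probes unless the results, or the pages behind them, stay cached between searches.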
#7755 shows that a configuration change brings a particularly bad bisection down from 90s to 13s.
I think that there are still cases where we end up doing a lot more work than should reasonably be done:
the prod project with 777 branches, assuming they had "backup alike branches", would have searched for the PITR LSN over the same pages multiple times
with a high SLRU count this would have been prohibitively long
"backup alike branches" as in branches where last_record_lsn == ancestor_lsn
perhaps we should special case the last_record_lsn == ancestor_lsn case -- we currently do not have metrics on how many timelines have never progressed beyond their ancestor_lsn
even if the many timelines were able to find different PITR lsns (from their branch), we could still do duplicate work if we need to reconstruct past ancestor_lsn
I think this is what we ultimately saw during the bug of image layering only the first partition
in that case, the cost of reconstructing the clog pages at the parent was prohibitive
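The special case suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing implementation: the `Timeline` dataclass, `search_key`, `pitr_lsns`, and the `search` callback are all hypothetical. The idea is that a timeline with `last_record_lsn == ancestor_lsn` has no history of its own, so its search can be keyed on the ancestor and shared with every sibling branched at the same LSN.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Timeline:
    timeline_id: str
    last_record_lsn: int
    ancestor_id: Optional[str] = None
    ancestor_lsn: Optional[int] = None


def search_key(tl: Timeline) -> Tuple[str, int]:
    """Identify the history a PITR search on `tl` would actually scan."""
    if tl.ancestor_id is not None and tl.last_record_lsn == tl.ancestor_lsn:
        # Never progressed past the branch point: only ancestor history
        # up to ancestor_lsn is relevant.
        return (tl.ancestor_id, tl.ancestor_lsn)
    return (tl.timeline_id, tl.last_record_lsn)


def pitr_lsns(
    timelines: Iterable[Timeline],
    target_ts: int,
    search: Callable[[str, int, int], Optional[int]],
) -> Dict[str, Optional[int]]:
    """Run `search(timeline_id, upper_lsn, target_ts)` once per distinct key."""
    memo: Dict[Tuple[str, int], Optional[int]] = {}
    out: Dict[str, Optional[int]] = {}
    for tl in timelines:
        key = search_key(tl)
        if key not in memo:
            memo[key] = search(key[0], key[1], target_ts)
        out[tl.timeline_id] = memo[key]
    return out
```

With 777 branches created at the same `ancestor_lsn`, this runs one search for the whole group instead of 777. It does not help branches that found different PITR LSNs but still reconstruct pages below `ancestor_lsn`; that duplicate work would need sharing at the page level rather than the result level.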