-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3/5] [Run Timeline] Add hourly data cache #21934
Conversation
js_modules/dagster-ui/packages/ui-core/src/runs/useRunsForTimeline.tsx
Outdated
Show resolved
Hide resolved
js_modules/dagster-ui/packages/ui-core/src/runs/HourlyDataCache.tsx
Outdated
Show resolved
Hide resolved
js_modules/dagster-ui/packages/ui-core/src/runs/__tests__/HourlyDataCache.test.tsx
Outdated
Show resolved
Hide resolved
js_modules/dagster-ui/packages/ui-core/src/runs/fetchPaginatedBucketData.ts
Outdated
Show resolved
Hide resolved
Going to be adding more tests and changing strategy a bit here. Will update PR soon |
Going to add more tests + change strategy a little
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to your queue for ^ planned changes
} | ||
}); | ||
Object.entries(buckets) | ||
.sort((bucketA, bucketB) => COMMON_COLLATOR.compare(bucketA[0], bucketB[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this stops the run timeline rows from shuffling around
@@ -14,7 +14,7 @@ import {WorkspaceContext} from '../workspace/WorkspaceContext'; | |||
import {buildRepoAddress} from '../workspace/buildRepoAddress'; | |||
const LOOKAHEAD_HOURS = 1; | |||
const ONE_HOUR = 60 * 60 * 1000; | |||
const POLL_INTERVAL = 60 * 1000; | |||
const POLL_INTERVAL = 5 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lowered the poll interval since we only fetch for the missing ranges.
Increased the poll interview for FutureTicks / Ongoing runs below.
Buildkite failures unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks solid, probably still worth @bengotow taking a peak since my typescript isn't the strongest
js_modules/dagster-ui/packages/ui-core/src/runs/HourlyDataCache/HourlyDataCache.tsx
Outdated
Show resolved
Hide resolved
js_modules/dagster-ui/packages/ui-core/src/runs/fetchPaginatedBucketData.ts
Outdated
Show resolved
Hide resolved
## Summary & Motivation Add an hourly data cache to the Run Timeline. We only cache completed run data and we only request data for time ranges that we don't already have data for. - how do we handle settling ongoing runs in to the completed bucket? - - when an ongoing run goes into completed it should have an updatedTimestamp that belongs to a range we havent fetched yet Updated the completed run bucketing query logic to fetch updatedAfter -> updatedBefore timestamps to avoid the overfetching that the current updatedAfter -> createdBefore buckets make. ![image](https://github.com/dagster-io/dagster/assets/2286579/af0b2c02-f189-4332-95a2-5cacba95a6d3) updatedAfter -> updatedBefore buckets only capture range 3 and 4 boxes though: ![image](https://github.com/dagster-io/dagster/assets/2286579/c566547a-43b9-4d83-bd07-5e5b9e1285be) To capture range 2 and range 1 we need runs that were createdBefore the right boundary and updated after the right boundary. Instead of querying for these runs specifically we rely on the fact that we have already queried for all the future buckets since we always paginate backwards. So we subscribe to the cache for that data and then filter for runs that started before the right boundary of the time range ## How I Tested These Changes Jest tests. Left the page open for half an hour and saw in progress runs transition to completed. Switched between 1hr / 6hr / 12hr / 24hr views and saw we only request time ranges that are missing.
Summary & Motivation
Add an hourly data cache to the Run Timeline.
We only cache completed run data and we only request data for time ranges that we don't already have data for.
Updated the completed run bucketing query logic to fetch updatedAfter -> updatedBefore timestamps to avoid the overfetching that the current updatedAfter -> createdBefore buckets make.
updatedAfter -> updatedBefore buckets only capture range 3 and 4 boxes though:
To capture range 2 and range 1 we need runs that were createdBefore the right boundary and updated
after the right boundary. Instead of querying for these runs specifically we rely on the fact that we have already queried for all the future buckets since we always paginate backwards.
So we subscribe to the cache for that data and then filter for runs that started before the right boundary of the time range
How I Tested These Changes
Jest tests.
Left the page open for half an hour and saw in progress runs transition to completed.
Switched between 1hr / 6hr / 12hr / 24hr views and saw we only request time ranges that are missing.