Bug Fix: Stop making duplicate time series requests #6529

rileyajones · 2023-08-04T17:36:13Z

Motivation for features / changes

Whenever a card appears on the time series dashboard, we make a request to fetch the card's data for each experiment being viewed. Because not all cards contain data from all of the experiments being viewed, this sometimes results in unnecessary requests being dispatched.

bmd3k

There is a lot about this PR that is confusing. I'm worried you've introduced a bunch of new subtle bugs.

This is a critical piece of code, so please take a deep look at what you've written and rigorously test it (manually and with unit tests).

bmd3k · 2023-08-17T13:28:07Z

tensorboard/webapp/metrics/effects/index.ts

+      combineLatestWith(this.store.select(selectors.getRunIdToExperimentId)),
+      map(([tagMetadata, runToEid]) => {
+        const imageTagToRuns = Object.fromEntries(
+          Object.entries(tagMetadata.images.tagRunSampledInfo).map(


Can we handle plugins and sampled vs non-sampled more generically?

Ideally the code is unaware of the set of plugin types (it doesn't know about images, scalars, or histograms). Ideally the code is unaware of which plugin types are sampled and which are not.

There is an isSampledPlugin function to help with this, too.

Alright. I thought the additional loop that approach required harmed readability a bit, but I've gone ahead and refactored to use it.
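A minimal sketch of the plugin-agnostic approach requested in this thread, assuming a simplified metadata shape. The PluginType enum, the local isSampledPlugin stand-in, and the tagToRunsFor and collectTagToRuns helpers are hypothetical illustrations, not code from this PR; only the tagToRuns and tagRunSampledInfo field names come from the hunks quoted in this review.

```typescript
// Hypothetical, simplified stand-ins; the real TensorBoard types are richer.
enum PluginType {
  SCALARS = 'scalars',
  HISTOGRAMS = 'histograms',
  IMAGES = 'images',
}

interface NonSampledMetadata {
  tagToRuns: Record<string, readonly string[]>;
}

interface SampledMetadata {
  tagRunSampledInfo: Record<string, Record<string, {maxSamplesPerStep: number}>>;
}

type TagMetadata = Record<PluginType, NonSampledMetadata | SampledMetadata>;

// Local stand-in for the isSampledPlugin helper mentioned above
// (assumption: only images are sampled in this sketch).
function isSampledPlugin(plugin: PluginType): boolean {
  return plugin === PluginType.IMAGES;
}

// Returns tag -> run ids for one plugin, whichever shape its metadata uses.
function tagToRunsFor(
  plugin: PluginType,
  metadata: NonSampledMetadata | SampledMetadata
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  if (isSampledPlugin(plugin)) {
    const info = (metadata as SampledMetadata).tagRunSampledInfo;
    for (const tag of Object.keys(info)) {
      result[tag] = Object.keys(info[tag] ?? {});
    }
  } else {
    const tagToRuns = (metadata as NonSampledMetadata).tagToRuns;
    for (const tag of Object.keys(tagToRuns)) {
      result[tag] = (tagToRuns[tag] ?? []).slice();
    }
  }
  return result;
}

// The loop never names a specific plugin; it only asks whether it is sampled.
function collectTagToRuns(
  tagMetadata: TagMetadata
): Record<string, Record<string, string[]>> {
  const byPlugin: Record<string, Record<string, string[]>> = {};
  for (const plugin of Object.keys(tagMetadata) as PluginType[]) {
    byPlugin[plugin] = tagToRunsFor(plugin, tagMetadata[plugin]);
  }
  return byPlugin;
}
```

The trade-off raised in the reply is visible here: the generic version needs one extra loop over plugin types, in exchange for not hard-coding images, scalars, or histograms.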

bmd3k · 2023-08-17T13:59:01Z

tensorboard/webapp/metrics/effects/index.ts

@@ -68,6 +69,12 @@ const getCardFetchInfo = createSelector(

 const initAction = createAction('[Metrics Effects] Init');

+function parseRunIdFromSampledRunInfoName(eidRun: string): string {
+  if (!eidRun) return '';
+  const [, ...runIdChunks] = eidRun.split('/');


I don't understand what you are trying to parse here. A comment would be helpful.

Is it this part highlighted in red:

Is this the same format as the run names for tagMetadata.scalars.tagToRuns and tagMetadata.histograms.tagToRuns? If so, why handle it differently for this case?

I've found a way to avoid doing this parsing but the structure still needs to be different.
I'll add a block comment explaining this.

The structure of SampledTagMetadata is quite different from the non-sampled structure.

The NonSampledPlugins map from run to tag, while the SampledPlugin(s) map from tag to run.

Sampled

Non Sampled

Rough Sketch

Here is a rough sketch with only the relevant parts

{
  tagMetadata: {
    scalars: {
      runTagInfo: { runId: ['tag1', 'tag2'] },
    },
    images: {
      tagRunSampledInfo: {
        tag: { runId: { maxSamplesPerStep: number } },
      },
    },
  },
}
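To make the difference between the two shapes concrete, here is a small sketch that brings both into a common tag -> runs form. It assumes exactly the shapes in the rough sketch above; the type aliases and the invertRunTagInfo and flattenSampledInfo helpers are hypothetical names used only for illustration.

```typescript
// Hypothetical aliases mirroring the rough sketch; only runTagInfo and
// tagRunSampledInfo are names taken from the discussion.
type RunToTags = Record<string, readonly string[]>;
type TagToRunSampledInfo = Record<
  string,
  Record<string, {maxSamplesPerStep: number}>
>;

// Non-sampled: invert run -> tags into tag -> runs.
function invertRunTagInfo(runTagInfo: RunToTags): Record<string, string[]> {
  const tagToRuns: Record<string, string[]> = {};
  for (const runId of Object.keys(runTagInfo)) {
    for (const tag of runTagInfo[runId] ?? []) {
      const runs = tagToRuns[tag] ?? [];
      runs.push(runId);
      tagToRuns[tag] = runs;
    }
  }
  return tagToRuns;
}

// Sampled: already keyed by tag, so only the per-run sample info is dropped.
function flattenSampledInfo(
  info: TagToRunSampledInfo
): Record<string, string[]> {
  const tagToRuns: Record<string, string[]> = {};
  for (const tag of Object.keys(info)) {
    tagToRuns[tag] = Object.keys(info[tag] ?? {});
  }
  return tagToRuns;
}
```

With both helpers returning the same shape, downstream code would not need to know which kind of plugin produced the mapping.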

bmd3k · 2023-08-17T14:19:36Z

tensorboard/webapp/metrics/effects/index.ts

    // Fetch and handle responses.
-    return of(requests).pipe(
+    return this.tagToEid$.pipe(


Rather than piping this.tagToEid$, can we just get the latest value? I'm a little worried about subtle bugs when tagToEid$ changes for whatever unpredictable reason and the pipe kicks off a new set of requests.

I've added a take(1). I could use a subject instead, or maybe a subscription? Let me know if you'd prefer something else.

bmd3k · 2023-08-17T14:20:24Z

tensorboard/webapp/metrics/effects/index.ts

+          }
+          return partialRequest;
+        });
+        const uniqueRequests = new Set(


Is there an actual problem you are trying to solve here? I don't see a test for this and I didn't see anything about it in the PR description.

I am attempting to address the TODO left by psybuzz

Does "if 2 cards require the same data" happen in practice? If it does, is it a source of problems? If not, especially given that this code is critical, do we need to be making unnecessary changes? Also, it's not clear to me that you wrote a test for this particular change?

There was an existing test which verified that this was doing the wrong thing; I have updated it. See the comment that I removed from line 375 of metrics_effects_test:

tensorboard/webapp/metrics/effects/metrics_effects_test.ts
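For readers following the "2 cards require the same data" question, this is a rough sketch of what deduplicating requests through a Set of keys could look like. The TimeSeriesRequest shape, requestKey, and dedupeRequests are hypothetical and simplified; this is not the code under review.

```typescript
// Hypothetical request shape for illustration only.
interface TimeSeriesRequest {
  plugin: string;
  tag: string;
  experimentIds?: string[];
  runId?: string;
  sample?: number;
}

// Builds a stable string key. Note that experimentIds order matters here.
function requestKey(request: TimeSeriesRequest): string {
  return JSON.stringify([
    request.plugin,
    request.tag,
    request.experimentIds ?? null,
    request.runId ?? null,
    request.sample ?? null,
  ]);
}

// Keeps the first occurrence of each distinct request.
function dedupeRequests(requests: TimeSeriesRequest[]): TimeSeriesRequest[] {
  const seen = new Set<string>();
  const unique: TimeSeriesRequest[] = [];
  for (const request of requests) {
    const key = requestKey(request);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(request);
    }
  }
  return unique;
}
```

Two cards that both need scalars data for the same tag and experiments would collapse into one request, while a histogram request for the same tag would be kept because its key differs.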

bmd3k · 2023-08-17T14:29:12Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+      histograms: {
+        tagDescriptions: {} as any,
+        tagToRuns: {
+          tagA: ['run1'],


Is it valid for there to be duplicate tags across scalars/histograms? Doesn't seem to make sense to me.

Is there anything that prohibits this? Tags can appear in multiple experiments and they could have different data being logged.

Ah ya, fair enough. That makes sense.

bmd3k · 2023-08-17T14:31:58Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+        actions$.next(coreActions.manualReload());
+
+        expect(effectFetchTimeSeriesSpy).toHaveBeenCalledTimes(2);
+        expect(effectFetchTimeSeriesSpy).toHaveBeenCalledWith({


It's extremely hard to reason about why the test concludes that these should be the requests sent. Some of the key test data (like in overrideTagMetadata and overrideRunToEid) are set up far from here. Maybe it would be helpful to leave a comment about all the requests that could have been made and identify why certain requests were filtered out.

bmd3k · 2023-08-17T14:33:16Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+
+        expect(effectFetchTimeSeriesSpy).toHaveBeenCalledTimes(2);
+        expect(effectFetchTimeSeriesSpy).toHaveBeenCalledWith({
+          plugin: 'scalars',


Should we also be fetching plugin: 'histograms', tag: 'tagA', experimentIds: ['exp1']?

bmd3k · 2023-08-17T14:35:56Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+        tagDescriptions: {} as any,
+        tagToRuns: {
+          tagA: ['run1'],
+          tagB: ['run2', 'run3'],


There are no references to 'run2' thru 'run6' or 'tagC' and 'tagD' or 'defaultExperimentId' anywhere else in this test as far as I can tell.

Maybe just mock the minimum amount of data you need for existing tests to pass and override at a more detailed level only for your new tests?

I added some additional data to the state to ensure it did not result in additional requests.

Could you add some comments acknowledging which data is unnecessary and why you include it?

After your last comment, I added some additional tests and ensured that all of the data is being used. In particular, the test "does not send requests to experiments lacking a cards tag" references every tag.

bmd3k · 2023-08-17T14:40:29Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+        });
+
+        expect(effectFetchTimeSeriesSpy).toHaveBeenCalledWith({
+          plugin: 'scalars',


Given the complexity of the logic you've written, it would be good to see more rigorous testing. A single test case doesn't really seem to cut it.

A couple that pop up in my head:

A test case where some image requests make it through the new filter.

A test case where some histogram requests make it through the new filter.

rileyajones

I'll add some additional tests to this but wanted to leave some preliminary comments.

rileyajones · 2023-08-17T18:26:32Z

tensorboard/webapp/metrics/effects/index.ts

+      combineLatestWith(this.store.select(selectors.getRunIdToExperimentId)),
+      map(([tagMetadata, runToEid]) => {
+        const imageTagToRuns = Object.fromEntries(
+          Object.entries(tagMetadata.images.tagRunSampledInfo).map(


Alright. I thought the additional loop that approach required harmed readability a bit, but I've gone ahead and refactored to use it.

rileyajones · 2023-08-17T20:30:55Z

tensorboard/webapp/metrics/effects/index.ts

@@ -68,6 +69,12 @@ const getCardFetchInfo = createSelector(

 const initAction = createAction('[Metrics Effects] Init');

+function parseRunIdFromSampledRunInfoName(eidRun: string): string {
+  if (!eidRun) return '';
+  const [, ...runIdChunks] = eidRun.split('/');


I've found a way to avoid doing this parsing but the structure still needs to be different.
I'll add a block comment explaining this.

The structure of SampledTagMetadata is quite different from the non-sampled structure.

The NonSampledPlugins map from run to tag, while the SampledPlugin(s) map from tag to run.

Sampled

Non Sampled

Rough Sketch

Here is a rough sketch with only the relevant parts

{
  tagMetadata: {
    scalars: {
      runTagInfo: { runId: ['tag1', 'tag2'] },
    },
    images: {
      tagRunSampledInfo: {
        tag: { runId: { maxSamplesPerStep: number } },
      },
    },
  },
}

rileyajones · 2023-08-17T21:19:28Z

tensorboard/webapp/metrics/effects/index.ts

    // Fetch and handle responses.
-    return of(requests).pipe(
+    return this.tagToEid$.pipe(


I've added a take(1). I could use a subject instead, or maybe a subscription? Let me know if you'd prefer something else.

rileyajones · 2023-08-17T21:20:32Z

tensorboard/webapp/metrics/effects/index.ts

+          }
+          return partialRequest;
+        });
+        const uniqueRequests = new Set(


I am attempting to address the TODO left by psybuzz

rileyajones · 2023-08-17T21:41:03Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+      histograms: {
+        tagDescriptions: {} as any,
+        tagToRuns: {
+          tagA: ['run1'],


Is there anything that prohibits this? Tags can appear in multiple experiments and they could have different data being logged.

rileyajones · 2023-08-17T21:42:52Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+        tagDescriptions: {} as any,
+        tagToRuns: {
+          tagA: ['run1'],
+          tagB: ['run2', 'run3'],


I added some additional data to the state to ensure it did not result in additional requests.


bmd3k · 2023-08-24T13:55:51Z

tensorboard/webapp/metrics/effects/index.ts

+  runToEid: Record<string, string>
+): Record<string, Set<string>> {
+  const tagToEid: Record<string, Set<string>> = {};
+  function mapTagsToEid(tagToRun: Record<string, readonly string[]>) {


Nit: I think it would be clearer if you just inlined the contents of this function in the for loop (at L95). You would possibly even save some lines of code. You only use it once, after all.

bmd3k · 2023-08-24T13:57:34Z

tensorboard/webapp/metrics/effects/index.ts

+          }
+          return partialRequest;
+        });
+        const uniqueRequests = new Set(


Does "if 2 cards require the same data" happen in practice? If it does, is it a source of problems? If not, especially given that this code is critical, do we need to be making unnecessary changes? Also, it's not clear to me that you wrote a test for this particular change?

bmd3k · 2023-08-24T14:00:08Z

tensorboard/webapp/metrics/effects/index.ts

+   *
+   * The computation is done by translating Plugin -> Tag -> Run -> ExpId
+   *
+   * Sampled plugins are ignored because they are associated with runs, not experiments.


Sampled plugins are not the only things ignored here. Really it's any single-run plugins that are ignored. Would be good to fix that in the documentation.

I think it's also worth explaining the real motivation for the change here. Otherwise it is hard to understand why we would bother doing this for scalars and why we wouldn't do this for the others:

We want to eliminate unnecessary requests for experiment+tag combinations where the experiment does not actually contain the tag. In the case of single-run plugins, we assume that every given request for experiment+run+tag is already valid, since they originate from cards for that experiment+run+tag combination.

I've updated this comment and added a detailed description of the problem and how the observable is being used to solve it.
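The Plugin -> Tag -> Run -> ExpId translation described in this thread can be sketched as below. This is an illustration under assumptions, not the PR's implementation: buildTagToEid and filterExperimentIds are hypothetical helpers, and the input shapes follow the Record types visible in the quoted hunk.

```typescript
// Maps each tag to the set of experiment ids that actually contain it,
// by walking tag -> runs -> experiment id.
function buildTagToEid(
  tagToRuns: Record<string, readonly string[]>,
  runToEid: Record<string, string>
): Record<string, Set<string>> {
  const tagToEid: Record<string, Set<string>> = {};
  for (const tag of Object.keys(tagToRuns)) {
    const eids = new Set<string>();
    for (const runId of tagToRuns[tag] ?? []) {
      const eid = runToEid[runId];
      // Runs with no known experiment are skipped.
      if (eid !== undefined) {
        eids.add(eid);
      }
    }
    tagToEid[tag] = eids;
  }
  return tagToEid;
}

// Keeps only the experiments that contain the tag, so no request is made
// for an experiment+tag combination that cannot return data.
function filterExperimentIds(
  tag: string,
  experimentIds: readonly string[],
  tagToEid: Record<string, Set<string>>
): string[] {
  const eids = tagToEid[tag];
  if (!eids) {
    return [];
  }
  return experimentIds.filter((eid) => eids.has(eid));
}
```

For a multi-run plugin such as scalars, a card for a given tag would then request only the filtered experiment ids. Single-run plugins would skip this step, following the reasoning above that their requests are already valid.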

bmd3k · 2023-08-24T14:06:05Z

tensorboard/webapp/metrics/effects/index.ts

    // Fetch and handle responses.
-    return of(requests).pipe(
+    return this.multiRunTagsToEid$.pipe(
+      take(1),


Could we use withLatestFrom instead? The caller of this function uses withLatestFrom so that might be a natural place to tie in this observable?

Yeah, that works.

bmd3k · 2023-08-24T14:13:49Z

tensorboard/webapp/metrics/effects/index.ts

+                plugin,
+                tag,
+                runId,
+                sample,


This assumes that sample only exists for single-run plugins. Is that guaranteed by the data model? Single-run vs. sampled are theoretically orthogonal considerations. (I realize that in practice there are no multi-run, sampled plugins, but the old code handles this case fine, so I assume that is intentional.)

I've added sample to the multi run plugin request. It shouldn't ever come up, but the typing does allow for it so it's close to free to include.
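A hedged sketch of treating sample as orthogonal to the single-run vs. multi-run distinction. The CardFetchInfo and request types, the isSingleRunPlugin stand-in, and buildRequest are all hypothetical; the assumption that scalars is the only multi-run plugin is made for illustration.

```typescript
// Hypothetical shapes for illustration only.
interface CardFetchInfo {
  plugin: string;
  tag: string;
  runId?: string;
  sample?: number;
}

interface SingleRunRequest {
  plugin: string;
  tag: string;
  runId: string;
  sample?: number;
}

interface MultiRunRequest {
  plugin: string;
  tag: string;
  experimentIds: string[];
  sample?: number;
}

type TimeSeriesRequest = SingleRunRequest | MultiRunRequest;

// Stand-in helper (assumption: everything except scalars is single-run).
function isSingleRunPlugin(plugin: string): boolean {
  return plugin !== 'scalars';
}

// sample is copied whenever the card has one, for either request kind.
function buildRequest(
  card: CardFetchInfo,
  experimentIds: string[]
): TimeSeriesRequest {
  const request: TimeSeriesRequest =
    isSingleRunPlugin(card.plugin) && card.runId
      ? {plugin: card.plugin, tag: card.tag, runId: card.runId}
      : {plugin: card.plugin, tag: card.tag, experimentIds};
  if (card.sample !== undefined) {
    request.sample = card.sample;
  }
  return request;
}
```

Because both request types declare an optional sample, carrying it through for multi-run plugins costs one conditional assignment even if no such plugin exists today.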

bmd3k · 2023-08-24T14:21:57Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+        tagDescriptions: {} as any,
+        tagToRuns: {
+          tagA: ['run1'],
+          tagB: ['run2', 'run3'],


Could you add some comments acknowledging which data is unnecessary and why you include it?

bmd3k · 2023-08-24T14:22:29Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+      histograms: {
+        tagDescriptions: {} as any,
+        tagToRuns: {
+          tagA: ['run1'],


Ah ya, fair enough. That makes sense.

bmd3k · 2023-08-24T14:26:54Z

tensorboard/webapp/metrics/effects/metrics_effects_test.ts

+        ).toEqual({});
+      });
+
+      it('maps scalar data', () => {


Would it be worth adding one additional test that includes tagMetadata for all of scalars, histograms, and images?

I think it's a little redundant but I've added one to be safe.

stop making duplicate time series requests

cc1bbaf

rileyajones force-pushed the smart-timeseries-fetch branch from 5e18132 to cc1bbaf August 4, 2023 18:16

rileyajones requested a review from bmd3k August 7, 2023 16:40

rileyajones marked this pull request as ready for review August 7, 2023 16:40

bmd3k reviewed Aug 17, 2023


rileyajones added 4 commits August 17, 2023 17:33

Merge branch 'master' into smart-timeseries-fetch

f79b718

use plugintype rather than referencing plugins directly

94b102a

refactor tagToEid + use plugin type

5814618

minor test updates

20bf2f0

rileyajones commented Aug 17, 2023


rileyajones added 3 commits August 18, 2023 19:27

finished writing tests, discovered a bug - nonsampled data is fetched by run not eid and should be handled differently

4d264a5

fix the issue with sampled plugins. This is still subtly wrong - histograms and scalars should not be entangled

3eea398

only remap multi run plugins

0d741af

rileyajones requested a review from bmd3k August 21, 2023 17:36

bmd3k reviewed Aug 24, 2023


rileyajones added 3 commits August 24, 2023 18:44

inline helper function

02ee0fd

add a detailed comment and change to using withLatestFrom

bcd88ae

add additional test

11e7729

bmd3k self-requested a review September 20, 2023 21:14
