Perform data indexing and computations on-demand when required #55

OssamaRafique · 2023-05-04T00:52:04Z

While working on on-demand indexing, I noticed that the library was performing expensive computations on the data and searching the index for points even when the application was not listening for events such as 'pointHovered', 'pointClicked', or 'selectionEnd'. This wasted resources and degraded performance.

This PR makes the library perform data indexing and the related computations only when the application actually needs them. Indexing now starts the first time a function that depends on the index is called, and the resulting index is reused for all subsequent calls. This eliminates unnecessary computation and indexing and should improve efficiency and performance.
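The lazy-indexing pattern described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's code: only the method name `indexDataIfNotAlreadyIndexed` comes from the PR diff, while `LazyDataProcessor`, `findPointsInCell`, `onMouseMove`, and the grid-bucket "index" are hypothetical stand-ins for the real spatial index and event handling.

```javascript
// Sketch of on-demand (lazy) indexing. Only the method name
// indexDataIfNotAlreadyIndexed comes from the PR diff; everything else here
// (class, grid "index", hover handler) is a hypothetical stand-in.
class LazyDataProcessor {
  constructor(points) {
    this.points = points; // array of { x, y }
    this.index = null; // not built until something needs it
    this.indexBuildCount = 0; // counts builds, for illustration only
  }

  indexDataIfNotAlreadyIndexed() {
    if (this.index !== null) return; // already built: reuse it
    // Stand-in for the real spatial index: bucket point ids by unit grid cell.
    this.index = new Map();
    this.points.forEach(({ x, y }, i) => {
      const key = `${Math.floor(x)},${Math.floor(y)}`;
      if (!this.index.has(key)) this.index.set(key, []);
      this.index.get(key).push(i);
    });
    this.indexBuildCount += 1;
  }

  // Every index-dependent query builds the index on its first call.
  findPointsInCell(x, y) {
    this.indexDataIfNotAlreadyIndexed();
    return this.index.get(`${Math.floor(x)},${Math.floor(y)}`) ?? [];
  }
}

// Hypothetical mousemove handler: if no "pointHovered" listener is
// registered, skip the search, so the index is never built.
function onMouseMove(processor, listeners, x, y) {
  if (typeof listeners.pointHovered !== "function") return null;
  const hits = processor.findPointsInCell(x, y);
  listeners.pointHovered(hits);
  return hits;
}
```

With no listener registered, `onMouseMove` returns `null` and `indexBuildCount` stays at 0. The first hover with a listener builds the index once, and later queries reuse it.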

SamGRosen · 2023-05-04T16:35:56Z

Good catch. Is it possible to add a test which times the initial indexing and then subsequent calls to ensure there is a speed boost? This can be a good way to keep track of performance over time.

OssamaRafique · 2023-05-04T17:02:55Z

@SamGRosen Thanks for the suggestion! Indeed, adding a test to measure the initial indexing time and subsequent calls would be a great way to track performance over time. However, our current testing setup uses Cypress, which is primarily designed for end-to-end testing of web applications and may not be the most suitable choice for performance testing.

To properly implement performance tests, we'll need to add support for another testing library, such as Jest, which is more appropriate for this purpose. If you're okay with that, I can work on integrating Jest into our testing setup and then create the performance tests to ensure that the optimization leads to a speed boost and helps maintain performance over time.

SamGRosen · 2023-05-04T18:20:56Z

Hmm, it may not be worth the effort at the moment. I've always been wary of Jest because it didn't seem possible to test the offscreen canvas properly: jest-canvas-mock is only a mock, and jest-electron is unsupported. There seems to be some way to do performance testing with Cypress, but that is probably best left for the future, when speed becomes a priority with WebAssembly.

SamGRosen · 2023-05-04T18:24:40Z

cypress/integration/data-processor.spec.js

@@ -188,7 +188,8 @@ describe("Box selection", () => {
    );

    cy.wrap(dataProcessor)
-      .should("have.property", "index")
+      .should("have.property", "specificationHelper")
+      .then(() => dataProcessor.indexDataIfNotAlreadyIndexed())


One hacky way I've tested cached operations is something like this:

dataProcessor.expensiveOperationThatIsCached();
for (let i = 0; i < 1000; i++) {
  // If it's not cached, this will take very long
  dataProcessor.expensiveOperationThatIsCached(); // Should return immediately.
}
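The snippet above can be turned into a self-contained timing check along these lines. It is a sketch assuming a plain JavaScript environment with a global `performance.now()` (browsers, Node 16+); `CachedProcessor`, its memoized `expensiveOperationThatIsCached`, and `timeCachedCalls` are hypothetical and not part of the library.

```javascript
// Hypothetical processor whose expensive result is memoized after the first call.
class CachedProcessor {
  constructor(n) {
    this.n = n; // amount of simulated work
    this.cached = undefined;
  }

  expensiveOperationThatIsCached() {
    if (this.cached !== undefined) return this.cached; // cache hit: no work
    let sum = 0;
    for (let i = 0; i < this.n; i++) sum += Math.sqrt(i); // simulated expensive work
    this.cached = sum;
    return sum;
  }
}

// Times the first (uncached) call separately from `repeats` cached calls.
function timeCachedCalls(processor, repeats = 1000) {
  const t0 = performance.now();
  processor.expensiveOperationThatIsCached(); // first call does the work
  const t1 = performance.now();
  for (let i = 0; i < repeats; i++) {
    processor.expensiveOperationThatIsCached(); // should return immediately
  }
  const t2 = performance.now();
  return { firstCallMs: t1 - t0, cachedCallsMs: t2 - t1 };
}
```

If caching breaks, the loop repeats the full computation 1000 times, so `cachedCallsMs` ends up far larger than `firstCallMs`. Comparing the two, rather than asserting a fixed millisecond budget, makes the check less sensitive to machine speed, though timing-based tests can still be flaky on busy CI runners.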

jkanche · 2023-05-05T16:58:37Z

I wonder if Playwright is a good alternative; have any of you used it? I've used Puppeteer in Kana (perf branch) to measure time and memory usage across datasets for our paper. It's not ideal, but it works.

jkanche · 2023-05-10T14:53:40Z

@OssamaRafique is this ready? Can you merge the changes from master into this branch?

OssamaRafique · 2023-05-10T15:16:44Z

@jkanche Yes, the functionality is ready, but I still need to add performance tests. I've been focusing on issue #59 and will write the performance test for this once I'm done with that.

OssamaRafique added 2 commits May 3, 2023 17:38

Perform data indexing and computations on-demand when required

a7a51cf

Typo Fixed

6b9b039

OssamaRafique mentioned this pull request May 4, 2023

only index data if selection mode is enabled #51

Open

SamGRosen reviewed May 4, 2023

jkanche linked an issue May 5, 2023 that may be closed by this pull request

only index data if selection mode is enabled #51

Open

Merge branch 'main' into only-index-when-required

9715483

OssamaRafique marked this pull request as draft May 11, 2023 21:11
