Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform data indexing and computations on-demand when required #55

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

OssamaRafique
Copy link
Member

While working on on-demand indexing, I noticed that the library was unnecessarily performing complex computations on the data and searching for points in the index, even when the application using the library was not actively listening to events like 'pointHovered', 'pointClicked', or 'selectionEnd'. This led to wasted resources and decreased performance.

In this PR, I have optimized the library by ensuring that data indexing and computation are performed only when required by the application. The indexing process now begins the first time a function that depends on indexing is called, and the indexed data is then reused for subsequent calls. This approach eliminates unnecessary computation and indexing, resulting in improved efficiency and performance.

@SamGRosen
Copy link
Member

Good catch. Is it possible to add a test which times the initial indexing and then subsequent calls to ensure there is a speed boost? This can be a good way to keep track of performance over time.

@OssamaRafique
Copy link
Member Author

Good catch. Is it possible to add a test which times the initial indexing and then subsequent calls to ensure there is a speed boost? This can be a good way to keep track of performance over time.

@SamGRosen Thanks for the suggestion! Indeed, adding a test to measure the initial indexing time and subsequent calls would be a great way to track performance over time. However, our current testing setup uses Cypress, which is primarily designed for end-to-end testing of web applications and may not be the most suitable choice for performance testing.

To properly implement performance tests, we'll need to add support for another testing library, such as Jest, which is more appropriate for this purpose. If you're okay with that, I can work on integrating Jest into our testing setup and then create the performance tests to ensure that the optimization leads to a speed boost and helps maintain performance over time.

@SamGRosen
Copy link
Member

Hmm it may not be worth the effort at the moment. I was always wary of using jest as it didn't seem possible to do good testing with the offscreen canvas as jest-canvas-mock is only a mock, and jest-electron is unsupported. It looks like there is some way to do performance testing with Cypress, but this would probably be best left in the future when speed becomes a priority with WebAssembly.

@@ -188,7 +188,8 @@ describe("Box selection", () => {
);

cy.wrap(dataProcessor)
.should("have.property", "index")
.should("have.property", "specificationHelper")
.then(() => dataProcessor.indexDataIfNotAlreadyIndexed())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One hacky way I've done testing of cached operations is to do something like this:

dataProcessor.expensiveOperationThatIsCached();
for(let i = 0; i < 1000; i++) { // If it's not cached, this will take very long
  dataProcessor.expensiveOperationThatIsCached(); // Should return immediately. 
}

@jkanche
Copy link
Member

jkanche commented May 5, 2023

Hmm it may not be worth the effort at the moment. I was always wary of using jest as it didn't seem possible to do good testing with the offscreen canvas as jest-canvas-mock is only a mock, and jest-electron is unsupported. It looks like there is some way to do performance testing with Cypress, but this would probably be best left in the future when speed becomes a priority with WebAssembly.

I wonder if playwright is a good alternative, any of you use it? I've used puppeteer in Kana (perf branch) to calculate time and memory usage across datasets for our paper, not ideal but it works.

@jkanche jkanche linked an issue May 5, 2023 that may be closed by this pull request
@jkanche
Copy link
Member

jkanche commented May 10, 2023

@OssamaRafique is this ready? can you merge the changes from master with this branch?

@OssamaRafique
Copy link
Member Author

@jkanche yes the functionality is ready. But I need to add performance tests. I was focusing on Issue #59. Will write the performance test for this after I'm done with that.

@OssamaRafique OssamaRafique marked this pull request as draft May 11, 2023 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

only index data if selection mode is enabled
3 participants