Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMP] BottomBarStatistics: faster computation of stats #3924

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rrahir
Copy link
Collaborator

@rrahir rrahir commented Mar 26, 2024

Description:

[IMP] BottomBarStatistics: Memoize cell filtering
To compute the statistics, we filter out the cells by type for each
function. However, there are only 2 different filters to apply for the
6 different functions.

This revision introduces a memoization of that filter to compute it 3
times less.

[IMP] BottomBarStatistics: Lazify statistic evaluation
Each time we fetch the statistics, we compute all 6 of them even though
we only need one of them to display (We need all of them when the
statistics selection menu is open).

This revision lazifies the computation of those statistics.

[IMP] BottomBarStatistics: limit redundant getter calls
While we iterate over the cells to check their visibility, we keep
checking the visibility of cells over which we already know that their
col (or row) is hidden.

This revision adds a memoization of limit the calls to the different
getters involved by memoizing them.

Overall Benchmark

On a demo sheet with makeLargeDataset of 260k cells

_computeStatisticFnResult:
Firefox: 200ms to 80ms
Chrome: 130ms to 42ms

Task: : 3827324

review checklist

  • feature is organized in plugin, or UI components
  • support of duplicate sheet (deep copy)
  • in model/core: ranges are Range object, and can be adapted (adaptRanges)
  • in model/UI: ranges are strings (to show the user)
  • undo-able commands (uses this.history.update)
  • multiuser-able commands (has inverse commands and transformations where needed)
  • new/updated/removed commands are documented
  • exportable in excel
  • translations (_t("qmsdf %s", abc))
  • unit tested
  • clean commented code
  • track breaking changes
  • doc is rebuild (npm run doc)
  • status is correct in Odoo

@robodoo
Copy link
Collaborator

robodoo commented Mar 26, 2024

@rrahir rrahir force-pushed the master-stat-fast-rar branch 3 times, most recently from ee53d08 to 60b5111 Compare March 26, 2024 09:02
const evaluatedCells = getCells(fn.types.join(","));
if (evaluatedCells.length) {
fnResult = fn.compute(evaluatedCells, locale);
fnResult = lazy(() => fn.compute(evaluatedCells, locale));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could even make the cell iteration and filter part of the lazy computation

    const fnResult: Lazy<number| undefined> = lazy(() => {
        const evaluatedCells = getCells(fn.types.join(","));
        if (evaluatedCells.length) {
          return fn.compute(evaluatedCells, locale);
        }
        return undefined;
      });

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no we cannot. We check the presence of fnResult in the final object to know if we should display the statistics button or not. So we'd end up evaluating everything all the time

Comment on lines +110 to +111
const isColHidden = memoize((col: number) => getters.isColHidden(sheetId, col));
const isRowHidden = memoize((row: number) => getters.isRowHidden(sheetId, row));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it really makes a significant difference ?
I believe the benchmark in the commit message is not the correct one. It's suspiciously the same as the previous commit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's around 40ms for each improvement, I left out the commas afterwards, I guess I can make it again if you want ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if this getter really cause performance issue (which I'm not sure how since it really does nothing), shouldn't it be memoized inside the plugin rather than here ? That way every call to this getter in other plugin/components is faster

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what to say. Don't believe me? then try it for yourself.
The reason I haven't set it everywhere right now is because I don't want to have to invalidate the cache all the time and it's pretty much the only place where it's heavily used.

Copy link
Collaborator Author

@rrahir rrahir Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm a moron and can't make proper commit messages apparently. The gain was around 30° ms. I did another round of benchmark and will update the values accordingly

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the 2 first improvements, the time is mostly spent in isColFolder and isRowFolder (of header_grouping.ts).
I think it would be wise to store on this plugin the list of cols and rows that are folded by sheet, so these getters becomes immediate.
image

To compute the statistics, we filter out the cells by type for each
function. However, there are only 2 different filters to apply for the
6 different functions.

This revision introduces a memoization of that filter to compute it 3
times less.

Benchmark
---------

In Firefox with the largeDataSet of 260k cells, averaged on 10 runs:

        cells.filter    _computeStatisticFnResult
---------------------------------------------------
before:  63.8 ms                  186.8 ms
after:   22.4 ms                  138 ms

Task: 3827324
Each time we fetch the statistics, we compute all 6 of them even though
we only need one of them to display (We need all of them when the
statistics selection menu is open).

This revision lazifies the computation of those statistics.

Benchmark
---------

In Firefox with the largeDataSet of 260k cells, averaged on 10 runs:

         _computeStatisticFnResult
-----------------------------------
before:           138  ms
after:            95.8 ms

Task: 3827324
While we iterate over the cells to check their visibility, we keep
checking the visibility of cells over which we already know that their
col (or row) is hidden.

This revision adds a memoization of limit the calls to the different
getters involved by memoizing them as well as skips the visit of cells
within columns that are hidden.

Benchmark
---------

In Firefox with the largeDataSet of 260k cells, averaged over 10 runs:

Skipping entire hidden columns

         _computeStatisticFnResult
-----------------------------------
before:           95.8 ms
after:            79   ms

Skipping hidden columns + memoization

         _computeStatisticFnResult
-----------------------------------
before:           95.8 ms
after:            72.4 ms

Task: 3827324
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants