Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarking TextFinder performance compared to other loop/filter techniques #1

Open
mhawksey opened this issue Sep 22, 2020 · 11 comments

Comments

@mhawksey
Copy link

Big fan of your work and various benchmark reports. I know you've already highlighted the TextFinder class in Spreadsheet services as one of your examples. I wonder if you have considered writing a report on TextFinder performance compared to other javascript techniques for finding values in arrays?

@mhawksey mhawksey changed the title Benchmarking TextFinder performance compared to other loop/filter techniques Benchmarking TextFinder performance compared to other loop/filter techniques Sep 22, 2020
@tanaikech
Copy link
Owner

Thank you for your comment. TextFinder retrieves from Google Spreadsheet. The find value from an array doesn't use Google Spreadsheet. I think that the situation of this comparison is interesting. So I would like to try to measure the cost of TextFinder and the find value from an array.

In this case, I would like to measure the cost with enabling V8 runtime. When I got the result, I would like to report it.

@tanaikech
Copy link
Owner

tanaikech commented Sep 23, 2020

I have a question for correctly understanding about your proposal.

When it measures the process cost of TextFinder, the values on the Spreadsheet are used. So in order to correctly understand about this process, I think that when it measures the find values from an array, it is required to include both costs of the retrieve values from Spreadsheet and the search of values. So in this case, I think that the following scripts are required to be compared. In this sample script, the row number is retrieved. How about this?

For TextFinder

const result = SpreadsheetApp
  .getActiveSpreadsheet()
  .getSheetByName(name)
  .createTextFinder(findText)
  .findAll()
  .map(range => range.getRow());

For find values from an array

const result = SpreadsheetApp
  .getActiveSpreadsheet()
  .getSheetByName(name)
  .getDataRange()
  .getValues()
  .reduce((ar, row, i) => {
    if (row.indexOf(findText) > -1) ar.push(i + 1);
    return ar;
  }, []);

By the way, when the searched rows are retrieved, it has already been found that the process cost of TextFinder is higher than that of the find values from an array, because getValues is used in a loop. (in this case, I think that the cost of this method will be lower.) So, in this sample, the searched row numbers are retrieved.

@mhawksey
Copy link
Author

I agree that the cost of getting sheet values should be factored in. I wonder if it is also worth exploring the the impact of Range.createTextFinder() and Sheet.createTextFinder() - Range is mentioned to be slow in https://stackoverflow.com/q/56517186/1027723

@tanaikech
Copy link
Owner

Thank you for replying. About the cost of Range for TextFinder, I had thought that the process might be different between the search from Sheet and the search from Range in the internal server. So in this case, I will use getSheetByName(name).createTextFinder(findText). Because when the benchmark is measured, both the input and output are required to be the same while the process can be freely selected. When our discussions are summarized, the method for measuring the process cost of TextFinder and the find value from array is as follows.

  • Input value: A sheet name and a search text.
  • Output value: Row numbers which have the searched text.
  • Script: Above scripts.
    • About TextFinder, a search text is searched from a sheet and retrieve the row numbers from returned ranges.
    • About the find value from array, the values are retrieved from the data range of a sheet as 2 dimensional array, and a search text is searched from the array and retrieve the row numbers from the array index.

When you have any questions and proposals, please tell me. I would like to start the measurement when we could confirm that our directions became the same.

@mhawksey
Copy link
Author

This makes sense - other people will be able to extend the experiment to cover other use cases if needed

@tanaikech
Copy link
Owner

Thank you for replying. I would like to start to measure the process cost. When I could obtain the result, I would like to publish it and tell here.

@tanaikech
Copy link
Owner

tanaikech commented Sep 30, 2020

I published "Benchmark: Process Costs for Searching Values in Spreadsheet using Google Apps Script" as a report. Could you please confirm it?
Gists: https://gist.github.com/tanaikech/0a6f03970b471ffa286f1dac0b79359e
MyBlob: https://tanaikech.github.io/2020/09/30/benchmark-process-costs-for-searching-values-in-spreadsheet-using-google-apps-script/

@mhawksey
Copy link
Author

Excellent work and interesting results. I wonder if as well as the code snippets you could share the entire code and example Google Sheet so that other people can replicate the experiment?

@tanaikech
Copy link
Owner

Thank you for replying. I added the sample script for measuring the process cost.

@mhawksey
Copy link
Author

mhawksey commented Oct 2, 2020

Thanks for putting this together. I've posted on Pulse and plan to reference this work in another post 👍

https://pulse.appsscript.info/p/2020/10/benchmark-process-costs-for-searching-values-with-textfinder-in-google-sheets-using-google-apps-script/

@tanaikech
Copy link
Owner

Thank you for replying and posting it. To search a value from Spreadsheet is included in a lot of situations for applications. And I think that when a value is searched from a Spreadsheet using Google Apps Script, TextFinder is easier to use than the query language. So I believe that the post will be useful for a lot of users. Thank you for proposing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants