
[BUG] Performance issue with large site. #193

Open
ekil1100 opened this issue Oct 12, 2022 · 8 comments
Labels
bug Something isn't working


@ekil1100

Describe the bug
There is a performance issue with a large site, for example https://spec.filecoin.io/. Enabling reading mode causes the site to hang and crash.

Reproduce on a website

  1. Visit this link: https://spec.filecoin.io/
  2. Enable reading mode.
  • The link is publicly accessible and requires no account to view it.

Expected behavior
Should render immediately.

Desktop (please complete the following information):

  • OS: macOS Monterey 12.6
  • Browser: Brave
  • Version: v1.44.108
  • Extension version: 1.0.9.1010
ekil1100 added the bug label Oct 12, 2022
@ansh
Owner

ansh commented Oct 12, 2022

This is a very interesting problem with no easy fix. There are a few things we have discussed inconclusively, so we are open to suggestions.

The things we have thought about:

  • Writing the logic that actually parses the DOM in WASM
  • Taking a virtualized approach where we figure out which parts of the DOM are currently in view and then only bionify those parts

Neither solution seems straightforward to implement. Both will be time-consuming.

Thoughts? @ekil1100 @asieduernest12

@ekil1100
Author


I like the WASM approach. It should bring better UX.

@asieduernest12
Collaborator

I don't know about the WASM idea, simply because we are not parsing the entire page in one go. We walk through the DOM node by node in a kind of depth-first approach to find text nodes, then activate ReadAssist (bionic reading) for the given text.
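
A minimal sketch of that walk, assuming a TreeWalker-based depth-first traversal; bionifyText here is an illustrative stand-in for the extension's actual word parser, not its real code:

```js
// Walk the DOM depth-first and hand every non-empty text node to the parser.
function bionifyPage(root = document.body) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode(node) {
      const tag = node.parentElement && node.parentElement.tagName;
      if (tag === 'SCRIPT' || tag === 'STYLE') return NodeFilter.FILTER_REJECT;
      return node.textContent.trim()
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
    },
  });

  let node;
  while ((node = walker.nextNode())) {
    bionifyText(node); // illustrative: wrap the leading part of each word in <b>
  }
}
```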

If we can make WASM part of the entire pipeline, from finding the nodes that have text to processing them, then it might work. I've not dealt with WASM, so I would have to look into it.

The other approach @ansh mentioned is possible with IntersectionObserver, which is the cousin of the MutationObserver we currently use to pick up new text nodes on dynamic websites.

But I have a sense it might not work well, because we would still have to find all the nodes and text nodes up front and attach the IntersectionObserver to each of them (rough sketch below).
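
A rough sketch of that idea, assuming each candidate element gets bionified only the first time it enters the viewport; the selector and the bionifyElement helper are illustrative assumptions, not the extension's code:

```js
// Bionify elements lazily, the first time they scroll into view.
const done = new WeakSet();
const io = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (entry.isIntersecting && !done.has(entry.target)) {
      done.add(entry.target);
      bionifyElement(entry.target); // illustrative: parse the text nodes inside this element
      io.unobserve(entry.target);   // processed once, no need to keep watching
    }
  }
});

// The up-front cost remains: every candidate still has to be found and observed.
document.querySelectorAll('p, li, h1, h2, h3, span').forEach((el) => io.observe(el));
```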

Honestly, it is probably not worth the effort, but I don't know for sure.
Alternatively, we can take inspiration from the gaming realm and implement a loading screen that informs users that processing is ongoing.

Finally, I checked out the problematic link and used document.querySelectorAll("* *").length to count the nodes; I got back about 700k.
It took about 15 seconds for me to see the text bionified after clicking.
That roughly works out to about 0.02 milliseconds per node.
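
For reference, that back-of-envelope measurement can be reproduced from the console like this (bionifyPage is the illustrative entry point from the sketch above; figures will vary by machine):

```js
const nodeCount = document.querySelectorAll('* *').length; // ~700k on spec.filecoin.io

const t0 = performance.now();
bionifyPage();
const ms = performance.now() - t0; // ~15000 ms observed

console.log((ms / nodeCount).toFixed(3), 'ms per node'); // 15000 / 700000 ≈ 0.02
```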

An alternative approach might be using web workers to implement a parallel processing mechanism, while also creating a map of parsed entries so that they are not recomputed but simply retrieved from the map if a word string has been processed before.

The map approach is probably the easiest thing we can try and should take just about 5 to 10 minutes to implement (sketch below).
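
A minimal sketch of that cache, assuming the per-word output is just a markup string; the real parser's rules differ, so treat the bolding rule here as a placeholder:

```js
// Memoize the parsed form of each word so repeats are looked up instead of recomputed.
const parsedCache = new Map();

function parseWord(word) {
  let parsed = parsedCache.get(word);
  if (parsed === undefined) {
    const boldLen = Math.ceil(word.length / 2); // placeholder rule: bold ~half the word
    parsed = `<b>${word.slice(0, boldLen)}</b>${word.slice(boldLen)}`;
    parsedCache.set(word, parsed);
  }
  return parsed;
}
```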

I wrote something about it on another project. I'll find it and link it here later if someone wants to give it a shot. Otherwise I could take a look at the map approach over the weekend.

The web worker approach looks promising, but I have to consult some documentation, since the last time I used workers was for a dynamic programming problem a while ago.

Oh boy, my thoughts are running on.
Using the map should work fine; the only problem would be space complexity.
I have a fear we might max out someone's memory with the map, but I might be overthinking it, given that there is a finite set of words in any text, and building a frequency table shows that the unique words will be relatively few, give or take a few thousand. If that's correct, then the map sure looks like the best approach yet (quick check below).
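
A quick console check of that assumption, comparing total words against unique words (the upper bound on the cache size):

```js
const words = document.body.innerText.toLowerCase().split(/\s+/).filter(Boolean);
console.log(words.length, 'words,', new Set(words).size, 'unique');
```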

The mutation observer approach on the other hand could be very expensive.

Back to @ansh's approach about watching the DOM: I don't know if there is a CSS selector that can distinguish nodes in the viewing area from those outside the clipped window, but if there is, then we could use something like a scroll observer to watch for scroll changes and process newly viewable nodes that have not been parsed already.

@ansh
Owner

ansh commented Oct 13, 2022

@asieduernest12 PartyTown (https://partytown.builder.io) provides a very simple API to run code on a different thread using web workers. It should be super fast to implement.
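
To be clear about what can move off the main thread, here is a plain Worker sketch rather than PartyTown's actual API: workers have no DOM access, so only the pure string work can be offloaded, and collectWords / applyParsedWords are hypothetical helpers for the DOM side.

```js
// The worker only decides how much of each word to bold; DOM reads/writes stay on the main thread.
const workerSrc = `
  self.onmessage = ({ data: words }) => {
    const parsed = words.map((w) => ({ word: w, boldLen: Math.ceil(w.length / 2) }));
    self.postMessage(parsed);
  };
`;
// Note: some pages' CSP can block blob: workers when run from a content script.
const worker = new Worker(URL.createObjectURL(new Blob([workerSrc], { type: 'text/javascript' })));

worker.onmessage = ({ data: parsed }) => applyParsedWords(parsed); // hypothetical: write results back to the DOM
worker.postMessage(collectWords()); // hypothetical: gather word strings from the collected text nodes
```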

As for my virtualization idea, I am not sure where to begin with implementing that. I would look into the React Native Web documentation since they have virtualized list components.

@asieduernest12
Collaborator

@ansh I totally forgot PartyTown is a thing. I've seen videos on it but have not used it yet.

One thing to remember is that any changes to the parsing logic affect the bookmarklet.
I don't have any commitment to keeping the bookmarklet updated. It's more that if I change the parser and break it, I feel a little obliged to fix the bookmarklet so it stays functional for new users who may want it, because the bookmarklet does not update itself or check for updates.

PartyTown sure could do the trick if we give it a try.

@asieduernest12
Collaborator

@ansh https://youtu.be/eP6Mti85HeQ
The conclusion in this video is not inspiring. They mention that it could actually hurt performance.

Technically, if we got 100k workers each running for 3 seconds, it would be fast, since they would all theoretically start at the same time and finish 3 seconds later.
That would turn our 15-second wait on @ekil1100's website into just 3 seconds.

I've not thought this through, though, just typing out loud. Lol

@asieduernest12
Collaborator

@ansh @ekil1100 I tried out the cache map approach I suggested, but it did not yield any positive result in simple benchmarking. Sorry, I did not save the figures either, but you can test using the code in this branch.

The results of using the cache were all over the place and sometimes worse than running the parser without any map/cache implementation.

The filecoin site has about 100k+ words and takes roughly 4,000 to 5,000 milliseconds to parse; then it takes about another 4 to 6 seconds for the browser to render the CSS updates.
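
For anyone rerunning that comparison, a rough way to separate the JS parse time from the browser's style/render catch-up; double requestAnimationFrame only approximates when the updated frame has been produced, and bionifyPage is the illustrative entry point from the earlier sketch:

```js
const t0 = performance.now();
bionifyPage();
const parseMs = performance.now() - t0;

requestAnimationFrame(() => requestAnimationFrame(() => {
  const renderMs = performance.now() - t0 - parseMs;
  console.log('parse:', parseMs.toFixed(0), 'ms, render settled ~', renderMs.toFixed(0), 'ms later');
}));
```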

@nerac

nerac commented May 5, 2023

One alternative solution, while not the same, could be a feature to select the text you want the effect applied to. Right now it is applied to the whole page, but most probably the user is fixating on a specific part of it. This would not solve the issue, but it would give an alternative way to handle the situation.
