
Optimization: cache semantic lookup in tokenizer, add semantic tokens benchmarks #4069

Draft
wants to merge 16 commits into base: master
Conversation

soulomoon
Collaborator

@soulomoon soulomoon commented Feb 12, 2024

  1. There is a high chance that an identifier appears multiple times in a file; caching the identifier's semantic type result in the tokenizer saves repeated computation.
  2. A new test case is added to ensure invisible tokens do not sabotage the cache lookup.
  3. Add a benchmark for semantic tokens.
  4. Fix a benchmark running issue.

This yields roughly a 5% performance improvement in the GetSemanticTokens rule.
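The idea in point 1 can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the type names and the stubbed `computeSemanticType` are hypothetical stand-ins for HLS's real semantic-token machinery.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the plugin's real types.
type Identifier = String
data SemanticTokenType = TVariable | TFunction | TTypeCon
  deriving (Eq, Show)

-- The (potentially expensive) per-identifier lookup we want to
-- avoid repeating; stubbed out here.
computeSemanticType :: Identifier -> SemanticTokenType
computeSemanticType _ = TVariable

-- Walk the identifiers once, threading a cache so each distinct
-- identifier's semantic type is computed at most once and reused
-- on every later occurrence in the file.
tokenizeFile :: [Identifier] -> [SemanticTokenType]
tokenizeFile = go Map.empty
  where
    go _ [] = []
    go cache (i : is) =
      case Map.lookup i cache of
        Just ty -> ty : go cache is
        Nothing ->
          let ty = computeSemanticType i
          in ty : go (Map.insert i ty cache) is
```

Since the cache is keyed purely by identifier, a token that should be invisible must not be inserted under the same key as a visible one — which is what the new test case in point 2 guards against.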

@soulomoon soulomoon marked this pull request as ready for review February 13, 2024 05:50
@soulomoon soulomoon changed the title Optimization: cache semantic lookup Optimization: cache semantic lookup in tokenizer Feb 13, 2024
@wz1000
Collaborator

wz1000 commented Feb 15, 2024

could we have some benchmarks on large files to see if this actually makes a difference?

@soulomoon
Collaborator Author

With

  - name: cabal
    package: Cabal
    version: 3.6.3.0
    modules:
        - src/Distribution/Simple/Configure.hs

cabal bench -j --benchmark-options="profiled-cabal"

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| upstream | All | semanticTokens | True | 50 | 2.09 | 0.00 | 9.37 | 0.03 | 3.10 | 0.13 | 9.40 | 5147 | 5146 | 5147 | 5908 | 31251 | 1 | 174MB | 13491MB |
| cache-semantic-lookup | All | semanticTokens | True | 50 | 1.87 | 0.00 | 8.36 | 0.10 | 2.77 | 0.11 | 8.46 | 5147 | 5146 | 5147 | 5908 | 31251 | 1 | 173MB | 13437MB |

@wz1000
Collaborator

wz1000 commented Feb 15, 2024

Looks like a decent improvement. Could you add this benchmark to the ghcide-bench experiments please?

@wz1000
Collaborator

wz1000 commented Feb 15, 2024

the ghcRebuilds number seems to indicate that we aren't editing the file in the benchmark passes. Could you ensure that each benchmark iteration also edits the file?

@soulomoon
Collaborator Author

> the ghcRebuilds number seems to indicate that we aren't editing the file in the benchmark passes. Could you ensure that each benchmark iteration also edits the file?

Sure, I will make some changes

@soulomoon
Collaborator Author

soulomoon commented Feb 15, 2024

I need to take some more time to come up with a better benchmark for this. Converting it to a draft for now.

@soulomoon soulomoon marked this pull request as draft February 15, 2024 17:51
@soulomoon
Collaborator Author

soulomoon commented Feb 16, 2024

The overall improvement is not very pronounced: slightly better memory usage and speed.

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| upstream | All | semanticTokens | True | 100 | 2.00 | 0.00 | 128.78 | 1.27 | 3.41 | 0.63 | 130.06 | 12 | 10 | 1364 | 5914 | 31271 | 102 | 296MB | 568821MB |
| cache-semantic-lookup | All | semanticTokens | True | 100 | 1.92 | 0.00 | 126.47 | 2.11 | 3.43 | 0.62 | 128.60 | 12 | 10 | 1364 | 5914 | 31271 | 102 | 290MB | 562533MB |

The GetSemanticTokens rule stably gives better results.
[screenshot: trace comparison]

Another result:
[screenshot: trace comparison]

@soulomoon
Collaborator Author

soulomoon commented Feb 16, 2024

After updating to master, it does seem to show a decent improvement.

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| upstream | All | semanticTokens | True | 100 | 1.99 | 0.00 | 148.55 | 2.80 | 3.75 | 0.73 | 151.36 | 12 | 10 | 1364 | 5914 | 31271 | 102 | 307MB | 594060MB |
| cache-semantic-lookup | All | semanticTokens | True | 100 | 1.93 | 0.00 | 134.02 | 1.15 | 6.84 | 0.64 | 135.18 | 12 | 10 | 1364 | 5914 | 31271 | 102 | 318MB | 562002MB |

@soulomoon soulomoon marked this pull request as ready for review February 16, 2024 14:37
@soulomoon
Collaborator Author

soulomoon commented Feb 18, 2024

Another run: the overall result is similar, but a detailed comparison of the traces does show some decent improvement.

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| upstream | All | semanticTokens | True | 50 | 1.83 | 0.00 | 64.00 | 1.18 | 3.50 | 0.62 | 65.18 | 12 | 10 | 1364 | 5914 | 31271 | 52 | 298MB | 287351MB |
| HEAD | All | semanticTokens | True | 50 | 1.91 | 0.00 | 64.61 | 0.70 | 3.60 | 0.62 | 65.32 | 12 | 10 | 1364 | 5914 | 31271 | 52 | 299MB | 285555MB |
[screenshots: trace comparison, 2024-02-18 12:50]

@soulomoon
Collaborator Author

soulomoon commented Feb 18, 2024

cc @wz1000, I have produced some more accurate results; they show about a 5% improvement in the GetSemanticTokens rule.
It should be more if we consider the tokenizer alone.

```diff
@@ -333,7 +333,7 @@ benchRules build MkBenchRules{..} = do
            ++ concat
               [[ "-h"
                , "-i" <> show i
-               , "-po" <> outHp
+               , "-po" <> dropExtension outHp
```
Collaborator Author

@soulomoon soulomoon Feb 18, 2024


On macOS, the original code produced a duplicated suffix (*.hp.hp), making the bench test fail.
dropExtension fixes it.
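To illustrate the fix: the RTS appends ".hp" to the `-po` prefix itself, so passing the full `*.hp` path yields `*.hp.hp`. A small sketch (the `heapProfileArgs` helper is hypothetical, for illustration only; `dropExtension` is from the `filepath` package used in the diff above):

```haskell
import System.FilePath (dropExtension)

-- The RTS appends ".hp" to whatever prefix -po is given, so we
-- must strip the ".hp" extension from the intended output path
-- before passing it, or the RTS writes to "bench.hp.hp".
heapProfileArgs :: FilePath -> [String]
heapProfileArgs outHp = ["-h", "-po" <> dropExtension outHp]
```

For example, `dropExtension "bench.hp"` is `"bench"`, so the RTS then produces `bench.hp` as intended.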

dependabot bot and others added 3 commits February 22, 2024 21:27
Bumps [pre-commit/action](https://github.com/pre-commit/action) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/pre-commit/action/releases)
- [Commits](pre-commit/action@v3.0.0...v3.0.1)

---
updated-dependencies:
- dependency-name: pre-commit/action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael Peyton Jones <me@michaelpj.com>
@soulomoon soulomoon changed the title Optimization: cache semantic lookup in tokenizer Optimization: cache semantic lookup in tokenizer, add semantic tokens benchmarks Feb 22, 2024
@michaelpj
Collaborator

> I have produced some more accurate result, seems about 5% improvement to GetSemanticTokens Rule.
> Should be more if we consider the tokenizer alone.

FWIW I'm unsure if that's a big enough difference to make this worth it. It might be better just to keep the simpler code (we should definitely add the benchmark, though). Is the improvement bigger on bigger files?

@soulomoon
Collaborator Author

soulomoon commented Feb 28, 2024

> I have produced some more accurate result, seems about 5% improvement to GetSemanticTokens Rule.
>
> Should be more if we consider the tokenizer alone.

> FWIW I'm unsure if that's a big enough difference to make this worth it. It might be better just to keep the simpler code (we should definitely add the benchmark, though). Is the improvement bigger on bigger files?

The benchmark runs on files with 2.5k and 1k lines of code from the Cabal package.

Agreed, not much difference, since the current computation is relatively simple; perhaps I should turn it into a draft and revisit this if/when more features are introduced that make the computation heavier. (BTW, the semantic tokens benchmark and the benchmark fix are already included in the hls-graph patch.)

@soulomoon soulomoon marked this pull request as draft February 28, 2024 13:44