Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

glromeo · 2021-12-06T15:17:30Z

This PR aims at solve #170 and solve #159.

I implemented the kind of binary search that @bcoe suggested in #159 and I tweaked the range check because the
I noticed that the = in

v8-to-istanbul/lib/source.js

Lines 78 to 80 in 53c1cd8

    
           const lines = this.lines.filter((line, i) => { 
        
             return startCol <= line.endCol && endCol >= line.startCol 
        
           })

seems to be what was causing #170 because tends to overflow on unmapped lines, the range is considered invalid and as a result there are false positives in coverage.

I am keen to investigate and fix also #81 as part of this

glromeo · 2021-12-12T03:29:32Z

#81 was caused mostly by:

v8-to-istanbul/lib/v8-to-istanbul.js

Lines 179 to 181 in 53c1cd8

    
           if (startCol <= line.startCol && endCol >= line.endCol && !line.ignore) { 
        
             line.count = range.count 
        
           }

because the v8 startOffset was 50 while the line startCol was 48 that's why I introduced minCol and maxCol which take into account the gaps caused by whitespaces

The part I am not 100% sure of is the handling of lines that have only whitespaces (like 3 in the issue example had only 2 spaces). I added a check that minCol < maxCol to exclude them from the recording.
From the fixes I did in the snapshots I am reassured I improved the situation but I wonder what to think about one of such lines in the middle of an uncover block: should that be uncovered as well or is it ok that results as covered?

.gitattributes

.editorconfig

lib/source.js

…istanbul#172)

glromeo · 2021-12-16T17:53:23Z

I need to come up with a better solution to extend the ranges covering blank lines...

glromeo · 2021-12-18T00:45:48Z

tap-snapshots/test/v8-to-istanbul.js.test.cjs

@@ -2871,7 +3154,7 @@ Object {
    "37": 0,
    "38": 1,
    "39": 1,
-    "4": 1,
+    "4": 2,


This snapshot has been fixed because the original coverage was wrong. "4" (which corresponds to line 5) same as "5" (which is the first line in the function body) have count 2 because they are inside the range in multi-source-map.js

Awesome work, I think we might just want to call this a breaking change when we land, but then take it without bumping the major of c8, or other deps, this will allow each dependent library to update their tests.

glromeo · 2021-12-18T01:00:28Z

tap-snapshots/test/v8-to-istanbul.js.test.cjs

@@ -279,8 +279,8 @@ Object {
      "line": 35,
      "loc": Object {
        "end": Object {
-          "column": 0,
-          "line": 37,
+          "column": 29,


The start of this branch is left unchanged, the end is slightly tighter.

I think that's a side effect of trimming the v8 range. Overall the range still seems OK to me
...maybe even better not being greedy?

glromeo · 2021-12-18T01:18:03Z

@SimenB @bcoe sorry for the noise...this was a hard nut to crack.

I reopened the PR now that I found a solution to the range mismatch problem that I think is effective.
In a nutshell: I noticed that v8 ranges tended not to match with line startCol and endCol and because of
whitespaces. Most of the times lines were excluded from the range beacause of:

v8-to-istanbul/lib/v8-to-istanbul.js

Lines 179 to 181 in 53c1cd8

    
           if (startCol <= line.startCol && endCol >= line.endCol && !line.ignore) { 
        
             line.count = range.count 
        
           }

so I ended up resorting to compare the ranges after having trimmed the white spaces which guarantees more reliable matches.
I checked out C8 and linked it to my branch to see if something was getting broken in there and to my surprise I found I just fix some bugs in the coverage there.
5 cases that are very similar:

take this one for example:

the expected range 27-28 was actually wrong because it was leaving out the method name.

At this point I would like you to review my approach and if I have your blessing I intend to look into how to optimise it.

bcoe · 2021-12-21T17:06:42Z

@glromeo thank you for this contribution 👏 it might take me a little while to review, just because this library is complex and I need to page it into memory, in the hopes of not breaking anyone.

bcoe · 2021-12-21T17:07:45Z

@glromeo I think the best way to ultimatley test, would be to link against c8 and run against some real world codebases.

glromeo · 2021-12-22T00:27:07Z

Here's an example new vs old using node-fetch:

glromeo · 2021-12-22T00:36:06Z

another example from node-fetch

glromeo · 2021-12-22T00:54:00Z

glromeo · 2021-12-22T14:08:05Z

@bcoe I fixed the shebang snapshot and now all tests are ok (that slipped before because I was using node 16.x)
the original snapshot had 2 branches which doesn't make sense, the fixed has only one
to double check that I ran that file through nyc

and I see nyc doesn't count the 1st pass as a branch hence it says 0...but definitely not 2 branches!

bcoe

Realized I left a couple comments but didn't leave the review.

Thank you for this contribution, it's looking great. I'll need to do some thorough testing with c8 in a few codebases, just to make sure things are looking good.

Have you managed to do any real world testing with this branch?

bcoe · 2021-12-21T17:09:59Z

lib/range.js

+  let upper = lines.length - 1
+
+  let s
+  while (lower <= upper) {


I appreciate that you pulled this into a helper file.

thank you, I actually spent some time asking myself: is it better to do 2 binary searches or not? At the end I preferred to opt for only 1 binary search and then linear scan because it's unlikely that in a large file a range would be too big anyway so I guess that the linear search should perform better

bcoe · 2021-12-21T17:11:08Z

tap-snapshots/test/v8-to-istanbul.js.test.cjs

@@ -2871,7 +3154,7 @@ Object {
    "37": 0,
    "38": 1,
    "39": 1,
-    "4": 1,
+    "4": 2,


Awesome work, I think we might just want to call this a breaking change when we land, but then take it without bumping the major of c8, or other deps, this will allow each dependent library to update their tests.

bcoe · 2021-12-30T03:55:24Z

@glromeo let's release a candidate release of this work which you can pull into cypress-c8, and I can pull into c8. I'm very excited to have had someone dig in and do this cleanup -- I just want to make sure we test thoroughly, as this codebase has been a great source of off by one errors, out of bounds errors, etc., in the past 😆

glromeo · 2021-12-30T23:13:04Z

thank you @bcoe

Have you managed to do any real world testing with this branch?

the side by side comparisons above are taken from the html coverage of node-fetch linking locally my branch of v8-to-istanbul as you suggested.
It's only one project but big enough and it seems to cover the changes.
I am happy to test it with other real projects if you can point me to them... it's a bit tricky to find ones that use c8 and bring some added value.

bcoe · 2021-12-31T04:38:42Z

@glromeo I usually test with the projects below for a major update:

a couple of my TypeScript repos from work: https://github.com/googleapis/nodejs-ai-platform, https://github.com/googleapis/nodejs-compute, https://github.com/googleapis/nodejs-storage.
my pet TypeScript project, release-please: https://github.com/googleapis/release-please
c8 itself.
yargs.

If I'm feeling really ambitious, I'll test on the Node codebase, which uses c8 for coverage.

I have Monday off, so can potentially do someof this testing, if you don't beat me to the punch. I'll also prioritize giving a thorough review of this PR.

bcoe · 2021-12-31T17:22:42Z

@glromeo I did some initial testing in c8, and it seems like this current implementation is a bit too permissive, running your branch against:

https://github.com/bcoe/c8/blob/main/test/fixtures/all/ts-only/main.ts

Indicates that only lines 4 - 5 are uncovered, whereas the old implementation which flags 16-18 as also not being covered is correct (the line return 'wat?' does not get executed in the test case).

bcoe

I think ti might be worth splitting this into two PRs, one that introduces the optimized line gathering -- which we can then benchmark against the old approach (I think this will be a significant benefit for large codebases).

And then I think we might want to dig into the false postives separately. Source Maps are a pain in the neck because they themselves are a best guess of a remapping, so tiny changes to things like line endings can break them -- tldr, we'll need to be careful.

bcoe · 2021-12-31T17:31:16Z

lib/line.js

@@ -4,12 +4,14 @@ module.exports = class CovLine {
    // note that startCol and endCol are absolute positions
    // within a file, not relative to the line.
    this.startCol = startCol
+    this.minCol = startCol + lineStr.length - lineStr.trimStart().length


Trimming lines may actually be the cause of the remapping issues I saw in c8, since we specifically see breakage with tests that exercise SourceMap files.

The source map lookup logic is quite finicky, and trimming before performing it may need to be avoided.

@bcoe the investigation I did suggests me that the tests are broken but that the accuracy of the coverage improves. As I posted here and in the node-fetch comparisons the existing logic seems to often miss the first line of a range. I look forward to your feedback on those examples

glromeo · 2021-12-31T17:58:37Z

ok! I'll split the PR first and then I'll look into the breakages... thank you for pointing them out!

glromeo · 2022-01-01T01:22:50Z

@bcoe

Indicates that only lines 4 - 5 are uncovered, whereas the old implementation which flags 16-18 as also not being covered is correct (the line return 'wat?' does not get executed in the test case).

I made sure that the coverage folder was deleted and I re-run the tests in c8 linked to my branch and I got

is in the html reports that you saw the errors?

$ node --version
v16.13.1

glromeo · 2022-01-01T02:39:12Z

@bcoe please note that this PR now follows #173

…into master

glromeo · 2022-01-03T22:17:06Z

Given the findings in #173 I better close this PR at this point and open a new one later on

glromeo changed the title ~~Fix for generated line range overflowing~~ Fix for generated line range overflowing (issue #170) Dec 6, 2021

glromeo changed the title ~~Fix for generated line range overflowing (issue #170)~~ Optimized offsetToOriginalRelative range calculation Dec 8, 2021

SimenB requested a review from bcoe December 9, 2021 10:27

glromeo commented Dec 12, 2021

View reviewed changes

.gitattributes Show resolved Hide resolved

glromeo commented Dec 12, 2021

View reviewed changes

.editorconfig Show resolved Hide resolved

glromeo commented Dec 12, 2021

View reviewed changes

lib/source.js Show resolved Hide resolved

glromeo added a commit to glromeo/cypress-c8 that referenced this pull request Dec 12, 2021

removed hack to force correct coverage (waiting for istanbuljs/v8-to-…

e977d7a

…istanbul#172)

glromeo marked this pull request as draft December 16, 2021 17:51

glromeo marked this pull request as ready for review December 18, 2021 00:19

glromeo commented Dec 18, 2021

View reviewed changes

glromeo changed the title ~~Optimized offsetToOriginalRelative range calculation~~ Optimized lines gathering and fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces Dec 18, 2021

bcoe reviewed Dec 30, 2021

View reviewed changes

bcoe requested changes Dec 31, 2021

View reviewed changes

fix for issue #159

093374b

fixed shebang fixture incorrectly having 2 branches

a3c3926

glromeo mentioned this pull request Jan 1, 2022

Optimized ranges (fix for issue #150) #173

Merged

fix for issue #81 and issue #170

3c9cae4

glromeo added 3 commits January 1, 2022 02:48

removed trimRange (oops!)

825e320

Merge branch 'optimized-ranges' of github.com:glromeo/v8-to-istanbul …

75f3385

…into master

restored trimRange

870382a

glromeo changed the title ~~Optimized lines gathering and fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces~~ Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces Jan 1, 2022

glromeo closed this Jan 3, 2022

This was referenced Jan 13, 2022

whitespace trimming approach to fix #81 #179

Closed

Fine tune slice range to fix #170 #180

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

glromeo commented Dec 6, 2021 •

edited

glromeo commented Dec 12, 2021

glromeo commented Dec 16, 2021

glromeo Dec 18, 2021 •

edited

bcoe Dec 21, 2021

glromeo Dec 18, 2021

glromeo commented Dec 18, 2021

bcoe commented Dec 21, 2021

bcoe commented Dec 21, 2021

glromeo commented Dec 22, 2021

glromeo commented Dec 22, 2021

glromeo commented Dec 22, 2021

glromeo commented Dec 22, 2021 •

edited

bcoe left a comment

bcoe Dec 21, 2021

glromeo Jan 1, 2022

bcoe Dec 21, 2021

bcoe commented Dec 30, 2021

glromeo commented Dec 30, 2021 •

edited

bcoe commented Dec 31, 2021

bcoe commented Dec 31, 2021

bcoe left a comment

bcoe Dec 31, 2021

glromeo Jan 1, 2022

glromeo commented Dec 31, 2021

glromeo commented Jan 1, 2022 •

edited

glromeo commented Jan 1, 2022

glromeo commented Jan 3, 2022

	const lines = this.lines.filter((line, i) => {
	return startCol <= line.endCol && endCol >= line.startCol
	})

Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

Conversation

glromeo commented Dec 6, 2021 • edited

glromeo commented Dec 12, 2021

glromeo commented Dec 16, 2021

glromeo Dec 18, 2021 • edited

Choose a reason for hiding this comment

bcoe Dec 21, 2021

Choose a reason for hiding this comment

glromeo Dec 18, 2021

Choose a reason for hiding this comment

glromeo commented Dec 18, 2021

bcoe commented Dec 21, 2021

bcoe commented Dec 21, 2021

glromeo commented Dec 22, 2021

glromeo commented Dec 22, 2021

glromeo commented Dec 22, 2021

glromeo commented Dec 22, 2021 • edited

bcoe left a comment

Choose a reason for hiding this comment

bcoe Dec 21, 2021

Choose a reason for hiding this comment

glromeo Jan 1, 2022

Choose a reason for hiding this comment

bcoe Dec 21, 2021

Choose a reason for hiding this comment

bcoe commented Dec 30, 2021

glromeo commented Dec 30, 2021 • edited

bcoe commented Dec 31, 2021

bcoe commented Dec 31, 2021

bcoe left a comment

Choose a reason for hiding this comment

bcoe Dec 31, 2021

Choose a reason for hiding this comment

glromeo Jan 1, 2022

Choose a reason for hiding this comment

glromeo commented Dec 31, 2021

glromeo commented Jan 1, 2022 • edited

glromeo commented Jan 1, 2022

glromeo commented Jan 3, 2022

glromeo commented Dec 6, 2021 •

edited

glromeo Dec 18, 2021 •

edited

glromeo commented Dec 22, 2021 •

edited

glromeo commented Dec 30, 2021 •

edited

glromeo commented Jan 1, 2022 •

edited