Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces #172

Closed
wants to merge 6 commits into from

Conversation

glromeo
Copy link
Contributor

@glromeo glromeo commented Dec 6, 2021

This PR aims at solve #170 and solve #159.

I implemented the kind of binary search that @bcoe suggested in #159 and I tweaked the range check because the
I noticed that the = in

const lines = this.lines.filter((line, i) => {
return startCol <= line.endCol && endCol >= line.startCol
})

seems to be what was causing #170 because tends to overflow on unmapped lines, the range is considered invalid and as a result there are false positives in coverage.

I am keen to investigate and fix also #81 as part of this

@glromeo glromeo changed the title Fix for generated line range overflowing Fix for generated line range overflowing (issue #170) Dec 6, 2021
@glromeo glromeo changed the title Fix for generated line range overflowing (issue #170) Optimized offsetToOriginalRelative range calculation Dec 8, 2021
@SimenB SimenB requested a review from bcoe December 9, 2021 10:27
@glromeo
Copy link
Contributor Author

glromeo commented Dec 12, 2021

#81 was caused mostly by:

if (startCol <= line.startCol && endCol >= line.endCol && !line.ignore) {
line.count = range.count
}

because the v8 startOffset was 50 while the line startCol was 48 that's why I introduced minCol and maxCol which take into account the gaps caused by whitespaces

The part I am not 100% sure of is the handling of lines that have only whitespaces (like 3 in the issue example had only 2 spaces). I added a check that minCol < maxCol to exclude them from the recording.
From the fixes I did in the snapshots I am reassured I improved the situation but I wonder what to think about one of such lines in the middle of an uncover block: should that be uncovered as well or is it ok that results as covered?

.gitattributes Show resolved Hide resolved
.editorconfig Show resolved Hide resolved
lib/source.js Show resolved Hide resolved
glromeo added a commit to glromeo/cypress-c8 that referenced this pull request Dec 12, 2021
@glromeo glromeo marked this pull request as draft December 16, 2021 17:51
@glromeo
Copy link
Contributor Author

glromeo commented Dec 16, 2021

I need to come up with a better solution to extend the ranges covering blank lines...

@glromeo glromeo marked this pull request as ready for review December 18, 2021 00:19
@@ -2871,7 +3154,7 @@ Object {
"37": 0,
"38": 1,
"39": 1,
"4": 1,
"4": 2,
Copy link
Contributor Author

@glromeo glromeo Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snapshot has been fixed because the original coverage was wrong. "4" (which corresponds to line 5) same as "5" (which is the first line in the function body) have count 2 because they are inside the range in multi-source-map.js
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, I think we might just want to call this a breaking change when we land, but then take it without bumping the major of c8, or other deps, this will allow each dependent library to update their tests.

@@ -279,8 +279,8 @@ Object {
"line": 35,
"loc": Object {
"end": Object {
"column": 0,
"line": 37,
"column": 29,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start of this branch is left unchanged, the end is slightly tighter.
image
I think that's a side effect of trimming the v8 range. Overall the range still seems OK to me
...maybe even better not being greedy?

@glromeo
Copy link
Contributor Author

glromeo commented Dec 18, 2021

@SimenB @bcoe sorry for the noise...this was a hard nut to crack.

I reopened the PR now that I found a solution to the range mismatch problem that I think is effective.
In a nutshell: I noticed that v8 ranges tended not to match with line startCol and endCol and because of
whitespaces. Most of the times lines were excluded from the range beacause of:

if (startCol <= line.startCol && endCol >= line.endCol && !line.ignore) {
line.count = range.count
}

so I ended up resorting to compare the ranges after having trimmed the white spaces which guarantees more reliable matches.
I checked out C8 and linked it to my branch to see if something was getting broken in there and to my surprise I found I just fix some bugs in the coverage there.
5 cases that are very similar:
image
take this one for example:
image
the expected range 27-28 was actually wrong because it was leaving out the method name.

At this point I would like you to review my approach and if I have your blessing I intend to look into how to optimise it.

@glromeo glromeo changed the title Optimized offsetToOriginalRelative range calculation Optimized lines gathering and fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces Dec 18, 2021
@bcoe
Copy link
Member

bcoe commented Dec 21, 2021

@glromeo thank you for this contribution 👏 it might take me a little while to review, just because this library is complex and I need to page it into memory, in the hopes of not breaking anyone.

@bcoe
Copy link
Member

bcoe commented Dec 21, 2021

@glromeo I think the best way to ultimatley test, would be to link against c8 and run against some real world codebases.

@glromeo
Copy link
Contributor Author

glromeo commented Dec 22, 2021

Here's an example new vs old using node-fetch:
Screenshot 2021-12-22 at 00 18 13

@glromeo
Copy link
Contributor Author

glromeo commented Dec 22, 2021

another example from node-fetch
image

@glromeo
Copy link
Contributor Author

glromeo commented Dec 22, 2021

Screenshot 2021-12-22 at 00 40 06

@glromeo
Copy link
Contributor Author

glromeo commented Dec 22, 2021

@bcoe I fixed the shebang snapshot and now all tests are ok (that slipped before because I was using node 16.x)
the original snapshot had 2 branches which doesn't make sense, the fixed has only one
to double check that I ran that file through nyc
image
and I see nyc doesn't count the 1st pass as a branch hence it says 0...but definitely not 2 branches!

Copy link
Member

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized I left a couple comments but didn't leave the review.

Thank you for this contribution, it's looking great. I'll need to do some thorough testing with c8 in a few codebases, just to make sure things are looking good.

Have you managed to do any real world testing with this branch?

let upper = lines.length - 1

let s
while (lower <= upper) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that you pulled this into a helper file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I actually spent some time asking myself: is it better to do 2 binary searches or not? At the end I preferred to opt for only 1 binary search and then linear scan because it's unlikely that in a large file a range would be too big anyway so I guess that the linear search should perform better

@@ -2871,7 +3154,7 @@ Object {
"37": 0,
"38": 1,
"39": 1,
"4": 1,
"4": 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, I think we might just want to call this a breaking change when we land, but then take it without bumping the major of c8, or other deps, this will allow each dependent library to update their tests.

@bcoe
Copy link
Member

bcoe commented Dec 30, 2021

@glromeo let's release a candidate release of this work which you can pull into cypress-c8, and I can pull into c8. I'm very excited to have had someone dig in and do this cleanup -- I just want to make sure we test thoroughly, as this codebase has been a great source of off by one errors, out of bounds errors, etc., in the past 😆

@glromeo
Copy link
Contributor Author

glromeo commented Dec 30, 2021

thank you @bcoe

Have you managed to do any real world testing with this branch?

the side by side comparisons above are taken from the html coverage of node-fetch linking locally my branch of v8-to-istanbul as you suggested.
It's only one project but big enough and it seems to cover the changes.
I am happy to test it with other real projects if you can point me to them... it's a bit tricky to find ones that use c8 and bring some added value.

@bcoe
Copy link
Member

bcoe commented Dec 31, 2021

@glromeo I usually test with the projects below for a major update:

If I'm feeling really ambitious, I'll test on the Node codebase, which uses c8 for coverage.


I have Monday off, so can potentially do someof this testing, if you don't beat me to the punch. I'll also prioritize giving a thorough review of this PR.

@bcoe
Copy link
Member

bcoe commented Dec 31, 2021

@glromeo I did some initial testing in c8, and it seems like this current implementation is a bit too permissive, running your branch against:

https://github.com/bcoe/c8/blob/main/test/fixtures/all/ts-only/main.ts

Indicates that only lines 4 - 5 are uncovered, whereas the old implementation which flags 16-18 as also not being covered is correct (the line return 'wat?' does not get executed in the test case).

Copy link
Member

@bcoe bcoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ti might be worth splitting this into two PRs, one that introduces the optimized line gathering -- which we can then benchmark against the old approach (I think this will be a significant benefit for large codebases).

And then I think we might want to dig into the false postives separately. Source Maps are a pain in the neck because they themselves are a best guess of a remapping, so tiny changes to things like line endings can break them -- tldr, we'll need to be careful.

@@ -4,12 +4,14 @@ module.exports = class CovLine {
// note that startCol and endCol are absolute positions
// within a file, not relative to the line.
this.startCol = startCol
this.minCol = startCol + lineStr.length - lineStr.trimStart().length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trimming lines may actually be the cause of the remapping issues I saw in c8, since we specifically see breakage with tests that exercise SourceMap files.

The source map lookup logic is quite finicky, and trimming before performing it may need to be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcoe the investigation I did suggests me that the tests are broken but that the accuracy of the coverage improves. As I posted here and in the node-fetch comparisons the existing logic seems to often miss the first line of a range. I look forward to your feedback on those examples

@glromeo
Copy link
Contributor Author

glromeo commented Dec 31, 2021

ok! I'll split the PR first and then I'll look into the breakages... thank you for pointing them out!

@glromeo
Copy link
Contributor Author

glromeo commented Jan 1, 2022

@bcoe

Indicates that only lines 4 - 5 are uncovered, whereas the old implementation which flags 16-18 as also not being covered is correct (the line return 'wat?' does not get executed in the test case).

I made sure that the coverage folder was deleted and I re-run the tests in c8 linked to my branch and I got
image

is in the html reports that you saw the errors?

$ node --version
v16.13.1

@glromeo
Copy link
Contributor Author

glromeo commented Jan 1, 2022

@bcoe please note that this PR now follows #173

@glromeo glromeo changed the title Optimized lines gathering and fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces Fixed startCol/endCol vs startOffset/endOffset mismatch due to greediness and whitespaces Jan 1, 2022
@glromeo
Copy link
Contributor Author

glromeo commented Jan 3, 2022

Given the findings in #173 I better close this PR at this point and open a new one later on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants