New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix vendor scanning for windows #684
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #684 +/- ##
==========================================
+ Coverage 79.08% 80.05% +0.96%
==========================================
Files 86 87 +1
Lines 6121 6127 +6
==========================================
+ Hits 4841 4905 +64
+ Misses 1075 1004 -71
- Partials 205 218 +13 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Rex!
return nil | ||
} | ||
|
||
windowsEnding := []byte("\r\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the bits here are the only functional change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case personally I'd prefer the refactor in a dedicated PR since it's such a huge change 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, missed the message, other than adding the tests and the 2 lines for replacing line endings, no code changes has been made, just copied and pasted into a separate file to make it easier to read.
return err | ||
} | ||
|
||
buf = bytes.ReplaceAll(buf, windowsEnding, unixEnding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, are there going to be upstream repos with windows endings as the source of truth?
I wonder if we need to be doing this normalization on our indexing side also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought about this, it could be the case, though I believe the default is LF for git, so this will be pretty rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can add some logging into the indexer to see if this is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I feel like that would be safest so you don't have to think about non-line endings being replaced - I'm pretty sure (especially with what we're discussed in the past @another-rex) that shouldn't be the case because it should be the same underlying bytes (and I expect that's what Git sees too?), but at the end of the day I personally would sleep better knowing that there was a straightforward path to normalization...
I don't know too much about this area of things, but for indexing could you actually index both with and without the replacement (and even do that regardless of if a repo is using \n
or \r\n
)? that way the downstream wouldn't ever have to actually care right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of? It will double the storage in datastore and increase the indexer runtime, for I don't think much benefit.
Are you referring to e.g. UTF-8 strings potentially having the bytes 13 10
appearing later on in a e.g. 3 or 4 byte wide character/rune? (I'm not sure if these character exist, but it is a possibility) This will be pretty rare since C or C++ compilers don't commonly support unicode characters in the source code afaik.
Either way I think that can be avoided by adding the same normalization on the indexer side.
Fixes #657
Did some minor refactoring to move vendor scanning code into separate file. Also added a test for vendor scanning.
This does not add a unit test for crlf change, as git might change line endings of the fixture files upon checkout. (maybe add a .gitattribute file in the future to lock this down to not change line endings.).
Manually tested it to confirm the change works.
The only "code" (non test, non refactor) change is line 105 in
pkg/osvscanner/vendored_libs.go