Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix vendor scanning for windows #684

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

another-rex
Copy link
Collaborator

Fixes #657

Did some minor refactoring to move vendor scanning code into separate file. Also added a test for vendor scanning.

This does not add a unit test for crlf change, as git might change line endings of the fixture files upon checkout. (maybe add a .gitattribute file in the future to lock this down to not change line endings.).

Manually tested it to confirm the change works.

The only "code" (non test, non refactor) change is line 105 in pkg/osvscanner/vendored_libs.go

@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2023

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (c75d056) 79.08% compared to head (66b76d0) 80.05%.
Report is 2 commits behind head on main.

Files Patch % Lines
pkg/osvscanner/vendored_libs.go 61.84% 21 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #684      +/-   ##
==========================================
+ Coverage   79.08%   80.05%   +0.96%     
==========================================
  Files          86       87       +1     
  Lines        6121     6127       +6     
==========================================
+ Hits         4841     4905      +64     
+ Misses       1075     1004      -71     
- Partials      205      218      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rex!

return nil
}

windowsEnding := []byte("\r\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the bits here are the only functional change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case personally I'd prefer the refactor in a dedicated PR since it's such a huge change 😅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, missed the message, other than adding the tests and the 2 lines for replacing line endings, no code changes has been made, just copied and pasted into a separate file to make it easier to read.

return err
}

buf = bytes.ReplaceAll(buf, windowsEnding, unixEnding)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are there going to be upstream repos with windows endings as the source of truth?

I wonder if we need to be doing this normalization on our indexing side also?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about this, it could be the case, though I believe the default is LF for git, so this will be pretty rare.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add some logging into the indexer to see if this is the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I feel like that would be safest so you don't have to think about non-line endings being replaced - I'm pretty sure (especially with what we're discussed in the past @another-rex) that shouldn't be the case because it should be the same underlying bytes (and I expect that's what Git sees too?), but at the end of the day I personally would sleep better knowing that there was a straightforward path to normalization...

I don't know too much about this area of things, but for indexing could you actually index both with and without the replacement (and even do that regardless of if a repo is using \n or \r\n)? that way the downstream wouldn't ever have to actually care right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of? It will double the storage in datastore and increase the indexer runtime, for I don't think much benefit.

Are you referring to e.g. UTF-8 strings potentially having the bytes 13 10 appearing later on in a e.g. 3 or 4 byte wide character/rune? (I'm not sure if these character exist, but it is a possibility) This will be pretty rare since C or C++ compilers don't commonly support unicode characters in the source code afaik.

Either way I think that can be avoided by adding the same normalization on the indexer side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

autocrlf issue when performing c++ package analysis on windows system
4 participants