Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: better clean up of file handles #2823

Merged
merged 5 commits into from May 1, 2024

Conversation

willmurphyscode
Copy link
Contributor

@willmurphyscode willmurphyscode commented Apr 29, 2024

Previously, the ELF binary package cataloger would leak file handles, and the generic cataloger loop would leak file handles if parsing the file panicked. Instead, ensure both situations defer close new file readers as soon as they are made.

The incorrect behavior had been happening for some time, but was probably exposed by #2814 which caused the ELF binary package cataloger to rely more on file handles and less on in-memory buffers.

Fixes #2819 - in my testing, syft 1.3.0 needs about 1500 concurrent file handles to parse pytorch/pytorch:latest; after this change, the number is closer to 800.

The elf-binary-package-cataloger does its own file IO to account for the
possibility of a logical ELF package being broken across multiple
physical files. However, this casued it to skip the normal invocation
pattern in the generic cataloger code that prevented file leaks. Ensure
this cataloger always closes its file handles.

Signed-off-by: Will Murphy <will.murphy@anchore.com>
Otherwise, a panicking cataloger could leak file handles.

Signed-off-by: Will Murphy <will.murphy@anchore.com>

log.WithFields("path", location.RealPath).Trace("parsing file contents")
invokeParser := func(location file.Location, parser Parser) ([]pkg.Package, []artifact.Relationship, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be a static function over a closure? it would be simpler to reason about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It will have a lot more parameters (env and logger), but it will be easier to reason about, so that's probably the right trade off. I'll try to push up a rev.

Signed-off-by: Will Murphy <will.murphy@anchore.com>
Signed-off-by: Will Murphy <will.murphy@anchore.com>
@willmurphyscode
Copy link
Contributor Author

@wagoodman / @spiffcs - refactored and added a unit test for not leaking file on generic cataloger panic.

Signed-off-by: Will Murphy <will.murphy@anchore.com>
@@ -122,17 +123,9 @@ func (c *Cataloger) Catalog(ctx context.Context, resolver file.Resolver) ([]pkg.

log.WithFields("path", location.RealPath).Trace("parsing file contents")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional opportunity for unrelated cleanup -- this should be using logger not log

Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome test 🥇

@willmurphyscode willmurphyscode merged commit 93a7d2e into main May 1, 2024
11 checks passed
@willmurphyscode willmurphyscode deleted the fix-elf-cataloger-close-files branch May 1, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New version 1.3.0 leads to "too many open files" while scanning bigger images
3 participants