Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DONTMERGE Fix excluded_paths_regex in finalize_manifest.py #130

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented Feb 16, 2023

Description of the changes

The updated list of excluded "trusted files" paths better follows the Filesystem Hierarchy Standard (FHS).

Based on:

Fixes #128.

Additionally, this PR includes a quick second commit, which sets USER in ubuntu18.04-hello-world test. This is for quick tests of non-root images.

How to test this PR?

Test some Docker images, no functional changes.


This change is Reviewable

The updated list of excluded "trusted files" paths better follows the
Filesystem Hierarchy Standard (FHS).

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
This is for quick tests of non-root images.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


finalize_manifest.py line 56 at r1 (raw file):

                                r'|sys/.*'
                                r'|tmp/.*'
                                r'|var/.*)$')

@woju thinks we need to exclude only some subdirs of /var/ (or better, include only some subdirs of /var/). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.

@dimakuv
Copy link
Contributor Author

dimakuv commented Feb 16, 2023

@jkr0103 @anjalirai-intel @aneessahib Does this PR make sense to you?

@aneessahib
Copy link
Contributor

yes looks good. An additional thought - would it make sense to give the flexibility to the user to add more exclude paths via a proper template? This will help users create smaller GSC images as well.

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aneessahib

An additional thought - would it make sense to give the flexibility to the user to add more exclude paths via a proper template? This will help users create smaller GSC images as well.

Sounds like a bit too much... Unless you have a simple idea how users will provide such excluded paths in command-line args (or in some other way)?

Reviewable status: 0 of 3 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

Copy link
Contributor

@aneessahib aneessahib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

via separate file template. like a file exclude_path.template for example ? We are having to tell multiple people to manually edit finalize_manifest.py which is deep in GSC. A documented interface might help. Anyways, thats a separate thing to address.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


Documentation/index.rst line 264 at r1 (raw file):

   generate a list of trusted files. GSC excludes some system files and paths
   (for the exact list, see :ref:`excluded-paths`), since checksums are required
   which either don't exist or may vary across different deployment machines.

I think somewhere here we should point to finalize_manifest.py. So that users can modify the exception list according to their needs?

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are having to tell multiple people to manually edit finalize_manifest.py which is deep in GSC.

But why? Why do they have to modify it? What's breaking?

via separate file template. like a file exclude_path.template for example?

I'm not sure how this should work. Could you give an example of such a template? Would it be a list of regexes?

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib)


Documentation/index.rst line 264 at r1 (raw file):

Previously, aneessahib (Anees Sahib) wrote…

I think somewhere here we should point to finalize_manifest.py. So that users can modify the exception list according to their needs?

No, finalize_manifest.py is an internal detail of GSC. We shouldn't tell users to modify the GSC source code.

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib and @dimakuv)


finalize_manifest.py line 36 at r1 (raw file):

def generate_trusted_files(root_dir, already_added_files):
    # please keep this list in sync with the one in GSC documentation
    excluded_paths_regex = (r'^/('

While you're at it, can you reformat it?

  • drop hanging indent, it's wasting too much horizontal space and makes inline comments difficult
  • rewrite as multiline string for re.VERBOSE (https://docs.python.org/3/library/re.html#re.VERBOSE)
  • drop common / before ( (it's confusing)
  • drop ^ at the beginning and .* at the end (.match() method takes care of those)
  • inline re.compile:
exclude_re = re.compile(r'''
    ( /boot
    | /dev
    | /efi

    # below files are security-critical
    | /etc/\.pwd\.lock

    | /gramine/app_files/finalize_manifest\.py # will be removed in final GSC image
    )
''', re.VERBOSE)

exclude_re.match(...) # this is already OK below

Or alternatively you can rework this as a list of fnmatch aka "glob" patterns (https://docs.python.org/3/library/fnmatch.html), which might be better if you want to add CLI-level configuration (--include-files "/var/lib/*", --exclude-files /etc/\*passwd), because many CLI tools have fnmatch-like patterns here, not regexps (like https://manpages.debian.org/bullseye/tar/tar.1.en.html#exclude or https://manpages.debian.org/bullseye-backports/rsync/rsync.1.en.html#PATTERN_MATCHING_RULES).

exclude_globs = [
    '/boot',
    '/dev',
    '/media/*',
    #...
]

if any(fnmatch.fnmatch(to_be_walked_dir, pattern) for pattern in exclude_globs):
    ...

finalize_manifest.py line 56 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@woju thinks we need to exclude only some subdirs of /var/ (or better, include only some subdirs of /var/). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.

From the top of my head:

  • Most HTTP servers: /var/www
  • Postgresql: /var/lib/postgresql (I guess no-one runs it, because shm, but it's a data point)
  • OpenLDAP: /var/lib/ldap
  • ...

Now some of them may not be appropriate (e.g. postfix uses /var/lib/postfix to store lockfiles, when it should've used /run). But certainly there are some things that are in fact needed. Esp. when they're the data that is actually served (/var/www). But if you think about adding some configurability, you'd have to add two mechanisms, one for adding to exclude_re, and second for "ouch, I needed this directory after all".


Documentation/index.rst line 264 at r1 (raw file):

We shouldn't tell users to modify the GSC source code.

Yes.

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib and @dimakuv)


finalize_manifest.py line 56 at r1 (raw file):

@woju thinks we need to exclude only some subdirs of /var/ (or better, include only some subdirs of /var/). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.

I don't think there is a right list of "userful subdirs" for "normal" apps. And if we exclude only some subdirs of /var/, then what about /run/ (isn't it a similar case: see https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.html)?

But if you think about adding some configurability, you'd have to add two mechanisms

I think it's useful but not sure whether we do this in this PR (or as a follow-up).


test/ubuntu18.04-hello-world.dockerfile line 4 at r1 (raw file):

RUN apt-get -y update
RUN groupadd -r user && useradd -r -g user user

Any specific considerations that we apply -r here?

Code quote:

RUN groupadd -r user && useradd -r -g user user

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @kailun-qin, and @woju)


finalize_manifest.py line 36 at r1 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

While you're at it, can you reformat it?

  • drop hanging indent, it's wasting too much horizontal space and makes inline comments difficult
  • rewrite as multiline string for re.VERBOSE (https://docs.python.org/3/library/re.html#re.VERBOSE)
  • drop common / before ( (it's confusing)
  • drop ^ at the beginning and .* at the end (.match() method takes care of those)
  • inline re.compile:
exclude_re = re.compile(r'''
    ( /boot
    | /dev
    | /efi

    # below files are security-critical
    | /etc/\.pwd\.lock

    | /gramine/app_files/finalize_manifest\.py # will be removed in final GSC image
    )
''', re.VERBOSE)

exclude_re.match(...) # this is already OK below

Or alternatively you can rework this as a list of fnmatch aka "glob" patterns (https://docs.python.org/3/library/fnmatch.html), which might be better if you want to add CLI-level configuration (--include-files "/var/lib/*", --exclude-files /etc/\*passwd), because many CLI tools have fnmatch-like patterns here, not regexps (like https://manpages.debian.org/bullseye/tar/tar.1.en.html#exclude or https://manpages.debian.org/bullseye-backports/rsync/rsync.1.en.html#PATTERN_MATCHING_RULES).

exclude_globs = [
    '/boot',
    '/dev',
    '/media/*',
    #...
]

if any(fnmatch.fnmatch(to_be_walked_dir, pattern) for pattern in exclude_globs):
    ...

Ok, I really like @woju's proposal on fnmatch and the new --include-files/--exclude-files command-line options. I think I should close this PR, as the proposal is significantly different from what this PR tries to achieve.

Someone (maybe me...) will need to implement and test Woju's proposal.


finalize_manifest.py line 56 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

@woju thinks we need to exclude only some subdirs of /var/ (or better, include only some subdirs of /var/). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.

I don't think there is a right list of "userful subdirs" for "normal" apps. And if we exclude only some subdirs of /var/, then what about /run/ (isn't it a similar case: see https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.html)?

But if you think about adding some configurability, you'd have to add two mechanisms

I think it's useful but not sure whether we do this in this PR (or as a follow-up).

Yeah, ok, you're both right. This all needs much better handling. See my other comment.


Documentation/index.rst line 264 at r1 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

We shouldn't tell users to modify the GSC source code.

Yes.

Done.


test/ubuntu18.04-hello-world.dockerfile line 4 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Any specific considerations that we apply -r here?

I took this example from Docker documentation, but I agree, I don't see any reason to create a "system" user. I could remove it.

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib and @kailun-qin)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @dimakuv, @kailun-qin, and @woju)


test/ubuntu18.04-hello-world.dockerfile line 5 at r1 (raw file):

This is for quick tests of non-root images.

Won't this break images which assume root? I.e. aren't you fixing some images and breaking some others at the same time?

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @kailun-qin, and @woju)


test/ubuntu18.04-hello-world.dockerfile line 5 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is for quick tests of non-root images.

Won't this break images which assume root? I.e. aren't you fixing some images and breaking some others at the same time?

Sorry @mkow I don't understand the question. Why did you pose the question here, in the HelloWorld example?

If you're talking about this PR as a whole, then: I tested it with both root and non-root images, and it works. The change is supposed to be minimal, and it doesn't seem to affect existing workloads.

But anyway, this PR should be considered deprecated (I'll add a DONTMERGE note now), and I would like for someone to create a new PR with @woju's proposal, see other comment.

@dimakuv dimakuv added the invalid This doesn't seem right label Mar 6, 2023
@dimakuv dimakuv changed the title Fix excluded_paths_regex in finalize_manifest.py DONTMERGE Fix excluded_paths_regex in finalize_manifest.py Mar 6, 2023
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @kailun-qin, and @woju)


test/ubuntu18.04-hello-world.dockerfile line 5 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Sorry @mkow I don't understand the question. Why did you pose the question here, in the HelloWorld example?

If you're talking about this PR as a whole, then: I tested it with both root and non-root images, and it works. The change is supposed to be minimal, and it doesn't seem to affect existing workloads.

But anyway, this PR should be considered deprecated (I'll add a DONTMERGE note now), and I would like for someone to create a new PR with @woju's proposal, see other comment.

Ah, I misunderstood what the second commit does, resolving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix excluded_paths_regex in finalize_manifest.py
5 participants