Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15849 control: Add client uid map to agent config #14381

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented May 15, 2024

Allow daos_agent to optionally handle unresolvable client
uids via custom mapping. In deployments where the agent
may not have access to the same user namespace as client
applications (e.g. in containerized deployments), the
client_user_map can provide a fallback mechanism for
resolving the client uids to known usernames for the
purpose of applying ACL permissions tests.

Example agent config:

credential_config:
  client_user_map:
    default:
      user: nobody
      group: nobody
    1000:
      user: joe
      group: blow

Features: control
Required-githooks: true
Change-Id: I72905ccc5ddee27fc2101aa4358a14e352c86253
Signed-off-by: Michael MacDonald mjmac@google.com

Copy link

Ticket title is 'Provide a mechanism for mapping unknown client uids to user/group names'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-15849

@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14381/1/display/redirect

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14381/4/execution/node/1378/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14381/4/testReport/

@mjmac mjmac marked this pull request as ready for review May 21, 2024 13:47
@mjmac mjmac requested review from a team as code owners May 21, 2024 13:47
Allow daos_agent to optionally handle unresolvable client
uids via custom mapping. In deployments where the agent
may not have access to the same user namespace as client
applications (e.g. in containerized deployments), the
client_user_map can provide a fallback mechanism for
resolving the client uids to known usernames for the
purpose of applying ACL permissions tests.

Example agent config:

credential_config:
  client_user_map:
    default:
      user: nobody
      group: nobody
    1000:
      user: joe
      group: blow

Features: control
Required-githooks: true
Change-Id: I72905ccc5ddee27fc2101aa4358a14e352c86253
Signed-off-by: Michael MacDonald <mjmac@google.com>
Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't block on any of my suggestions. This is pretty cool. Nice work.

I find myself wondering if there might be some benefit to making use of interface types for this scenario. Having two different data sources for user/group/etc., it seems like a good fit for the concept.

Comment on lines +440 to +452
for uid, exp := range tc.expMap {
gotUser := result.Lookup(uid)
if diff := cmp.Diff(exp.User, gotUser.User); diff != "" {
t.Fatalf("unexpected User (-want, +got)\n %s", diff)
}
}

if expDefUser, found := tc.expMap[defaultMapKey]; found {
gotDefUser := result.Lookup(1234567)
if diff := cmp.Diff(expDefUser, gotDefUser); diff != "" {
t.Fatalf("unexpected DefaultUser (-want, +got)\n %s", diff)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor - I'd be inclined to break these out into a separate test for the Lookup method.

Comment on lines -707 to -709
if tc.req.userExt == nil {
tc.req.userExt = mockExt
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, so the test wasn't actually using the mocks for anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +107 to +130
req.getGroupIdsFn = func() ([]string, error) {
u, err := req.user()
if err != nil {
return nil, err
}
return u.GroupIds()
}
req.getGroupNamesFn = func() ([]string, error) {
groupIds, err := req.getGroupIdsFn()
if err != nil {
return nil, err
}

groupNames := make([]string, len(groupIds))
for i, gID := range groupIds {
g, err := req.getGroupFn(gID)
if err != nil {
return nil, err
}
groupNames[i] = g.Name
}

return groupNames, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion - To test the internals of these functions, I'd suggest breaking them out as normal named methods, and keeping the mockable interface as thin as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@mjmac mjmac requested a review from tanabarr May 22, 2024 13:24
if err != nil {
m.log.Errorf("%s: failed to get AuthSys struct: %s", info, err)
return m.credRespWithStatus(daos.MiscError)
if err := func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for making this an anonymous function rather than just inline?

}
successBytes, err := proto.Marshal(
&auth.GetCredResp{
Status: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need to specify zero value

}(),
responses: []response{
{
cred: nil,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need to specify zero value, and again below

defer cleanup()

mod := NewSecurityModule(log, tc.secCfg)
mod.signCredential = func() func(req *auth.CredentialRequest) (*auth.Credential, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why func() sig {...} () Rather than just sig {...} here and specify var idx int at top-level inside t.Run()?

Comment on lines -707 to -709
if tc.req.userExt == nil {
tc.req.userExt = mockExt
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this change won't enable any undesired security consequences outside container environments? The prospect of nefarious mappings seems like something we really should think about, but perhaps it has? Is the assumption that the mappings can only be applied by an administrator who has write access to the configuration files and there aren't any other vectors that we need to be concerned about?

Also the Commit message seems to me to imply that the changes are only required for testing purposes: "...client_user_map can provide a fallback mechanism for resolving the client uids to known usernames for the purpose of applying ACL permissions tests.". I assume that this is a typo (or me misunderstanding) and the changes are indeed needed for production deployments?

DomainInfo *security.DomainInfo
SigningKey crypto.PrivateKey
getHostnameFn func() (string, error)
getUserFn func(string) (*user.User, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be nice to have aliases for these function signatures

Comment on lines +107 to +130
req.getGroupIdsFn = func() ([]string, error) {
u, err := req.user()
if err != nil {
return nil, err
}
return u.GroupIds()
}
req.getGroupNamesFn = func() ([]string, error) {
groupIds, err := req.getGroupIdsFn()
if err != nil {
return nil, err
}

groupNames := make([]string, len(groupIds))
for i, gID := range groupIds {
g, err := req.getGroupFn(gID)
if err != nil {
return nil, err
}
groupNames[i] = g.Name
}

return groupNames, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kjacque
Copy link
Contributor

kjacque commented May 22, 2024

Are we sure this change won't enable any undesired security consequences outside container environments? The prospect of nefarious mappings seems like something we really should think about, but perhaps it has? Is the assumption that the mappings can only be applied by an administrator who has write access to the configuration files and there aren't any other vectors that we need to be concerned about?

I thought about this for a bit, too. As long as the admin secures their DAOS config files and certificates from unauthorized users, I think we should be OK. If the admin has secured the system with certificates with appropriate permissions (private key readable by daos_agent only), attackers will not be able to start their own private daos_agent to interface with the system with a custom config either. I see these as the two main attack vectors. @dpquigl can you think of anything else?

@tanabarr tanabarr self-requested a review May 29, 2024 21:34
Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerns assuaged by @dpquigl 's comments elsewhere:

So if I understand this correctly you have a multi-container environment where the socket for daos-agent is exported from one container and used in another. So when you check the sender’s credentials on the socket you’re getting he euid and egid from the process in the other container. However since the daos-agent container probably only has the users needed for the applications running in the container you can’t do a lookup on the users and groups. This seems like a reasonable solution. NFSv4 has a similar mechanism where it will translate a local uid to a name and back into a local uid on the remote side because of just this issue. The agent is a trusted entity in the security architecture so I agree that as long as you keep the agent integrity its ok for it to take on the additional responsibility of performing the translation. Is the reason this is in the same config file just for convenience? It might be a good idea to have it in a separate file for manageability.

I think moving the new section to a separate file is probably something we don't have bandwidth for currently so I'm happy to merge this now.

@mjmac
Copy link
Contributor Author

mjmac commented Jun 4, 2024

Are we sure this change won't enable any undesired security consequences outside container environments? The prospect of nefarious mappings seems like something we really should think about, but perhaps it has? Is the assumption that the mappings can only be applied by an administrator who has write access to the configuration files and there aren't any other vectors that we need to be concerned about?

As Kris & David have pointed out, the agent is considered a trusted component, and only the admin should be able to modify its configuration. As long as the agent trust is maintained, then there's no concern that malicious mappings could be created. If the agent trust is broken (due to file permissions or whatever), then pretty much anything could happen.

Also the Commit message seems to me to imply that the changes are only required for testing purposes: "...client_user_map can provide a fallback mechanism for resolving the client uids to known usernames for the purpose of applying ACL permissions tests.". I assume that this is a typo (or me misunderstanding) and the changes are indeed needed for production deployments?

Perhaps there's a better term, but I was using "test" in the sense that an access request is tested by applying the ACL and either accepting or rejecting the request. Maybe "check" would have been clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants