Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sso-auth: implement Azure AD provider #98

Closed
shrayolacrayon opened this issue Oct 24, 2018 · 22 comments · May be fixed by #118
Closed

sso-auth: implement Azure AD provider #98

shrayolacrayon opened this issue Oct 24, 2018 · 22 comments · May be fixed by #118
Labels
enhancement New feature or request

Comments

@shrayolacrayon
Copy link

Add Azure AD as a provider to sso-auth as a part of #27

@shrayolacrayon shrayolacrayon added the papercuts frustrations to fix but not necessarily a bug label Oct 24, 2018
@shrayolacrayon
Copy link
Author

shrayolacrayon commented Oct 24, 2018

@sporkmonger opened a separate issue for implementing the Azure AD provider.

@sporkmonger
Copy link
Contributor

sporkmonger commented Oct 24, 2018

Just as a clarification, I'm also porting the OIDC provider as part of this work. I may try to land it as a preceding PR.

@shrayolacrayon shrayolacrayon added enhancement New feature or request and removed papercuts frustrations to fix but not necessarily a bug labels Oct 24, 2018
@emilaasa
Copy link

@sporkmonger
Copy link
Contributor

I'm only doing v2. And yes, I think you need #95 for v2 to work at all.

@sporkmonger
Copy link
Contributor

I now have the Azure AD provider working, just working out some issues w/ groups propagating through to the proxy side of things and writing tests.

@emilaasa
Copy link

Sounds good @sporkmonger, I'd be happy to help with testing!

@sporkmonger
Copy link
Contributor

@shrayolacrayon I'm almost done with the AD provider, but the main hang-up I still have is that for whatever reason, the Groups []string field in the SessionState isn't arriving on the proxy side of things. The auth server gets it in the session, but the proxy doesn't. It seems like it might be a hang-up when the session gets serialized into the proxy's authorization code query parameter, but I can't find anything obvious that might cause that one field in the session (but none of the others) to get dropped. Any ideas?

@emilaasa
Copy link

emilaasa commented Oct 31, 2018

@sporkmonger got the code up somewhere public? I got all of this week to work on this at work so I’m happy to help.

@shrayolacrayon
Copy link
Author

@sporkmonger, I can take a look at the changes if they're up somewhere! The Groups field is set in the OAuthCallback function - https://github.com/buzzfeed/sso/blob/master/internal/proxy/oauthproxy.go#L829, which calls the provider's ValidateGroup function, so I'm curious what that looks like to start with.

@sporkmonger
Copy link
Contributor

sporkmonger commented Oct 31, 2018 via email

@shrayolacrayon
Copy link
Author

It would error if the groups are not valid and inGroups are the subset of groups that the user is part given a list of alloweGroups.
Looking at this again, it's becoming much more apparent that there is room to simplify this logic to make it potentially work better with new providers.

@sporkmonger
Copy link
Contributor

Yeah, I probably won't try to do that simplification as part of this PR, but I do suspect when this is done it'll be more obvious what could benefit from some more attention. The caching for the Google provider also strikes me as something in that vein. I opted to use the Hashicorp LRU cache in lieu because the existing cache mechanism was much too specific to the initial use-case.

@sporkmonger
Copy link
Contributor

I didn't implement ValidateGroup so it should just be falling through to the ProviderData implementation which just returns true, so I don't think that's it.

@sporkmonger
Copy link
Contributor

Oh wait, there's two different ValidateGroup functions and the one in the proxy does this:

inGroups := []string{}
if len(allowedGroups) == 0 {
	return inGroups, true, nil
}

This seems surprising?

@shrayolacrayon
Copy link
Author

Yeah I agree that simplifying these things would be out of the scope of this task.

There is a ValidateGroup function in sso-proxy because of the nested authentication flow. We query sso-auth for what groups a given user is in given a list of all the allowed groups (which is only located in the upstream configuration file on the sso-proxy side) and then check that the there is an intersection between the groups the the user is in, and the allowed groups on the sso-proxy side. If there is no group being returned in the ValidateGroup function in the Azure provider, it would make sense that it is not being set in the SessionState struct.

I didn't implement ValidateGroup so it should just be falling through to the ProviderData implementation which just returns true, so I don't think that's it.

Side note: We want to removeProviderData as the "default" provider interface for providers. This is because when creating new providers, developers could forget to implement a function, leading to hard to track down bugs. It would probably be better to explicitly add that function as part of the interface so that it would be easier to definitively know the behavior.

@sporkmonger
Copy link
Contributor

sporkmonger commented Oct 31, 2018

Yeah, I looked through that PR. Definitely expecting merge conflicts, but that's OK. Also agree w/ rationale, makes it easier to do useful compile-time interface verification as well. If default provider satisfies interface, compile-time check would always pass.

New status quo, now that I'm explicitly setting allowed_groups is the Error validating group membership, please try again error.

I think this design has some advantages and disadvantages. The main disadvantage is that I believe apps should preferentially be responsible for their own authorization logic. Which implies that the proxy should not be deciding what groups are allowed or disallowed. However, in practice, many apps are not going to be written by the people deploying them (e.g. basically anything open source or supplied by a vendor). These should have allowed_groups as a fallback mechanism. That said, I do see some advantages here. Most groups a user belongs to, especially in an AD context, are likely to have nothing to do w/ SSO or the proxy. Like, e.g, we have a group for Slack users. That's going to be inapplicable to every single app behind the proxy. If allowed_groups functions as a filter for the set of groups relevant to the app being proxied, that likely does end up being useful, particularly for more complex AD deployments. However, we also have apps where whitelisting the set of applicable groups may be prohibitive.

@sporkmonger
Copy link
Contributor

OK, got groups plumbed through end-to-end. Bit hacked together at the moment, still gotta do a bunch of clean-up.

@sporkmonger
Copy link
Contributor

Success, got things fully working (no longer just hacked together). Just working on clean-up and tests.

@emilaasa
Copy link

emilaasa commented Nov 6, 2018

@sporkmonger I'm curious about the timeline for your PR. I'd like to test the Azure AD integration sometime late this week and I would rather try it with your work than the hacking that I've done (which works well enough but is not pretty).
Feels like double work to polish my solution too, so I'd be happy to help you out with testing and cleanup if that is possible. If you feel like you have a lot of work left to do and don't want to make a pull request yet I'd be happy to check out what you've got so far and help out in any way I can!

@sporkmonger
Copy link
Contributor

So I could publish a branch that's not done w/ the understanding that it's, well, not done. The implementation works but there's setup documentation that I haven't done yet and the tests don't currently pass. I've also been coding against a weird custom merge branch because I've got a couple other PRs out there that our internal staging environment relies on. If there's a desire for early access, I'm totally happy to facilitate that, but it probably would be in the form of commits against our "internal" branch. The biggest caveat is that currently there's a still huge time delay the first time someone logs in after a new deploy while the group membership cache warms up thanks to Microsoft's questionable API design requiring so many round trips. That tends to result in annoying timeouts for the first couple of logins if the people logging in belong to a large number of groups.

In terms of when I think it'll actually be done, I made a ton of progress today after being completely roadblocked for most of the week on a super-obscure test mock bug. I might be able to get a PR up by Friday, but I definitely wouldn't promise that.

@sporkmonger
Copy link
Contributor

OK, I've committed everything and pushed to our custom merge branch: https://github.com/Remitly/sso/tree/production

I've still got 3 groups related test cases I need to get passing. The mock for OIDC was gnarly.

@mreiferson
Copy link
Contributor

see #118

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants