This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

kiam-server broken in v4.1 #491

Open
mestuddtc opened this issue Jul 21, 2021 · 5 comments

Comments

@mestuddtc

I had a working configuration using v4.0, but v4.1 is broken. I am running outside of AWS, so autodetecting roles is not an option.

These are the server parameters, working in v4.0.

      containers:
      - args:
        - server
        - --json-log
        - --level=debug
        - --bind=0.0.0.0:443
        - --cert=/etc/kiam/tls/server.pem
        - --key=/etc/kiam/tls/server-key.pem
        - --ca=/etc/kiam/tls/ca.pem
        - --role-base-arn=arn:aws:iam::682359534587:role/
        - --sync=1m
        - --prometheus-listen-addr=0.0.0.0:9620
        - --prometheus-sync-interval=5s

With v4.1 or master, the server logs:

{"level":"info","msg":"started prometheus metric listener 0.0.0.0:9620","time":"2021-07-21T19:39:51Z"}
{"level":"info","msg":"starting server","time":"2021-07-21T19:39:51Z"}
{"level":"fatal","msg":"error using AWS STS Gateway: role can't be empty","time":"2021-07-21T19:39:51Z"}
@dmorgan81

Took a look at this one. This change is responsible: 8103483

I'm unsure if this is a bug. Before the change above, it was assumed server pods were running on a node with the appropriate IAM permissions; the STS gateway config would just not provide any credentials. Now it appears --assume-role-arn must be provided and set to a role the node can assume. That role must also have the necessary permissions for everything to work.

If this is the intended behavior, then the docs should be updated to reflect that, and the helm chart should be updated to require a value for assumeRoleArn, because just providing a role base ARN (either via --role-base-arn or --role-base-arn-autodetect) is no longer enough.

@mestuddtc
Author

I would consider it a bug for two reasons:

  1. It is an incompatible, breaking change in a minor update. If it was intended, the version should have been 5.0, should it not?
  2. The credentials are being provided to kiam-server via IRSA annotations (earlier, I think, via an access key and secret), which explicitly give it the permissions to do its work. It doesn't make sense to me that I would have to add another layer of IAM roles and permissions.

@dmorgan81

I agree with you on both points. I don't like to think that both of us were leveraging undefined behavior that just happened to work in v4.0.

Hopefully a KIAM dev can chime in on whether 8103483 unintentionally broke our use case. Fixing it wouldn't be difficult; just check if b.config.AssumeRoleArn is empty and skip trying to resolve the role if so.
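
To make the suggested check concrete, here is a rough, self-contained sketch. It is not kiam's actual code: `Config`, `resolveRole`, and `credentialSource` are hypothetical stand-ins, and the account ID and role name are placeholders. Only the field name `AssumeRoleArn` and the "role can't be empty" message come from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for the server's STS gateway settings.
type Config struct {
	AssumeRoleArn string // value of --assume-role-arn; empty when the flag is not set
}

// resolveRole is a hypothetical stand-in for the role-resolution step.
// It rejects an empty role, mirroring the fatal error in the logs above.
func resolveRole(role string) (string, error) {
	if role == "" {
		return "", errors.New("role can't be empty")
	}
	return role, nil
}

// credentialSource reports which credentials the gateway would use.
// When no assume-role ARN is configured it skips resolution entirely and
// falls back to whatever credentials the process already has.
func credentialSource(cfg Config) (string, error) {
	if cfg.AssumeRoleArn == "" {
		return "default credential chain", nil
	}
	arn, err := resolveRole(cfg.AssumeRoleArn)
	if err != nil {
		return "", fmt.Errorf("error using AWS STS Gateway: %w", err)
	}
	return "assume role " + arn, nil
}

func main() {
	configs := []Config{
		{}, // no --assume-role-arn, as in the configs in this issue
		{AssumeRoleArn: "arn:aws:iam::123456789012:role/kiam-server"}, // placeholder ARN
	}
	for _, cfg := range configs {
		src, err := credentialSource(cfg)
		if err != nil {
			fmt.Println("fatal:", err)
			continue
		}
		fmt.Println("using:", src)
	}
}
```

The point is only the early return: an empty `AssumeRoleArn` never reaches the resolution step, so a server started without `--assume-role-arn` would behave as it did in v4.0, while a configured ARN is still resolved as in v4.1.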

@stephan2012

Same issue after upgrading to v4.2, but the server is running inside AWS:

{"level":"info","msg":"starting server","time":"2022-02-23T19:15:04Z"}
{"level":"info","msg":"started prometheus metric listener 0.0.0.0:9620","time":"2022-02-23T19:15:04Z"}
{"level":"info","msg":"detecting arn prefix","time":"2022-02-23T19:15:04Z"}
{"level":"info","msg":"using detected prefix: arn:aws:iam::xxxxxxxxxxxx:role/","time":"2022-02-23T19:15:04Z"}
{"level":"fatal","msg":"error using AWS STS Gateway: role can't be empty","time":"2022-02-23T19:15:04Z"}

Config:

    spec:
      containers:
      - args:
        - --json-log
        - --level=info
        - --bind=0.0.0.0:443
        - --cert=/etc/kiam/tls/tls.crt
        - --key=/etc/kiam/tls/tls.key
        - --ca=/etc/kiam/tls/ca.crt
        - --role-base-arn-autodetect
        - --session-duration=15m
        - --sync=1m
        - --prometheus-listen-addr=0.0.0.0:9620
        - --prometheus-sync-interval=5s
        command:
        - /kiam
        - server

Downgrading to v4.0 resolves the error:

{"level":"info","msg":"starting server","time":"2022-02-23T19:17:20Z"}
{"level":"info","msg":"started prometheus metric listener 0.0.0.0:9620","time":"2022-02-23T19:17:20Z"}
{"level":"info","msg":"detecting arn prefix","time":"2022-02-23T19:17:20Z"}
{"level":"info","msg":"using detected prefix: arn:aws:iam::xxxxxxxxxxxx:role/","time":"2022-02-23T19:17:20Z"}
{"level":"info","msg":"detecting arn prefix","time":"2022-02-23T19:17:20Z"}
{"level":"info","msg":"using detected prefix: arn:aws:iam::xxxxxxxxxxxx:role/","time":"2022-02-23T19:17:20Z"}
{"level":"info","msg":"will serve on 0.0.0.0:443","time":"2022-02-23T19:17:20Z"}

@dmorgan81

We "fixed" this by setting server.assumeRoleArn in the helm chart to the IAM role our nodes already run as. Silly, but it does restore the behavior to what it was in v4.0.
