Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store digest of latest image in ImagePolicy status #368

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

makkes
Copy link
Member

@makkes makkes commented Apr 5, 2023

API changes

In addition to the latest selected tag, ImagePolicy now allows to also record the digest of the tag in .status.latestRef.digest. The setting of this field is governed by the newly introduced field .spec.digestReflectionPolicy which takes one of the values Always, Never or IfNotPresent. See the updated documentation under docs/spec/ for details.

The new .status.latestRef.digest status field can be used to pin an image to an immutable descriptor rather than to a potentially moving tag, increasing the security of workloads deployed on a cluster.

The goal is to make use of the digest in IAC so that manifests can be updated with the actual image digest.

ImagePolicy now gathers all information about the selected image in the .status.latestRef field, i.e. .status.latestImage is now split up into .status.latestRef.{image,tag,digest}. The same is true for .status.observedPreviousImage for which the accompanying new field is called .status.observedPreviousRef. The existing fields are retained for backwards-compatibility.

Implementation notes

Since both, ImagePolicyReconciler and ImageRepositoryReconciler now need to interact with registries, the interaction code (primarily setAuthOptions) was moved into the ./internal/registry package and the AuthOptionsGetter type in it. An AuthOptionsGetter variable is created in main.go and injected into both reconcilers.

closes #87
refs #214

makkes pushed a commit to fluxcd/image-automation-controller that referenced this pull request Apr 5, 2023
Using the new `:digest` suffix in policy markers one can now instruct
IAC to store the digest of an image instead of the tag in the given
manifest.

This makes use of the the `.status.imageDigest` field introduced in
fluxcd/image-reflector-controller#368.

closes #165

Signed-off-by: Max Jonas Werner <mail@makk.es>
@makkes makkes marked this pull request as draft April 6, 2023 16:01
@makkes
Copy link
Member Author

makkes commented May 5, 2023

@darkowlzz I pushed a much more lightweight approach where the ImagePolicyReconciler does the digest fetching now so that only the digest for the latest tag is fetched.

@makkes makkes force-pushed the store-digests branch 3 times, most recently from 8b3335a to 0ffd955 Compare May 8, 2023 14:24
@makkes makkes marked this pull request as ready for review May 8, 2023 14:58
@makkes makkes self-assigned this May 8, 2023
@makkes makkes added the enhancement New feature or request label May 8, 2023
@makkes
Copy link
Member Author

makkes commented May 8, 2023

This is now ready for another round of reviews. I adapted the PR description to properly reflect the changes. /cc @stefanprodan @darkowlzz

makkes pushed a commit to fluxcd/image-automation-controller that referenced this pull request May 8, 2023
Using the new `:digest` suffix in policy markers one can now instruct
IAC to store the digest of an image instead of the tag in the given
manifest.

This makes use of the the `.status.imageDigest` field introduced in
fluxcd/image-reflector-controller#368.

closes #165

Signed-off-by: Max Jonas Werner <mail@makk.es>
makkes pushed a commit to fluxcd/image-automation-controller that referenced this pull request May 8, 2023
Using the new `:digest` suffix in policy markers one can now instruct
IAC to store the digest of an image instead of the tag in the given
manifest.

This makes use of the the `.status.imageDigest` field introduced in
fluxcd/image-reflector-controller#368.

closes #165

Signed-off-by: Max Jonas Werner <mail@makk.es>
makkes pushed a commit to fluxcd/image-automation-controller that referenced this pull request May 8, 2023
Using the new `:digest` suffix in policy markers one can now instruct
IAC to store the digest of an image instead of the tag in the given
manifest.

This makes use of the the `.status.imageDigest` field introduced in
fluxcd/image-reflector-controller#368.

closes #165

Signed-off-by: Max Jonas Werner <mail@makk.es>
makkes pushed a commit to fluxcd/image-automation-controller that referenced this pull request May 9, 2023
Using the new `:digest` suffix in policy markers one can now instruct
IAC to store the digest of an image instead of the tag in the given
manifest.

This makes use of the the `.status.imageDigest` field introduced in
fluxcd/image-reflector-controller#368.

closes #165

Signed-off-by: Max Jonas Werner <mail@makk.es>
docs/spec/v1beta2/imagepolicies.md Outdated Show resolved Hide resolved
internal/registry/helper.go Show resolved Hide resolved
internal/registry/helper_test.go Outdated Show resolved Hide resolved
internal/registry/helper.go Outdated Show resolved Hide resolved
internal/registry/helper.go Outdated Show resolved Hide resolved
internal/registry/helper.go Outdated Show resolved Hide resolved
internal/registry/options.go Show resolved Hide resolved
internal/registry/options.go Outdated Show resolved Hide resolved
internal/controllers/imagepolicy_controller.go Outdated Show resolved Hide resolved
@makkes makkes added the blocked/requires-rfc Requires a design proposal label May 12, 2023
@makkes
Copy link
Member Author

makkes commented Sep 15, 2023

BTW, I ran GCP integration test against this branch to make sure login continues to work https://github.com/fluxcd/image-reflector-controller/actions/runs/6090482584/job/16525378616 . The old integration tests only check if ImageRepository was able to fetch tags. We can extend the test to also create an ImagePolicy and check if it fetches the digest to verify that it's able to authenticate.

Good point. I created a separate issue for this.

@makkes
Copy link
Member Author

makkes commented Sep 15, 2023

@darkowlzz all comments are addressed. Please take a look.

Max Jonas Werner added 14 commits September 20, 2023 14:52
The new API field `.status.latestDigest` in the `ImagePolicy` kind
stores the digest of the image referred to by the the
`.status.latestImage` field.

The setting of this field is governed by the newly introduced field
`.spec.digestReflectionPolicy` which takes one of the values `Always`
or `IfNotPresent`. See the updated documentation under `docs/spec/`
for details.

The new status field can be used to pin an image to an immutable
descriptor rather than to a potentially moving tag, increasing the
security of workloads deployed on a cluster.

The goal is to make use of the digest in IAC so that manifests can be
updated with the actual image digest.

Signed-off-by: Max Jonas Werner <mail@makk.es>
This new field summarizes all data reflecting an image reference, i.e.
the repository name, tag and digest.

Since this change changes the API in a backwards-incompatible way, the
new API version v1beta3 is introduced.

Signed-off-by: Max Jonas Werner <mail@makk.es>
This way we circumvent issues with server-side apply so that users can
explicitly change this field instead of having to remove it. The
latter case might lead to the API server not removing it if another
field manager is registered for that field, causing an unintended
drift.

This commit also aligns the v1beta3 API with the latest changes done
in v1beta2.

Signed-off-by: Max Jonas Werner <mail@makk.es>
We agreed to make the changes in the existing v1beta2 API version.

Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
.spec.image has no relevance in the given package, anymore.

Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
These must have been leftovers from previous iterations of this test.

Signed-off-by: Max Jonas Werner <mail@makk.es>
1. Default digestReflectionPolicy to "Never" and add a getter. With
   the getter method we will never encounter an empty policy even if
   defaulting hasn't taken place.
2. Make status.latestRef a pointer to align with
   status.observedPreviousRef. Having both fields be pointers makes it
   easier to use them in code so we only have to compare to nil and
   not the zero value.

Signed-off-by: Max Jonas Werner <mail@makk.es>
The field hasn't been set properly before. Correct behaviour is backed
by associated unit tests.

Signed-off-by: Max Jonas Werner <mail@makk.es>
The updated documentation has gotten lost due to the back and forth
with v1beta3.

Signed-off-by: Max Jonas Werner <mail@makk.es>
@@ -42,8 +42,26 @@ type ImagePolicySpec struct {
// ordered and compared.
// +optional
FilterTags *TagFilter `json:"filterTags,omitempty"`
// DigestReflectionPolicy governs the setting of the `.status.latestDigest` field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should have been

Suggested change
// DigestReflectionPolicy governs the setting of the `.status.latestDigest` field.
// DigestReflectionPolicy governs the setting of the `.status.latestRef` field.

but maybe more specifically .status.latestRef.digest as the description doesn't say anything about the digest. On that point, the description doesn't talk anything about the common/simple meaning of this field. Nor does the description of ReflectionPolicy, the underlying type, provides any relevant information about the tag's digest. It mentions "a value from the registry in a certain resource field".
Wouldn't it be better to describe this field independent of what's in the status, a more general meaning for those who may not be familiar with status or why someone would set this. It's about including the tag digest in the resulting latest image.

Copy link
Contributor

@darkowlzz darkowlzz Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through the latest changes, the concern about the definition of the field depending on the status is still not addressed.

Wouldn't it be better to describe this field independent of what's in the status, a more general meaning for those who may not be familiar with status or why someone would set this. It's about including the tag digest in the resulting latest image.

?

docs/spec/v1beta2/imagepolicies.md Show resolved Hide resolved
docs/spec/v1beta2/imagepolicies.md Show resolved Hide resolved
// If the old latest image and new latest image don't match, set the old
// image as the observed previous image.
// NOTE: The following allows the previous image to be set empty when
// there's a failure and a subsequent recovery from it. This behavior helps
// avoid creating an update event as there's no previous image to infer
// from. Recovery from a failure shouldn't result in an update event.
if oldObj.Status.LatestImage != obj.Status.LatestImage {
if oldObj.Status.LatestRef == nil ||
Copy link
Contributor

@darkowlzz darkowlzz Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of an edge case that occurs when updating to the new version with new fields. This check of latest ref to be nil results in the object to have an observed previous image (old field) and latest image (and latest ref as well) to be equal. This is because while migrating/updating the latest ref of old object is nil during migrating as it's a new field. The latest image (old field) still has the old data but since that's not checked here, it results in this scenario.

An example to reproduce it:
An ImagePolicy with an old version of the controller

spec:
  imageRepositoryRef:
    name: podinfo
  policy:
    semver:
      range: 5.0.x
status:
  conditions:
  - lastTransitionTime: "2023-09-20T22:15:50Z"
    message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' resolved to 5.0.3
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: Ready
  latestImage: ghcr.io/stefanprodan/podinfo:5.0.3
  observedGeneration: 1

After updating the controller and API to the new version, the following is seen

spec:
  digestReflectionPolicy: Never
  imageRepositoryRef:
    name: podinfo
  policy:
    semver:
      range: 5.0.x
status:
  conditions:
  - lastTransitionTime: "2023-09-20T22:15:50Z"
    message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' resolved to 5.0.3
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: Ready
  latestImage: ghcr.io/stefanprodan/podinfo:5.0.3
  latestRef:
    image: ghcr.io/stefanprodan/podinfo
    tag: 5.0.3
  observedGeneration: 1
  observedPreviousImage: ghcr.io/stefanprodan/podinfo:5.0.3

Note that there's no observed previous ref.

Calling it an "edge case" because this may not happen when the controller restarts, the database is empty, and ImagePolicy attempts to reconcile before ImageRepository populates the database. Failing ImagePolicy erases the latest image data completely and next time it succeeds, everything appears fine as if it's a new object and everything gets set properly. But logically, it's a bug which can be reproduced when running this locally (with make run) or controller data is backed by persistent storage. I was able to reproduce this without a PVC, which means it can happen in the usual deployments as well.

I think for proper update/migration, all the new fields should copy the value from the old fields first and later apply any mutation if relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it and added a test case.

Copy link
Contributor

@darkowlzz darkowlzz Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the fix and it doesn't look like it addressed the core issue as I pointed out above. Now it just ensures that the reconciler would not touch the previous observations if the latest ref of old object was empty. This only works for objects that didn't had the old observed previous image in the status. It doesn't work for objects that had observed previous image.
For example, before the migration, I had the following object status:

status:
  conditions:
  - lastTransitionTime: "2023-10-03T11:58:48Z"
    message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' updated from 5.0.3
      to 5.1.4
    observedGeneration: 2
    reason: Succeeded
    status: "True"
    type: Ready
  latestImage: ghcr.io/stefanprodan/podinfo:5.1.4
  observedGeneration: 2
  observedPreviousImage: ghcr.io/stefanprodan/podinfo:5.0.3

After migration:

status:
  conditions:
  - lastTransitionTime: "2023-10-03T11:59:13Z"
    message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' resolved to 5.1.4
    observedGeneration: 2
    reason: Succeeded
    status: "True"
    type: Ready
  latestImage: ghcr.io/stefanprodan/podinfo:5.1.4
  latestRef:
    image: ghcr.io/stefanprodan/podinfo
    tag: 5.1.4
  observedGeneration: 2
  observedPreviousImage: ghcr.io/stefanprodan/podinfo:5.0.3

The deprecated observedPreviousImage exists but not the new observedPreviousRef.

I still think that the proper fix is what I suggested in my previous comment

I think for proper update/migration, all the new fields should copy the value from the old fields first and later apply any mutation if relevant.

@darkowlzz
Copy link
Contributor

I couldn't comment about the notification message inline as that's not modified in the changes. The current code doesn't change the composeImagePolicyReadyMessage() and the reconciliation only takes the resultTag and previousTag into account, due to which there's never an alert or event when a new image reference is obtained due to new digest but same tag or a digest is introduced or removed, which effectively changes the resulting latest image. I think a change in digest is important to be notified to detect any unintended tag overwrites.


func (r *ImagePolicyReconciler) fetchDigest(ctx context.Context, repo *imagev1.ImageRepository, latest string, obj *imagev1.ImagePolicy) (string, error) {
ref := strings.Join([]string{repo.Spec.Image, latest}, ":")
tagRef, err := name.ParseReference(ref)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that ImagePolicy parses the image and makes an external request similar to ImageRepository, I think it should also handle basic sanity checks and prevent unnecessary retries when it's clear that retrying won't change anything.
When a bad image reference is provided, say ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa the ImageRepository enters stalled state:

spec:                                                                                                                                                                                         
  exclusionList:                                                                                                                                                             
  - ^.*\.sig$                                                                                                                                                                
  image: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa                                                                                                                    
  interval: 1h                                                                                                                                                                                
  provider: generic                                                                                                                                                                           
status:                                        
  canonicalImageName: ghcr.io/stefanprodan/podinfo                                             
  conditions:                                  
  - lastTransitionTime: "2023-09-21T01:10:56Z"                                                 
    message: 'could not parse reference: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa'                                                                                                    
    observedGeneration: 17                     
    reason: ImageURLInvalid                                                                    
    status: "True"                             
    type: Stalled                              
  - lastTransitionTime: "2023-09-21T01:10:56Z"                                                 
    message: 'could not parse reference: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa'                                                                                                    
    observedGeneration: 17                                                                                                                                                                    
    reason: ImageURLInvalid                                                                    
    status: "False"                                                                            
    type: Ready                                                                                                                                                                               
  lastScanResult:                                                                                                                                                                             
    latestTags:                                                                                
    - latest                                                                                                                                                                                  
    - 6.4.1
    ...
    scanTime: "2023-09-21T01:06:09Z"           
    tagCount: 46                               
  observedExclusionList:                       
  - ^.*\.sig$                                  
  observedGeneration: 17                       

But the associated ImagePolicy fails and enters a retry loop

spec:                                          
  digestReflectionPolicy: IfNotPresent         
  imageRepositoryRef:                          
    name: podinfo                              
  policy:                                      
    alphabetical:                              
      order: asc                               
status:                                        
  conditions:                                  
  - lastTransitionTime: "2023-09-21T01:10:56Z"                                                 
    message: reconciliation in progress        
    observedGeneration: 12                     
    reason: ProgressingWithRetry               
    status: "True"                             
    type: Reconciling                          
  - lastTransitionTime: "2023-09-21T01:10:56Z"                                                 
    message: 'failed fetching digest of ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest:                                                                                              
      failed parsing reference "ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest":                                                                                                     
      could not parse reference: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest'                                                                                                     
    observedGeneration: 12                     
    reason: Failed                             
    status: "False"                            
    type: Ready                                
  latestImage: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest                         
  latestRef:                                   
    image: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa                                    
    tag: latest                                
  observedGeneration: 12                       
  observedPreviousImage: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest               
  observedPreviousRef:                         
    image: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa                                    
    tag: latest                                

With logs:

{"level":"error","ts":"2023-09-21T01:10:56.776Z","msg":"Reconciler error","controller":"imagepolicy","controllerGroup":"image.toolkit.fluxcd.io","controllerKind":"ImagePolicy","ImagePolicy":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"7d5c92dc-aded-4bdc-a8b1-e7fb036e56a9","error":"failed fetching digest of ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest: failed parsing reference \"ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest\": could not parse reference: ghcr.io/stefanprodan/podinfo/foo:bar:zzz:qqq/aaa:latest"}

ImagePolicy doesn't check readiness of ImageRepository because it should be able to provide latest image based on the last valid configuration.

This behavior also wrongly sets the latest and observed previous ref of ImagePolicy to the bad image. I think it should only set valid values and avoid reconciling failing configurations so that the status will contain the last valid latest image even if the current configuration is failing.

It may be better to parse the image early in the ImagePolicy reconciler if digest is enabled and based on the that stall if the image reference is invalid. When the ImageRepository will be updated with good image reference, it'll trigger ImagePolicy to reconcile again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue that's not introduced through this PR so I've created a separate GH issue for it: #449

Copy link
Contributor

@darkowlzz darkowlzz Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#449 doesn't seem to be a bug/issue. That's how flux controllers work. Objects that have dependency on other objects usually take the last successful result. If a source object starts failing, kustomization will not fail, but use the last successful artifact from the source. This is why I mentioned

ImagePolicy doesn't check readiness of ImageRepository because it should be able to provide latest image based on the last valid configuration.

above. This is the reasoning behind the behavior, not a bug.

In my above comment, I'm describing a new issue introduced by this change because ImagePolicy reconciler now parses the image. This is a new behavior introduced by this change. Reading my comment again, I think the explanation is clear enough. Because of parsing image, ImagePolicy now enters into a failure state due to invalid image. I can explain again in some other way if the issue is really not clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#449 doesn't seem to be a bug/issue. That's how flux controllers work.

It is a bug. Look at the .status.latestImage field value in the issue description. It's using the broken image reference and reports success on it. This is wrong (because obviously the tag hasn't been derived from that specific image reference) and misleading (because it doesn't tell the user that the image reference is actually wrong).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bug, but that's not what #449 describes. It says that the condition should be not ready, but instead, it's a bug in where the image is read from. Bug and the issue title or description don't match.
Since the ImageRepository doesn't save the observed image, we can't reliably provide the correct image to ImagePolicy. The canonical image name still points to the right image, but that's not the same as the input image. We can either add a new field in ImageRepository, say observedImage, that's updated only on successful reconciliation, or ImagePolicy reconciler should update the latest image only when ImageRepository is ready and skip reconciliation otherwise. I think a new observed image field would be better. This has to be addressed separately, out of scope of this change.

The above issue is not what I've tried to point out in this thread. The situation of getting into a failure retry loop is still due to a change introduced in this change and is still in scope of this change.

Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides @darkowlzz his remarks, I see no further blockers.

Max Jonas Werner added 3 commits October 2, 2023 18:50
Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
Signed-off-by: Max Jonas Werner <mail@makk.es>
@darkowlzz
Copy link
Contributor

While working on static HelmRepository OCI, Alerts and Providers support in CLI, I made some changes in fluxcd/flux2#4298 that allows to conditionally make objects reconciliable. I think we can use the same for ImagePolicy to allow reconciliation depending on the digest reflection policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Supply digest of images that are selected by policy
8 participants