Copy deferred data sources into the plan, and use deferred values in references/outputs #35182

Open · wants to merge 12 commits into base: main

Conversation

@nfagerlund (Collaborator) commented May 18, 2024

This PR has two authors and two goals — @liamcervante was already working on wiring deferred values into outputs, and I needed to build on some of what he'd already done in order to finish implementing deferral for data sources.

Data sources can cause deferred changes

Previously, we weren't copying any information about deferred data sources into the plan; the theory was that data sources are handled during plan, and any resource that refers to them will already be listed as deferred, so we can just reconstruct things during the apply as needed.

This turned out to be not quite coherent, mostly because data sources CAN in fact produce planned changes that get processed during an apply! To observe this, just make a data source reference a computed attribute from a resource that hasn't been applied yet; the data source will plan a Read action. This caused at least one bug with the prior approach: when a data source was both deferred due to a deferred prerequisite AND delayed due to unknown configuration, it would plan a change that would blow up during apply.

So, since deferral is conceptually nearly the same thing as a delayed read (it just gets delayed until a future plan/apply cycle instead of the upcoming apply), we're now making deferred data sources produce a deferred Read change in the plan. This makes the storage formats for data sources and resources in the Deferred struct identical, which simplifies several things. It also lets us easily produce deferred graph nodes for data sources in the apply, which simplifies getting their incomplete values for use in outputs.
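
To make that concrete, here's a loose sketch of the shape of a deferred data source Read change. `plans.ResourceInstanceChange`, `plans.Change`, and `plans.Read` are real Terraform internals, but the surrounding variables and the `ReportDataSourceInstanceDeferred` call are assumptions for illustration, not quotes from the actual diff:

    // Sketch only: a deferred data source now records a full planned Read
    // change, in the same shape the Deferred struct uses for managed resources.
    change := &plans.ResourceInstanceChange{
        Addr:         addr,         // assumed: the data source instance address
        ProviderAddr: providerAddr, // assumed: resolved provider config address
        Change: plans.Change{
            Action: plans.Read,
            Before: cty.NullVal(schema.ImpliedType()),
            After:  plannedVal, // may be partially or wholly unknown
        },
    }
    // Hypothetical registration call; the key point is that data sources and
    // resources are stored identically.
    deferred.ReportDataSourceInstanceDeferred(addr, reason, change)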

Outputs use deferred change values

When building the hcl eval context for evaluating expressions, we now use the deferred change planned value for resources and data sources if one is available. This generally seems to result in outputs getting appropriately null values for things that can't be known due to deferral.
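
As a minimal sketch of that lookup order (the early-out lives in `GetResource`, per the discussion below; `GetDeferredChange` is a hypothetical accessor name, not necessarily what the diff calls it):

    // Sketch: when building the hcl eval context, prefer the deferred
    // change's planned value over whatever is in state.
    if change, ok := deferred.GetDeferredChange(instAddr); ok {
        // The planned value may contain nulls/unknowns for anything that
        // can't be known until the deferred change is actually applied.
        return change.After, diags
    }
    // ...otherwise fall through to the usual state/changes lookup.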

Questions for reviewers

  • Is the new behavior of outputs coherent, in the case where a resource previously existed in the state but we've got a deferred planned change coming up? They're now returning the (partial or unknown) planned value, rather than the old fully-known (but soon-to-be invalidated) value. I went in circles on this.
  • In the new logic for partial-expanding data sources, do we need to do anything to handle data sources that are nested in check blocks? I didn't think so, but I'm also new to check blocks.
  • For @liamcervante: Could you expand a bit on the changes to transform_destroy_edge and node_resource_apply? This was part of the outputs stuff, and it's the one bit I didn't fully understand. I think it's ultimately because outputs might be referring to partially-expanded resources inside modules that have unknown for_each, but I couldn't fully connect the dots yet.

Target Release

1.9.x

Draft CHANGELOG entry

EXPERIMENTS

The deferred actions experiment now handles data sources and outputs more thoroughly.

@nfagerlund marked this pull request as ready for review May 18, 2024 01:47
@@ -201,6 +213,8 @@ func (d *Deferred) HaveAnyDeferrals() bool {
// IsResourceInstanceDeferred returns true if the receiver knows some reason
// why the resource instance with the given address should have its planned
// action deferred for a future plan/apply round.
// TODO NF: This is dead code, and we always ignore the instructions below
// to use it before ShouldDeferResourceInstanceChanges.
func (d *Deferred) IsResourceInstanceDeferred(addr addrs.AbsResourceInstance) bool {
Member:

Is there any reason to keep this around if nothing is referencing it? Can you delete it now?

Collaborator Author:

Yeah, I think I can delete it. 👍🏼 I hesitated because it was referenced in the comment of another function, but since we've never heeded that comment, I think we're fine.

@liamcervante (Member) left a comment:

I think this looks good - I've left a couple of minor questions. I've also tagged in James for a question around the pruned node transformer since I'm not sure how important that optimisation actually is but I think we need to remove it.

// getting the new `name` value from the partially-planned deferred change rather than the
// old `name` value from the state. Is that right? For both cases? I couldn't quite
// decide, but currently that's what happens because of where the early-out happens over
// in GetResource while we're building the hcl eval context.
"a": cty.ObjectVal(map[string]cty.Value{
Member:

In the other tests we always return null values? Is it possible to do the same here? Or is this simply because there is something already in the state for this? Maybe we could return an unknown value here instead of null? To represent that something may be changing but we're not sure.
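
For readers new to cty, the null-versus-unknown distinction this question turns on (standard cty behavior, not part of this diff):

    // A null value definitely has no value; an unknown value has *some*
    // value that we just can't determine yet.
    null := cty.NullVal(cty.String)
    unk := cty.UnknownVal(cty.String)

    null.IsNull()  // true
    null.IsKnown() // true -- null is itself a known result
    unk.IsKnown()  // false -- "something may be here, we're not sure yet"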

return &change, diags
}

// TODO NF: Do we need to do something here to handle partial expanded data
Member:

I think nested data sources don't support the for_each attribute, so I'd say no.

Collaborator Author:

Oh hmm, just realized they might be in a module that has an unknown for_each, so the possibility exists... but I re-read the special-case handling and it's fine, because it's all about "when do we perform the read, and what do we do with the result?", none of which is actually relevant if you're deferred anyway. I'll update the comment with that.

// root module, and so it's not actually important
// to expand it and so this lets us do a bit more
// pruning than we'd be able to do otherwise.
if tmp, ok := v.(graphNodeTemporaryValue); ok && !tmp.temporaryValue() {
Member:

@jbardin, how important is this optimisation within the overall context of performance issues we've been facing?

Essentially, since we can now refer to the expander during the apply phase in the evaluator (as outputs are being processed properly and can refer to deferred resources), we do need to expand resource nodes even if the only thing referring to them is a root output. This is simply because the expander must be aware of resources or it panics.

This optimisation meant that resource expansion nodes were being removed even though they were being referenced, and we started getting panics (a rough illustration of the panic path follows). Since we want outputs to behave a specific way when referencing resources that are being deferred, we do need to check for partial expansions during the apply.
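
A rough illustration of that panic path, using the expander from internal/instances; the exact call sites and panic point are inferred from the comment above, not quoted from the diff:

    // Normally, executing a resource's expand node registers it:
    //     expander.SetResourceSingle(moduleAddr, resourceAddr)
    //
    // If the prune transformer removed that expand node, the registration
    // never happens, and evaluating a root output that references the
    // resource eventually asks the expander for its instances:
    insts := expander.ExpandResource(absResourceAddr)
    // ...which panics, because the expander has no record of the resource.
    _ = insts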

Member:

@nfagerlund, hopefully this question for James adds some context to what is happening here!

Member:

The pruneUnusedNodesTransformer isn't an optimization, but rather helps handle the case of complex destroy scenarios (see #25018, where this transformer last had some major refactoring). The conversation around the block being removed may be interesting too, starting here.

The gist of the process is that "temporary" values, i.e. ones not written to the state, are always evaluated again during apply. But because apply can also be destroying portions of the state (or portions are already missing entirely), the config may be inserting some nodes which cannot be evaluated, because their dependencies have already been destroyed, and which will otherwise error out if they are not pruned.

Member:

Maybe the failures that Martin had in the second link there shed some light on why this was needed. My hunch is that the case we still need to verify here is: a local value referenced by a root output, which references a resource that is already destroyed.

Member:

I wondered if a potential behaviour change here was to simply stop outputs from evaluating during destroy-only plans? I think that was another potential workaround I had for this issue.

@jbardin (Member) commented May 22, 2024:

The same holds during apply too, but another problem is that you might not have a destroy-only plan, it could be a normal plan where portions of the configuration have been removed.

Member:

Would that not lead to a validation error? An output referring to a part of the configuration that has been removed?

Member:

That's what I get for commenting on the way out the door last night; I was only thinking about the resources. Yeah, any output would need to come from valid configuration, even during destroy. The reason they were always there in general is that originally all apply steps were supposed to be equal, with no knowledge of which command was being run. We've since accepted that applying a destroy plan is different and different actions need to be taken, so perhaps we can just remove the root outputs in that case.

Member:

@nfagerlund, based on this I think we need to reintroduce the check here that I deleted, and reintroduce the relevant test. This will then likely lead to some panics happening in some tests.

To fix that, we can come into https://github.com/hashicorp/terraform/blob/main/internal/terraform/node_output.go#L367 and check that if we're an output in the root module and n.DestroyApply is true, then just return cty.NullVal(cty.DynamicPseudoType) for this output, on the basis that this is a destroy action so the outputs are going to be null once everything is destroyed anyway.

Feel free to ping me or anything if I'm not making any sense!
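
A sketch of the check being suggested, assuming the node type exposes Addr and DestroyApply as described above (placement and naming are approximate, not the final diff):

    // In node_output.go, while evaluating a root output during a destroy
    // apply, short-circuit to null instead of evaluating the expression:
    if n.Addr.Module.IsRoot() && n.DestroyApply {
        // Everything is being destroyed, so the output will be null anyway;
        // don't evaluate config whose dependencies may already be gone.
        return cty.NullVal(cty.DynamicPseudoType), diags
    }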

@nfagerlund (Collaborator Author) commented Jun 4, 2024:

This took some digging to fully understand, but the eventual outcome is:

  • The core logic of this PR added new spots where we expect resource expand nodes to have properly executed before trying to reference values.
  • The prune step was violating that expectation by pruning expand nodes in apply graphs.
  • But pruning ONLY serves a useful purpose in destroy runs; there's genuinely no value to running it in a non-destroy context. We were only doing that in order to try and treat destroy and non-destroy the same, which we've now recognized isn't always viable.
  • So I added a commit to skip the entire prune transform in non-destroy applies (sketched after this list).
  • (Also, pruning the expand nodes isn't a problem in destroys, because in that case we're not executing things like output preconditions, which was where the test failure arose.)
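
Roughly the shape of that commit, assuming the apply graph builder can tell destroy from non-destroy runs (the guard's exact spelling is an assumption; walkDestroy and the transformer name are real):

    // In the apply graph builder: only include the prune transformer when
    // applying a destroy plan, since that's the only case it exists for.
    if b.Operation == walkDestroy {
        steps = append(steps, &pruneUnusedNodesTransformer{})
    }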

liamcervante and others added 12 commits June 6, 2024 22:21
(NF:) This is the initial commit that Liam shared with me when I took over the
project, modulo a few tweaks during rebase.

- Store values for deferred data sources in the Deferred struct, so expressions
have something to reference. (A later commit switches to storing a full Change
instead.)

- Two changes to `evaluationStateData.GetResource`:
    - For expressions evaluated in applies, return early with cty.DynamicVal for
    resources with unknown instance keys. (Previously only happened for
    evaluations during plan.)
    - If we happen to have a deferred change stored for an instance of this
    resource, fetch and use the deferred planned value.

- Remove some early-returns/skips from `nodeExpandApplyableResource.Execute`, so
that deferred resources get properly registered with the Deferred during apply
(and thus have their values available to be referenced by outputs).

- Add Execute methods and new struct fields for `nodeApplyableDeferredInstance /
nodeApplyableDeferredPartialInstance`, to "rehydrate" deferred changes received
from the Plan into properly registered (in the Deferred struct) deferrals during
apply. (This connects the dots from the plan to the apply, so we don't have to
recalculate deferrals from scratch.)

- Do some more setup in DeferredTransformer to make `nodeApplyableDeferredInstance /
nodeApplyableDeferredPartialInstance` work properly (fill new fields, connect
edges to corresponding resource nodes).

…ances

This brings data source deferrals and managed resource deferrals a little closer
to symmetry -- we now store full deferred changes for individual deferred data
source instances.

This makes sense because a deferred data source is conceptually _very similar_
to a data source whose read gets delayed until the apply due to an as-yet-unmet
dependency -- in both cases, we know we need to do something (read it), but
aren't yet able to do so. The way to represent that is by emitting a planned
change, Q.E.D. :)

I think this should fix at least one existing bug: data sources that plan a Read
change due to dependency on a deferred resource should no longer attempt to
apply those changes on the same plan/apply cycle.

However, I think we're not yet transferring those deferred changes into the
`Plan`, and we still need to handle partial expansion more symmetrically.

This propagates deferred data source Read actions into the Plan struct, so
they'll be available for inspection during the apply phase.

Fill in `nodePlannablePartialExpandedResource.dataResourceExecute`, and report
`ResourceInstanceChange` values for them in Deferred.

At this point, expressions that refer to deferred resources or data sources end
up resolving to the "planned change" value from the reported deferral, per the
changes to the `GetResource` function from a few commits back.

The fields of a resource body often end up being a null value of the appropriate
type. Sometimes there's enough known about the plan to produce a real value, and
sometimes (due to partial expansion) the object keys or tuple slots are unknown
and an entire field or resource body has to become
`cty.NullVal(cty.DynamicPseudoType)`.

This commit updates the deferred actions tests to expect the appropriate output
values based on this new behavior.

This fixes a bug that could cause panics if an output precondition tried to
reference a resource or data source that did not have any planned changes queued
up for the current apply. The proximal cause was that:

- `evaluationStateData.GetResource` now does an early-exiting check for
partial-expanded resources during applies (as well as plans), which results in a
panic when referencing a resource that has never had its expander run.
- Applies prune "unused" expander nodes.
- We weren't properly treating the reference from the embedded `precondition`
block of a root output as being a meaningful indication that an expander wasn't
safe to prune.

Could have fixed that by making the prune check for dependencies of expanders
more complicated (I tried it and it worked), but it turned out the whole prune
was unnecessary in the first place, so let's do the dumber and easier thing. 👍🏼

Understanding this took some work, so I made sure to document everything I
learned from James for the next person to end up here. But the upshot is, we
really _don't_ need or want to be running this transformer during non-destroy
applies.

As a consequence of the tighter targeting, we can also remove the case that
preserves check blocks -- this was only necessary to avoid breaking
non-destroy applies, since check nodes don't do anything during destroy runs
anyway.

When deferred actions are enabled, outputs can end up with an unknown value at
the end of an apply if they reference a resource or data source that got
deferred to a future run. (It looks like outputs can already have unknown values
at this point even without deferral, but I'm not sure what those scenarios are
at the moment.)

However, it's possible to get confusing results if the output is a structural
type and only _some_ of its elements are unknown. (Consider a deferred resource
with some computed attributes: we report a deferred change that includes
concrete values for the attributes we control, and unknown values for anything
computed. The evaluator will let the output get concrete values for anything
we're sure we can predict, and unknowns for the rest.)

This commit adds an extra normalization step for root output values. If the
value contains _some_ unknown elements, we turn it into a wholly unknown value
of the appropriate type before serializing to state. (We continue our existing
practice of serializing unknown root output values as nulls for JSON.) This
makes it easier to avoid mistaking the case of "this one attribute was known to
have a null value" for the case of "some of the attributes were unknown".

Note that outputs that reference deferred resources can still end up with
concrete values, *IF* they only refer to uncomputed attributes with
config-provided values for resource instances with known instance keys. You
could make an argument either way for this, since the resource technically is in
an "unready" (or nonexistent) state, but we do at least know what we're gonna
set the provided attributes to.
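
A concrete illustration of that normalization using cty directly (these are standard cty functions; the example values are made up):

    val := cty.ObjectVal(map[string]cty.Value{
        "known":   cty.StringVal("hello"),
        "pending": cty.UnknownVal(cty.String),
    })
    val.IsKnown()       // true -- the object itself is a known container
    val.IsWhollyKnown() // false -- one attribute inside is unknown

    // Per the commit above: collapse partially-unknown to wholly unknown,
    // then serialize unknown as null for the state.
    val = cty.UnknownVal(val.Type())
    val = cty.UnknownAsNull(val) // equivalent to cty.NullVal(val.Type())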
@liamcervante (Member) left a comment:

Changes look good to me! Would be good to wait for James' thoughts on the pruning part but seems reasonable to me.

Only comment is around the setting to unknown and then setting to null directly after. I don't think the behaviour will change at all, so this is entirely optional; just a simplification if anything.

Thanks!

Comment on lines +784 to 792
	if val.IsKnown() && !val.IsWhollyKnown() {
		// For the version of the value that ends up in state, treat
		// partially-unknown values as wholly unknown. The goal is to avoid
		// confusion with outputs that reference deferred values; since the
		// referenced resource is still unready, we don't want to store
		// planned-but-not-verified speculations in the source of truth.
		val = cty.UnknownVal(val.Type())
	}
	val = cty.UnknownAsNull(val)
Member:

Suggested change:

-	if val.IsKnown() && !val.IsWhollyKnown() {
-		// For the version of the value that ends up in state, treat
-		// partially-unknown values as wholly unknown. The goal is to avoid
-		// confusion with outputs that reference deferred values; since the
-		// referenced resource is still unready, we don't want to store
-		// planned-but-not-verified speculations in the source of truth.
-		val = cty.UnknownVal(val.Type())
-	}
-	val = cty.UnknownAsNull(val)
+	if val.IsKnown() && !val.IsWhollyKnown() {
+		// For the version of the value that ends up in state, treat
+		// partially-unknown values as wholly unknown. The goal is to avoid
+		// confusion with outputs that reference deferred values; since the
+		// referenced resource is still unready, we don't want to store
+		// planned-but-not-verified speculations in the source of truth.
+		val = cty.NullVal(val.Type())
+	} else {
+		val = cty.UnknownAsNull(val)
+	}

I think the follow-up call to UnknownAsNull will just always replace the unknown value we just created with null? Which seems a bit wasteful when we can just set the thing to null directly. What do you think? Probably need to update the comment a bit to reflect that we're not setting it to unknown any more.

@jbardin (Member) left a comment:

The pruneUnusedNodesTransformer change LGTM!
