Copy deferred data sources into the plan, and use deferred values in references/outputs #35182

Open · wants to merge 12 commits into base: main

Conversation

@nfagerlund (Collaborator) commented May 18, 2024

This PR has two authors and two goals — @liamcervante was already working on wiring deferred values into outputs, and I needed to build on some of what he'd already done in order to finish implementing deferral for data sources.

Data sources can cause deferred changes

Previously, we weren't copying any information about deferred data sources into the plan; the theory was that data sources are handled during plan, and any resource that refers to them will already be listed as deferred, so we can just reconstruct things during the apply as needed.

This turned out to be not quite coherent, mostly because data sources CAN in fact produce planned changes that get processed during an apply! To observe this, just make a data source reference a computed attribute from a resource that hasn't been applied yet; the data source will plan a Read action. This caused at least one bug with the prior approach: when a data source was both deferred due to a deferred prerequisite AND delayed due to unknown configuration, it would plan a change that would blow up during apply.

So, since deferral is conceptually nearly the same thing as a delayed read (it just gets delayed until a future plan/apply cycle instead of the upcoming apply), we're now making deferred data sources produce a deferred Read change in the plan. This makes the storage formats for data sources and resources in the Deferred struct identical, which simplifies several things. It also lets us easily produce deferred graph nodes for data sources in the apply, which simplifies getting their incomplete values for use in outputs.
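
To make that concrete, here's a loose sketch of the shape of a deferred data source Read change. `plans.ResourceInstanceChange`, `plans.Change`, and `plans.Read` are real Terraform internals, but the surrounding variables and the `ReportDataSourceInstanceDeferred` call are assumptions for illustration, not quotes from the actual diff:

    // Sketch only: a deferred data source now records a full planned Read
    // change, in the same shape the Deferred struct uses for managed resources.
    change := &plans.ResourceInstanceChange{
        Addr:         addr,         // assumed: the data source instance address
        ProviderAddr: providerAddr, // assumed: resolved provider config address
        Change: plans.Change{
            Action: plans.Read,
            Before: cty.NullVal(schema.ImpliedType()),
            After:  plannedVal, // may be partially or wholly unknown
        },
    }
    // Hypothetical registration call; the key point is that data sources and
    // resources are stored identically.
    deferred.ReportDataSourceInstanceDeferred(addr, reason, change)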

Outputs use deferred change values

When building the hcl eval context for evaluating expressions, we now use the deferred change planned value for resources and data sources if one is available. This generally seems to result in outputs getting appropriately null values for things that can't be known due to deferral.
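
As a minimal sketch of that lookup order (the early-out lives in `GetResource`, per the discussion below; `GetDeferredChange` is a hypothetical accessor name, not necessarily what the diff calls it):

    // Sketch: when building the hcl eval context, prefer the deferred
    // change's planned value over whatever is in state.
    if change, ok := deferred.GetDeferredChange(instAddr); ok {
        // The planned value may contain nulls/unknowns for anything that
        // can't be known until the deferred change is actually applied.
        return change.After, diags
    }
    // ...otherwise fall through to the usual state/changes lookup.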

Questions for reviewers

  • Is the new behavior of outputs coherent, in the case where a resource previously existed in the state but we've got a deferred planned change coming up? They're now returning the (partial or unknown) planned value, rather than the old fully-known (but soon-to-be invalidated) value. I went in circles on this.
  • In the new logic for partial-expanding data sources, do we need to do anything to handle data sources that are nested in check blocks? I didn't think so, but I'm also new to check blocks.
  • For @liamcervante: Could you expand a bit on the changes to transform_destroy_edge and node_resource_apply? This was part of the outputs stuff, and it's the one bit I didn't fully understand. I think it's ultimately because outputs might be referring to partially-expanded resources inside modules that have unknown for_each, but I couldn't fully connect the dots yet.

Target Release

1.9.x

Draft CHANGELOG entry

EXPERIMENTS

The deferred actions experiment now handles data sources and outputs more thoroughly.

@nfagerlund marked this pull request as ready for review May 18, 2024 01:47
@@ -201,6 +213,8 @@ func (d *Deferred) HaveAnyDeferrals() bool {
// IsResourceInstanceDeferred returns true if the receiver knows some reason
// why the resource instance with the given address should have its planned
// action deferred for a future plan/apply round.
// TODO NF: This is dead code, and we always ignore the instructions below
// to use it before ShouldDeferResourceInstanceChanges.
func (d *Deferred) IsResourceInstanceDeferred(addr addrs.AbsResourceInstance) bool {
Member:

Is there any reason to keep this around if nothing is referencing it? Can you delete it now?

Collaborator Author:

Yeah, I think I can delete it. 👍🏼 I hesitated because it was referenced in the comment of another function, but since we've never heeded that comment, I think we're fine.

@liamcervante (Member) left a comment:

I think this looks good - I've left a couple of minor questions. I've also tagged in James for a question around the pruned node transformer since I'm not sure how important that optimisation actually is but I think we need to remove it.

// getting the new `name` value from the partially-planned deferred change rather than the
// old `name` value from the state. Is that right? For both cases? I couldn't quite
// decide, but currently that's what happens because of where the early-out happens over
// in GetResource while we're building the hcl eval context.
"a": cty.ObjectVal(map[string]cty.Value{
Member:

In the other tests we always return null values? Is it possible to do the same here? Or is this simply because there is something already in the state for this? Maybe we could return an unknown value here instead of null? To represent that something may be changing but we're not sure.
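
For readers new to cty, the null-versus-unknown distinction this question turns on (standard cty behavior, not part of this diff):

    // A null value definitely has no value; an unknown value has *some*
    // value that we just can't determine yet.
    null := cty.NullVal(cty.String)
    unk := cty.UnknownVal(cty.String)

    null.IsNull()  // true
    null.IsKnown() // true -- null is itself a known result
    unk.IsKnown()  // false -- "something may be here, we're not sure yet"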

return &change, diags
}

// TODO NF: Do we need to do something here to handle partial expanded data
Member:

I think nested data sources don't support the for_each attribute, so I'd say no.

Collaborator Author:

Oh hmm, just realized they might be in a module that has an unknown for_each, so the possibility exists... but I re-read the special-case handling and it's fine, because it's all about "when do we perform the read, and what do we do with the result?", none of which is actually relevant if you're deferred anyway. I'll update the comment with that.

// root module, and so it's not actually important
// to expand it and so this lets us do a bit more
// pruning than we'd be able to do otherwise.
if tmp, ok := v.(graphNodeTemporaryValue); ok && !tmp.temporaryValue() {
Member:

@jbardin, how important is this optimisation within the overall context of performance issues we've been facing?

Essentially, since we can now refer to the expander during the apply phase in the evaluator (as outputs are being processed properly and can refer to deferred resources), we do need to expand resource nodes even if the only thing referring to them is a root output. This is simply because the expander must be aware of resources or it panics.

This optimisation meant that resource expansion nodes were being removed even though they were being referenced, and we started getting panics (a rough illustration of the panic path follows). Since we want outputs to behave a specific way when referencing resources that are being deferred, we do need to check for partial expansions during the apply.
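
A rough illustration of that panic path, using the expander from internal/instances; the exact call sites and panic point are inferred from the comment above, not quoted from the diff:

    // Normally, executing a resource's expand node registers it:
    //     expander.SetResourceSingle(moduleAddr, resourceAddr)
    //
    // If the prune transformer removed that expand node, the registration
    // never happens, and evaluating a root output that references the
    // resource eventually asks the expander for its instances:
    insts := expander.ExpandResource(absResourceAddr)
    // ...which panics, because the expander has no record of the resource.
    _ = insts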

Member:

@nfagerlund, hopefully this question for James adds some context to what is happening here!

Member:

The pruneUnusedNodesTransformer isn't an optimization, but rather helps handle the case of complex destroy scenarios (see #25018, where this transformer last had some major refactoring). The conversation around the block being removed may be interesting too, starting here.

The gist of the process is that "temporary" values, i.e. ones not written to the state, are always evaluated again during apply. But because apply can also be destroying portions of the state (or portions are already missing entirely), the config may be inserting some nodes which cannot be evaluated, because their dependencies have already been destroyed, and which will otherwise error out if they are not pruned.

Member:

Maybe the failures that Martin had in the second link there shed some light on why this was needed. My hunch is that the case we still need to verify here is: a local value referenced by a root output, which references a resource that is already destroyed.

Member:

I wondered if a potential behaviour change here was to simply stop outputs from evaluating during destroy-only plans? I think that was another potential workaround I had for this issue.

@jbardin (Member) commented May 22, 2024:

The same holds during apply too, but another problem is that you might not have a destroy-only plan, it could be a normal plan where portions of the configuration have been removed.

Member:

Would that not lead to a validation error? An output referring to a part of the configuration that has been removed?

Member:

That's what I get for commenting on the way out the door last night; I was only thinking about the resources. Yeah, any output would need to come from valid configuration, even during destroy. The reason they were always there in general is that originally all apply steps were supposed to be equal, with no knowledge of which command was being run. We've since accepted that applying a destroy plan is different and different actions need to be taken, so perhaps we can just remove the root outputs in that case.

Member:

@nfagerlund, based on this I think we need to reintroduce the check here that I deleted, and reintroduce the relevant test. This will then likely lead to some panics happening in some tests.

To fix that, we can come into https://github.com/hashicorp/terraform/blob/main/internal/terraform/node_output.go#L367 and check that if we're an output in the root module and n.DestroyApply is true, then just return cty.NullVal(cty.DynamicPseudoType) for this output, on the basis that this is a destroy action so the outputs are going to be null once everything is destroyed anyway.

Feel free to ping me or anything if I'm not making any sense!
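
A sketch of the check being suggested, assuming the node type exposes Addr and DestroyApply as described above (placement and naming are approximate, not the final diff):

    // In node_output.go, while evaluating a root output during a destroy
    // apply, short-circuit to null instead of evaluating the expression:
    if n.Addr.Module.IsRoot() && n.DestroyApply {
        // Everything is being destroyed, so the output will be null anyway;
        // don't evaluate config whose dependencies may already be gone.
        return cty.NullVal(cty.DynamicPseudoType), diags
    }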

@nfagerlund (Collaborator Author) commented Jun 4, 2024:

This took some digging to fully understand, but the eventual outcome is:

  • The core logic of this PR added new spots where we expect resource expand nodes to have properly executed before trying to reference values.
  • The prune step was violating that expectation by pruning expand nodes in apply graphs.
  • But pruning ONLY serves a useful purpose in destroy runs; there's genuinely no value to running it in a non-destroy context. We were only doing that in order to try and treat destroy and non-destroy the same, which we've now recognized isn't always viable.
  • So I added a commit to skip the entire prune transform in non-destroy applies (sketched after this list).
  • (Also, pruning the expand nodes isn't a problem in destroys, because in that case we're not executing things like output preconditions, which was where the test failure arose.)
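
Roughly the shape of that commit, assuming the apply graph builder can tell destroy from non-destroy runs (the guard's exact spelling is an assumption; walkDestroy and the transformer name are real):

    // In the apply graph builder: only include the prune transformer when
    // applying a destroy plan, since that's the only case it exists for.
    if b.Operation == walkDestroy {
        steps = append(steps, &pruneUnusedNodesTransformer{})
    }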

liamcervante and others added 12 commits June 6, 2024 22:21
(NF:) This is the initial commit that Liam shared with me when I took over the
project, modulo a few tweaks during rebase.

- Store values for deferred data sources in the Deferred struct, so expressions
have something to reference. (A later commit switches to storing a full Change
instead.)

- Two changes to `evaluationStateData.GetResource`:
    - For expressions evaluated in applies, return early with cty.DynamicVal for
    resources with unknown instance keys. (Previously only happened for
    evaluations during plan.)
    - If we happen to have a deferred change stored for an instance of this
    resource, fetch and use the deferred planned value.

- Remove some early-returns/skips from `nodeExpandApplyableResource.Execute`, so
that deferred resources get properly registered with the Deferred during apply
(and thus have their values available to be referenced by outputs).

- Add Execute methods and new struct fields for `nodeApplyableDeferredInstance /
nodeApplyableDeferredPartialInstance`, to "rehydrate" deferred changes received
from the Plan into properly registered (in the Deferred struct) deferrals during
apply. (This connects the dots from the plan to the apply, so we don't have to
recalculate deferrals from scratch.)

- Do some more setup in DeferredTransformer to make `nodeApplyableDeferredInstance /
nodeApplyableDeferredPartialInstance` work properly (fill new fields, connect
edges to corresponding resource nodes).

…ances

This brings data source deferrals and managed resource deferrals a little closer
to symmetry -- we now store full deferred changes for individual deferred data
source instances.

This makes sense because a deferred data source is conceptually _very similar_
to a data source whose read gets delayed until the apply due to an as-yet-unmet
dependency -- in both cases, we know we need to do something (read it), but
aren't yet able to do so. The way to represent that is by emitting a planned
change, Q.E.D. :)

I think this should fix at least one existing bug: data sources that plan a Read
change due to dependency on a deferred resource should no longer attempt to
apply those changes on the same plan/apply cycle.

However, I think we're not yet transferring those deferred changes into the
`Plan`, and we still need to handle partial expansion more symmetrically.

This propagates deferred data source Read actions into the Plan struct, so
they'll be available for inspection during the apply phase.

Fill in `nodePlannablePartialExpandedResource.dataResourceExecute`, and report
`ResourceInstanceChange` values for them in Deferred.

At this point, expressions that refer to deferred resources or data sources end
up resolving to the "planned change" value from the reported deferral, per the
changes to the `GetResource` function from a few commits back.

The fields of a resource body often end up being a null value of the appropriate
type. Sometimes there's enough known about the plan to produce a real value, and
sometimes (due to partial expansion) the object keys or tuple slots are unknown
and an entire field or resource body has to become
`cty.NullVal(cty.DynamicPseudoType)`.

This commit updates the deferred actions tests to expect the appropriate output
values based on this new behavior.

This fixes a bug that could cause panics if an output precondition tried to
reference a resource or data source that did not have any planned changes queued
up for the current apply. The proximal cause was that:

- `evaluationStateData.GetResource` now does an early-exiting check for
partial-expanded resources during applies (as well as plans), which results in a
panic when referencing a resource that has never had its expander run.
- Applies prune "unused" expander nodes.
- We weren't properly treating the reference from the embedded `precondition`
block of a root output as being a meaningful indication that an expander wasn't
safe to prune.

Could have fixed that by making the prune check for dependencies of expanders
more complicated (I tried it and it worked), but it turned out the whole prune
was unnecessary in the first place, so let's do the dumber and easier thing. 👍🏼

Understanding this took some work, so I made sure to document everything I
learned from James for the next person to end up here. But the upshot is, we
really _don't_ need or want to be running this transformer during non-destroy
applies.

As a consequence of the tighter targeting, we can also remove the case that
preserves check blocks -- this was only necessary to avoid breaking
non-destroy applies, since check nodes don't do anything during destroy runs
anyway.

When deferred actions are enabled, outputs can end up with an unknown value at
the end of an apply if they reference a resource or data source that got
deferred to a future run. (It looks like outputs can already have unknown values
at this point even without deferral, but I'm not sure what those scenarios are
at the moment.)

However, it's possible to get confusing results if the output is a structural
type and only _some_ of its elements are unknown. (Consider a deferred resource
with some computed attributes: we report a deferred change that includes
concrete values for the attributes we control, and unknown values for anything
computed. The evaluator will let the output get concrete values for anything
we're sure we can predict, and unknowns for the rest.)

This commit adds an extra normalization step for root output values. If the
value contains _some_ unknown elements, we turn it into a wholly unknown value
of the appropriate type before serializing to state. (We continue our existing
practice of serializing unknown root output values as nulls for JSON.) This
makes it easier to avoid mistaking the case of "this one attribute was known to
have a null value" for the case of "some of the attributes were unknown".

Note that outputs that reference deferred resources can still end up with
concrete values, *IF* they only refer to uncomputed attributes with
config-provided values for resource instances with known instance keys. You
could make an argument either way for this, since the resource technically is in
an "unready" (or nonexistent) state, but we do at least know what we're gonna
set the provided attributes to.
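
A concrete illustration of that normalization using cty directly (these are standard cty functions; the example values are made up):

    val := cty.ObjectVal(map[string]cty.Value{
        "known":   cty.StringVal("hello"),
        "pending": cty.UnknownVal(cty.String),
    })
    val.IsKnown()       // true -- the object itself is a known container
    val.IsWhollyKnown() // false -- one attribute inside is unknown

    // Per the commit above: collapse partially-unknown to wholly unknown,
    // then serialize unknown as null for the state.
    val = cty.UnknownVal(val.Type())
    val = cty.UnknownAsNull(val) // equivalent to cty.NullVal(val.Type())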
@liamcervante (Member) left a comment:

Changes look good to me! Would be good to wait for James' thoughts on the pruning part but seems reasonable to me.

Only comment is around the setting to unknown and then setting to null directly after. I don't think the behaviour will change at all, so this is entirely optional; just a simplification if anything.

Thanks!

Comment on lines +784 to 792
	if val.IsKnown() && !val.IsWhollyKnown() {
		// For the version of the value that ends up in state, treat
		// partially-unknown values as wholly unknown. The goal is to avoid
		// confusion with outputs that reference deferred values; since the
		// referenced resource is still unready, we don't want to store
		// planned-but-not-verified speculations in the source of truth.
		val = cty.UnknownVal(val.Type())
	}
	val = cty.UnknownAsNull(val)
Member:

Suggested change:

-	if val.IsKnown() && !val.IsWhollyKnown() {
-		// For the version of the value that ends up in state, treat
-		// partially-unknown values as wholly unknown. The goal is to avoid
-		// confusion with outputs that reference deferred values; since the
-		// referenced resource is still unready, we don't want to store
-		// planned-but-not-verified speculations in the source of truth.
-		val = cty.UnknownVal(val.Type())
-	}
-	val = cty.UnknownAsNull(val)
+	if val.IsKnown() && !val.IsWhollyKnown() {
+		// For the version of the value that ends up in state, treat
+		// partially-unknown values as wholly unknown. The goal is to avoid
+		// confusion with outputs that reference deferred values; since the
+		// referenced resource is still unready, we don't want to store
+		// planned-but-not-verified speculations in the source of truth.
+		val = cty.NullVal(val.Type())
+	} else {
+		val = cty.UnknownAsNull(val)
+	}

I think the follow-up call to UnknownAsNull will just always replace the unknown value we just created with null? Which seems a bit wasteful when we can just set the thing to null directly. What do you think? Probably need to update the comment a bit to reflect that we're not setting it to unknown any more.

@jbardin (Member) left a comment:

The pruneUnusedNodesTransformer change LGTM!
