[WIP] Ephemeral Values prototype #35077

apparentlymart · 2024-04-24T19:24:29Z

This is another attempt at introducing to Terraform the idea of objects and values being "ephemeral", which means something like "lives only for the duration of one Terraform phase".

Terraform already has at least two concepts that meet this definition, despite us not previously naming it:

Provider configurations (provider blocks): Terraform re-evaluates the arguments in a provider block separately during the plan and apply phases, and doesn't mind if the configuration is different between the two as long as the apply-time configuration allows performing the actions that were proposed during the plan phase.
Provisioners (provisioner and connection blocks): Terraform fully evaluates these only during the apply phase, so they aren't really considered during the plan phase at all, aside from basic static validation.

However, because the idea of "ephemeral" is not available in the rest of the language, it's tough to actually benefit from this ephemerality. This prototype aims to introduce "ephemeral" as a cross-cutting concern supported broadly across the language.

Ephemeral Values

The most fundamental idea is that values used in expressions can either be ephemeral or non-ephemeral. This is an idea similar to "sensitive" in that Terraform will perform dynamic analysis such that any value derived from an ephemeral value is itself ephemeral. Ephemeral values can then be used only in parts of the language which would not require persisting the value either between the plan phase and the apply phase, or from one plan/apply round to the next.
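
The propagation rule described above can be sketched in Go. This is only a minimal illustration and does not use Terraform's real value representation: the `Value` type, `Concat`, `CheckPersistable`, and `ErrEphemeral` are all hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical stand-in for a Terraform value that carries an
// "ephemeral" flag alongside its data. It is not Terraform's real value type.
type Value struct {
	Str       string
	Ephemeral bool
}

// Concat derives a new value from its operands. The result is ephemeral if
// any operand is ephemeral, mirroring how "sensitive" spreads to derived values.
func Concat(vals ...Value) Value {
	var out Value
	for _, v := range vals {
		out.Str += v.Str
		out.Ephemeral = out.Ephemeral || v.Ephemeral
	}
	return out
}

// ErrEphemeral is returned when an ephemeral value reaches a context that
// would need to persist it.
var ErrEphemeral = errors.New("ephemeral value not allowed in a persisted context")

// CheckPersistable models a context such as a resource argument, where the
// value must survive from plan to apply.
func CheckPersistable(v Value) error {
	if v.Ephemeral {
		return ErrEphemeral
	}
	return nil
}

func main() {
	token := Value{Str: "abc123", Ephemeral: true}
	prefix := Value{Str: "Bearer "}
	header := Concat(prefix, token)

	fmt.Println(header.Ephemeral)         // derived from an ephemeral value
	fmt.Println(CheckPersistable(prefix)) // non-ephemeral input to a persisted context
	fmt.Println(CheckPersistable(header)) // ephemeral input to a persisted context
}
```

In this sketch the check happens at the point of use, so a value can flow freely through intermediate expressions and is only rejected if it reaches something that must be persisted.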

Considering only pre-existing language features, ephemeral values can be freely used in provider blocks, provisioner blocks, connection blocks, and in local values. The following sections describe some new additions that either accept or produce ephemeral values.

resource blocks (aside from special nested parts like the aforementioned provisioner blocks) do not accept ephemeral values, because preserving resource configuration unchanged between the plan and apply phases is a fundamental part of how Terraform works to keep its promise of either doing what the plan described or returning an error explaining why that's not possible.

Because ephemeral values are not expected to persist from plan to apply or between plan/apply rounds, there is no need to save them in saved plan files or state snapshots, thus finally giving a plausible answer for what to do about #516, which has been on my mind since long before I worked at HashiCorp.

Ephemeral Input Variables

An ephemeral input variable is, in the most general terms, just an input variable that is declared as accepting ephemeral values. A non-ephemeral input variable cannot accept ephemeral values, while an ephemeral input variable will accept both ephemeral and non-ephemeral values, but the value will always be treated as ephemeral when used inside the declaring module.
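
The acceptance rule for input variables can be written down as a small Go sketch. The `Value` and `Variable` types and the `Accept` method are hypothetical and exist only for illustration; they are not part of Terraform's codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical stand-in for a Terraform value with an
// "ephemeral" flag; it is not Terraform's real value type.
type Value struct {
	Str       string
	Ephemeral bool
}

// Variable is a hypothetical model of an input variable declaration.
type Variable struct {
	Name      string
	Ephemeral bool // whether the variable is declared as ephemeral
}

var errEphemeral = errors.New("ephemeral value assigned to non-ephemeral variable")

// Accept returns the value as the declaring module would see it, or an error
// if the variable cannot accept the given value.
func (v Variable) Accept(val Value) (Value, error) {
	if !v.Ephemeral {
		if val.Ephemeral {
			return Value{}, fmt.Errorf("variable %q: %w", v.Name, errEphemeral)
		}
		return val, nil
	}
	// An ephemeral variable accepts anything, but its value is always
	// treated as ephemeral inside the module.
	val.Ephemeral = true
	return val, nil
}

func main() {
	region := Variable{Name: "region"}
	token := Variable{Name: "token", Ephemeral: true}

	_, err := region.Accept(Value{Str: "jwt", Ephemeral: true})
	fmt.Println(err != nil) // whether the non-ephemeral variable rejected the value

	got, _ := token.Accept(Value{Str: "jwt"})
	fmt.Println(got.Ephemeral) // how the module sees a non-ephemeral value it was given
}
```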

The main interesting case is when a root module declares an ephemeral input variable. In that case, Terraform will no longer remember the value for the variable provided during planning and will instead expect any ephemeral variable set during the plan step to be provided again -- possibly with a different value -- during the apply step.

The primary goal of this is to be able to use input variables to set arguments in ephemeral contexts. For example, an input variable that's both ephemeral and sensitive could provide a JSON Web Token to be used when configuring a specific provider, and then automation around Terraform could provide separate JSON Web Tokens across the plan and apply phases so that the apply phase isn't subject to the expiration time for the plan-time JWT, and so that the plan-time JWT doesn't get persisted to disk as part of a saved plan.

Ephemeral Output Values

An ephemeral output value is essentially the opposite of an ephemeral input variable, allowing a module to expose an ephemeral value to its caller. As with input variables, a non-ephemeral output value will reject having an ephemeral value assigned to it. An ephemeral output value can have both ephemeral and non-ephemeral values assigned to it, but the calling module will always see it as ephemeral.

To start, the utility of this is limited just to echoing back values derived from ephemeral input variables, since nothing else I've described so far actually produces ephemeral values. However, allowing this is important to ensure that ephemeral values are supported symmetrically and will cooperate well with all other language features.

Ephemeral Resources

The final idea in this prototype -- one which this prototype probably won't explore fully just yet, and will introduce only just enough to validate that it fits in well with everything else -- is a new resource mode for representing remote objects that are ephemeral themselves.

Terraform currently has two "resource modes": managed resources (resource blocks) describe objects that Terraform is directly managing, while data resources (data blocks) describe objects that are managed elsewhere that the current configuration depends on. But in both cases the assumption is that those objects persist in some sense from plan to apply and from one plan/apply round to the next, and that Terraform is supposed to detect and react to any changes to those objects and therefore needs to persist information about them itself.

Ephemeral resources (ephemeral blocks), on the other hand, represent objects that -- at least, as far as Terraform is concerned -- exist only briefly during a single Terraform phase, and then get cleaned up once the phase is complete. This idea is an evolution of some much earlier design work I did before I even worked at HashiCorp 😀 in relation to #8367, which was about establishing temporary SSH tunnels, and the HashiCorp Vault provider I wrote in #9158 (which evolved into today's official hashicorp/vault).

The general idea of ephemeral resources, then, is that their lifecycle includes three events:

OpenEphemeral: Prepares the object for use. For some kinds of objects this would represent a "create" action, but for others it might just open a temporary session to something that already exists, such as in the SSH tunnel use-case.

This operation is the one that establishes the result attributes that can be accessed from other parts of the module where the resource is declared. All of these results would be ephemeral values, so that they can vary from plan to apply. For example, opening an SSH tunnel is likely to cause a different local TCP port number to be allocated each time, and so consistency between plan and apply phases is not expected.
RenewEphemeral: Some ephemeral remote objects need to be periodically refreshed in order to stay "live", such as leases for Vault secrets.

This optional operation is therefore opted into by the provider's OpenEphemeral response, by providing a private set of data that should be sent back to the provider's RenewEphemeral implementation and a deadline before which Terraform must renew it. The provider can then do whatever is needed to keep the object from expiring, and optionally return another renew request with a new deadline in order to repeat this renewal process.
CloseEphemeral: Once Terraform has completed work for all objects that refer to the ephemeral resource, this operation gives the provider an explicit signal that the object is no longer required so that it can be promptly destroyed or invalidated.

This detail is particularly helpful for the Vault provider and fixes a limitation I ran into immediately back in 2016: a dynamic secret fetched using a data block can never have its lease explicitly terminated, because data resources were intended only to read information about an object someone else is managing, not to directly manage an object (a Vault lease).
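
The three lifecycle events above could be driven roughly as follows. This Go sketch is not the provider protocol (which this prototype does not change): the `Ephemeral` interface, the `Renewal` struct, the `Use` helper, and the `fakeLease` type are all assumptions made up for this example, and renewing halfway to the deadline is an arbitrary choice.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Renewal is a hypothetical shape for a provider's renew request.
type Renewal struct {
	Private  []byte    // opaque data to send back to the provider
	Deadline time.Time // the object must be renewed before this time
}

// Ephemeral is a hypothetical Go rendering of the three lifecycle events.
type Ephemeral interface {
	OpenEphemeral(ctx context.Context) (attrs map[string]string, renew *Renewal, err error)
	RenewEphemeral(ctx context.Context, private []byte) (*Renewal, error)
	CloseEphemeral(ctx context.Context) error
}

// Use opens the object, renews it in the background while work runs, and
// closes it once work is done.
func Use(ctx context.Context, e Ephemeral, work func(attrs map[string]string) error) error {
	attrs, renew, err := e.OpenEphemeral(ctx)
	if err != nil {
		return err
	}
	renewCtx, stop := context.WithCancel(ctx)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		next := renew
		for next != nil { // a nil renewal means the provider did not opt in
			select {
			case <-renewCtx.Done():
				return
			case <-time.After(time.Until(next.Deadline) / 2):
			}
			n, rerr := e.RenewEphemeral(renewCtx, next.Private)
			if rerr != nil {
				return // a real implementation would report this error
			}
			next = n
		}
	}()
	workErr := work(attrs)
	stop()
	wg.Wait()
	if err := e.CloseEphemeral(ctx); err != nil && workErr == nil {
		return err
	}
	return workErr
}

// fakeLease is a toy in-memory implementation, loosely modeled on a lease.
type fakeLease struct {
	mu     sync.Mutex
	renews int
	closed bool
}

func (f *fakeLease) lease() *Renewal {
	return &Renewal{Private: []byte("lease-1"), Deadline: time.Now().Add(20 * time.Millisecond)}
}

func (f *fakeLease) OpenEphemeral(ctx context.Context) (map[string]string, *Renewal, error) {
	return map[string]string{"secret": "s3cr3t"}, f.lease(), nil
}

func (f *fakeLease) RenewEphemeral(ctx context.Context, private []byte) (*Renewal, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.renews++
	return f.lease(), nil
}

func (f *fakeLease) CloseEphemeral(ctx context.Context) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.closed = true
	return nil
}

// demo runs some slow "work" against the fake lease and reports what happened.
func demo() (renews int, closed bool, err error) {
	l := &fakeLease{}
	err = Use(context.Background(), l, func(attrs map[string]string) error {
		time.Sleep(100 * time.Millisecond) // stand-in for work using attrs["secret"]
		return nil
	})
	return l.renews, l.closed, err
}

func main() {
	fmt.Println(demo())
}
```

The point of the sketch is the ordering: the close call happens only after the dependent work has finished and the renewal loop has stopped, which is the explicit termination signal that a data block cannot provide.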

Because the results from ephemeral resources are ephemeral values, they're primarily useful in configuration for other ephemeral objects: provider blocks, provisioner/connection blocks, and of course other ephemeral blocks.

Actually changing the provider protocol and implementing real providers is not in scope for my initial prototyping work here, and so I intend to prototype this in a more limited way that just emulates how this mechanism might behave, so we can see how well it interacts with the rest of the language and the other ephemeral values discussed here.

I've also been considering a mechanism to allow managed resource types to declare individual arguments as being "write-only", such as for an RDS database password that only needs to be provided during creation and should not be provided again unless the operator actually intends to reset it. I don't intend to prototype that in here, but I intend to lay the foundations for it by having a convention that ephemeral input values and write-only arguments both treat null as meaning "don't set or change" and non-null as "set or change", thereby creating a small imperative-shaped niche in the otherwise-declarative Terraform Language to allow for using Terraform to manage objects that have write-only (typically, sensitive) arguments without needing to persist them in plan and state.
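
The null convention mentioned above is small enough to sketch in a few lines of Go. The `setWriteOnly` function is hypothetical, and a nil pointer stands in for a null value here.

```go
package main

import "fmt"

// setWriteOnly models a hypothetical write-only argument such as a database
// password. A nil input means "don't set or change"; a non-nil input means
// "set or change", even if it happens to equal the current value.
func setWriteOnly(current string, input *string) (next string, changed bool) {
	if input == nil {
		return current, false
	}
	return *input, true
}

func main() {
	// No value provided: the existing password is left alone.
	next, changed := setWriteOnly("old-password", nil)
	fmt.Println(next, changed)

	// A value provided: the operator intends to reset the password.
	rotated := "new-password"
	next, changed = setWriteOnly("old-password", &rotated)
	fmt.Println(next, changed)
}
```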

I'm still working on this, so not everything described above is in here yet, but the foundations for ephemeral values themselves are already in. I've opened this draft largely just because I need to put this work down for a while for a team offsite and don't want to lose the context.

For any request that can occur during the planning phase there is a chance that either a resource configuration or its associated provider configuration will contain unknown values that are placeholders for results of operations that haven't yet completed. Ideally a provider would be able to just do its best to predict the outcome in spite of the partial information, but in practice that isn't always possible. In those more complex situations it's better to let the provider explicitly decline to complete the operation and have Terraform Core defer it for a future run when there's hopefully more information available due to having applied other changes upstream. This commit does not yet introduce the idea of "deferred changes" into Terraform Core, so as a temporary step Terraform Core will just return an error if a provider tries to defer anything. In future commits we'll teach Terraform Core how to handle this more gracefully by saving partial results into the plan as "deferred changes" and then continuing on to downstream resources to try to gather as much information as possible to help the user understand the likely effects of those deferred actions.

Previously we just immediately bailed out with an error if either count or for_each were not sufficiently known to determine their full set of instance keys. The Expander abstraction can now talk about module calls and resources having unknown expansion, so Terraform Core should tolerate that situation and just let the expander know that the expansion is unknown, and then we'll deal with that situation downstream. For now "downstream" actually means directly after these functions return, because the rest of Terraform Core isn't yet ready to deal with objects that don't know their full expansions. We'll just return errors similar to (but slightly lower quality than) the ones we used to return during evaluation, as a temporary placeholder to keep things working until we get downstream more ready to deal with this. While working on this I also noticed that we were redundantly re-evaluating the for_each expressions for each resource instance just to prepare the repetition data, which is unnecessary because the Expander abstraction already keeps track of that to ensure that all of the graph nodes have a consistent view of the expansions. We'll now just ask the expander directly what our RepetitionData should be, since that's part of the expander's responsibility.

Traversing upward from a PartialExpandedModule is trickier than traversing down because we need to deal with what happens if the traversal crosses over the boundary from partial-expanded into fully-expanded. To deal with that we end up having two different methods to handle the two situations, and a third method to indicate which one to call. Thankfully the need to ask for the parent of a partial-expanded module is relatively rare -- mainly just for input variables whose definitions need to eval in the parent module's scope -- so this awkward API shouldn't be needed in too many places.

This is mainly just a proof-of-concept of what it might look like to generate graph nodes representing placeholders for objects in not-fully-expanded modules. These new codepaths are not really accessible yet because it's still invalid to have a module whose expansion is unknown; we'll continue down this path further in later commits once there's actually somewhere to save the partially-evaluated placeholder values.

Our evaluation strategy for module-namespaced objects unfortunately depends quite strongly on having the right EvalContext in scope for each graph node, referring to the appropriate namespace in which to evaluate expressions. Although I was pretty reluctant to integrate the idea of partial-expanded module paths at quite this low a level, it does seem like the most pragmatic answer since it works with rather than against the existing evaluation strategies. As of this commit this isn't really doing anything because it isn't possible to reach any graph node that has a partial-expanded path and the EvalContext itself doesn't actually properly support evaluation in a partial-expanded path anyway; we'll fix up the rest of this in later commits before making these codepaths reachable.

This replaces the direct manipulation of a map shared between three different components, encapsulating that manipulation now inside a single wrapping API that itself ensures safe concurrent access. In future commits we'll do the same for local values and output values, but for now those parts of namedvals.State remain unused.
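
The general pattern of hiding a shared map behind an API that handles its own locking looks roughly like this in Go. This is a simplified sketch and not the actual namedvals.State: the method names, the string-typed addresses and values, and the `fill` helper are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a hypothetical, much-simplified analog of a named-values store.
// Addresses and values are plain strings here to keep the sketch small.
type State struct {
	mu        sync.Mutex
	variables map[string]string
}

func NewState() *State {
	return &State{variables: make(map[string]string)}
}

// SetInputVariableValue records a value; callers never touch the map directly.
func (s *State) SetInputVariableValue(addr, val string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.variables[addr] = val
}

// GetInputVariableValue returns the recorded value, if any.
func (s *State) GetInputVariableValue(addr string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	val, ok := s.variables[addr]
	return val, ok
}

// fill writes n values from n goroutines, as concurrent graph nodes might,
// and returns how many values were recorded.
func fill(s *State, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.SetInputVariableValue(fmt.Sprintf("module.child[%d].var.name", i), "value")
		}(i)
	}
	wg.Wait()
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.variables)
}

func main() {
	s := NewState()
	fmt.Println(fill(s, 50))
	fmt.Println(s.GetInputVariableValue("module.child[3].var.name"))
}
```

Because every access goes through a method that takes the lock, callers cannot forget to synchronize, which is the main benefit over sharing the map itself.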

Now that we have the separate namedvals.State type to encapsulate all of the named-value tracking we can simplify the EvalContext API to just return that object directly. This removes the slightly odd evolved API for setting and retrieving input variable values, instead now just calling directly into the relevant namedvals.State methods. It also slightly simplifies some of our test code because there's no longer any need to mock accesses to what is just a temporary in-memory data store anyway. Finally, this now gives nodePartialExpandedModuleVariable somewhere to save its placeholder values, though there's not yet anything to read them.

This is a new mode for the evaluator where instead of returning information about exact objects it'll return placeholder values that represent potentially many different hypothetical objects all declared from the same static configuration object, in situations where we don't yet have enough information to expand all of the modules and their contents. So far only the GetInputVariable function actually knows how to deal with this, so this is far from sufficient but is a reasonable starting point just to establish that it's possible to get Terraform into this evaluation mode when working with graph nodes that represent such placeholder objects.

Back when we added local values (a long time ago now!) we put their results in state mainly just because it was the only suitable shared data structure to keep them in. They are a bit idiosyncratic there because we intentionally discard them when serializing state to a snapshot, and that's just fine because they never need to be retained between runs anyway. We now have namedvals.State for all of our named value result storage needs, so we can remove the local-value-related fields of states.Module and use the relevant map inside the local value state instead.

For any local value declared beneath a module call whose expansion isn't known yet, we'll calculate a single value to serve as a placeholder for all possible valid instances of that local value, using unknown values in any situation where a value might differ between instances.

For a very long time we've had an annoying discrepancy between the in-memory state model and our state snapshot format where the in-memory format stores output values for all modules whereas the snapshot format only tracks the root module output values because those are all we actually need to preserve between runs. That design wart was a result of us using the state both as an internal and an external artifact, due to having nowhere else to store the transient values of non-root module output values while Terraform Core does its work. We now have namedvals.State to internally track all of the throwaway results from named values that don't need to persist between runs, so now we'll use that for our internal work instead and reserve the states.State model only for the data that we will preserve between runs in state snapshots. The namedvals internal model isn't really designed to support enumerating all of the output values for a particular module call, but our expression evaluator currently depends on being able to do that and so we have a temporary inefficient implementation of that which just scans the entire table of values as a stopgap just to avoid this commit growing even larger than it already is. In a future commit we'll rework the evaluator to support the PartialEval mode and at the same time move the responsibility for enumerating all of the output values into the evaluator itself, since it should be able to determine what it's expecting by analyzing the configuration rather than just by trusting that earlier evaluation has completed correctly. Because our legacy state string serialization previously included output values for all modules, some of our context tests were accidentally depending on the implementation detail of how those got stored internally.
Those tests are updated here to test only the data that is a real part of Terraform Core's result, by ensuring that the relevant data appears somewhere either in a root output value or in a resource attribute. As of this commit, what remains of the states.State model can now be entirely serialized by the state snapshot format, with no more situations where we just silently drop some data that Terraform Core uses as an implementation detail.

This will allow it to determine which instances _should_ be present rather than just trusting which instances _are_ present, which will make it harder to accidentally hide graph ordering bugs behind fallback behavior and, more importantly, will allow the evaluator to recognize the difference between there being no instances at all or the instance keys not yet being known.

This is a totally different approach to GetModule which uses the configuration and previously-registered expansion to determine what ought to exist in our named values state, rather than treating the values in the named values state as the source of truth. As a result we get an overall simpler implementation which is able to panic if other components aren't behaving correctly, and we can return placeholder results in partial evaluation mode, at least as long as we're working with a single-instance module. There are some further opportunities for simplification and improving the detail of the unknown results if we make broader changes in future, but for the moment this is just enough to mimic the previous behavior using a new strategy.

apparentlymart · 2024-04-24T19:33:31Z

Ugh whoops I selected the deferred actions branch instead of the ephemeral values branch 🤦‍♂️

I'll figure out how to fix that later, but for now the real branch is f-ephemeral-values.

apparentlymart · 2024-04-24T19:34:38Z

It seems like it isn't possible to change the source branch of a PR once it's created, so I'm going to close this one and open a fresh one with the same text but the correct branch. 😖

Edit: The new PR is #35078

github-actions · 2024-05-25T02:03:34Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

apparentlymart added 17 commits November 29, 2023 11:17

addrs: Deferrable address types

b5296a4

This represents the two address types that could potentially have deferred actions associated with them during a Terraform plan operation, because deferring can happen either before or after instance expansion.

plans/deferring: Some helpers to track deferred actions

0c595bf

addrs: InstanceKeyType.String method

0d994a5

core: Beginnings of placeholders for resources with unknown expansion

14f54c5

This doesn't actually do anything useful yet, but at least stubs out how evaluation for these might work in later commits.

apparentlymart added enhancement config labels Apr 24, 2024

apparentlymart self-assigned this Apr 24, 2024

apparentlymart closed this Apr 24, 2024

github-actions bot locked as resolved and limited conversation to collaborators May 25, 2024
