Move init phase of KPR configuration outside of Hive Runtime #32216
base: main
/test
Force-pushed from e8ce697 to d490771.
/test
Force-pushed from d490771 to e77a166.
/test
Force-pushed from e77a166 to 3a17272.
When creating a new daemon with newDaemon, we perform various procedures to detect the kube-proxy replacement (KPR) config and subsequently override various config options if necessary. Unlike most provided dependencies, the daemon is created inside the Hive lifecycle and exposed as a dependency inside a promise. This presents a problem when writing new modules that use the global *option.DaemonConfig. Although this type is exposed to Hive via a Provide(...) call, some of its fields may change after the daemon has started, so using the config during the populate phase can result in incorrect configuration being used. Notably, fields such as option.Config.EnableNodePort may change at runtime.

Trying to solve this by forcing a dependency on the daemon promise fails in two ways:

i. Modules that require a finalized *DaemonConfig while being initialized cannot wait for promise.Promise[*Daemon] to resolve, as this would deadlock waiting on a runtime dependency.
ii. The map group provided using bpf.MapOut[T](...) has a dependency on daemonPromise (these are synced before the loader dependency), so trying to depend on it would cause a Hive dependency cycle.

Most of the KPR config init can be done prior to runtime, as finalizing configuration is a reasonable thing to do in the Hive init (i.e. Populate(...)) phase. This commit moves all such work out of newDaemon and outside of Hive's lifecycle hooks, while leaving KPR probing inside newDaemon. To allow syncing on this procedure, *option.DaemonConfig is re-provided as the alias *option.FinalDaemonConfig, so that dependencies can be ordered after KPR init.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
The MTU cell relies on option.Config.EnableNodePort, which, as described in commit c2d65fe61a9231ee961097f894466727a2438768, is prone to a race where the daemon changes this configuration at runtime. This commit uses the newly introduced option.(*FinalDaemonConfig) type as a parameter dependency to prevent such issues with MTU.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
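A minimal sketch of the ordering idea in these two commits, under stated assumptions: the DaemonConfig fields, initKPRConfig, mtuParams, and newMTUParams below are hypothetical toy stand-ins, not Cilium's real types or functions, and the Hive wiring is left out. Because the only way to obtain a *FinalDaemonConfig is through the KPR init step, any constructor that takes one as a parameter (as the MTU cell does) is necessarily built after that step.

```go
package main

import "fmt"

// DaemonConfig is a toy stand-in for option.DaemonConfig (hypothetical fields).
type DaemonConfig struct {
	KubeProxyReplacement bool
	EnableNodePort       bool
}

// FinalDaemonConfig follows the PR: a distinct type whose only producer runs
// after KPR init, so depending on it implies ordering after that init.
type FinalDaemonConfig DaemonConfig

// Config plays the role of the global option.Config.
var Config = &DaemonConfig{KubeProxyReplacement: true}

// initKPRConfig is a hypothetical stand-in for the KPR init work moved out of
// newDaemon: it finalizes KPR-dependent fields, then re-exposes the same
// config under the "final" type.
func initKPRConfig(cfg *DaemonConfig) *FinalDaemonConfig {
	if cfg.KubeProxyReplacement {
		cfg.EnableNodePort = true
	}
	return (*FinalDaemonConfig)(cfg)
}

// mtuParams and newMTUParams are hypothetical: a consumer (like the MTU cell)
// that takes *FinalDaemonConfig, so it can only be built after KPR init.
type mtuParams struct{ nodePortEnabled bool }

func newMTUParams(cfg *FinalDaemonConfig) mtuParams {
	return mtuParams{nodePortEnabled: cfg.EnableNodePort}
}

func main() {
	early := Config.EnableNodePort // read during "populate", before KPR init
	mtu := newMTUParams(initKPRConfig(Config))
	fmt.Println(early, mtu.nodePortEnabled) // false true
}
```

Reading Config.EnableNodePort before the init step yields the stale value, which is the populate-phase hazard the first commit describes. The conversion (*FinalDaemonConfig)(cfg) copies nothing; both pointers refer to the same struct.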
Force-pushed from 3a17272 to 34c0202.
/test
@tommyp1ckles Nice work Tom!
@@ -1356,6 +1356,14 @@ func LogRegisteredOptions(vp *viper.Viper, entry *logrus.Entry) {
	}
}

// FinalDaemonConfig is an alias for DaemonConfig used to differentiate what stage of daemon option init
// the config is when it is provided.
type FinalDaemonConfig DaemonConfig
I am not sure I understand the intent for this. It's a parallel type and not an alias and it wraps an exported field. Is there an interface at play here?
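To make the reviewer's point concrete, a short Go illustration (the type and method names here are illustrative only): type FinalDaemonConfig DaemonConfig declares a new defined type with the same underlying struct, while type DaemonConfigAlias = DaemonConfig is a true alias. The defined type needs explicit conversion and does not inherit DaemonConfig's methods, but a converted pointer still shares the same struct.

```go
package main

import "fmt"

// DaemonConfig is a toy stand-in (illustrative only).
type DaemonConfig struct{ EnableNodePort bool }

// Describe is a method declared on the original type.
func (c *DaemonConfig) Describe() string {
	return fmt.Sprintf("nodeport=%v", c.EnableNodePort)
}

// FinalDaemonConfig is a defined type: same fields, distinct type, and it does
// not inherit DaemonConfig's methods.
type FinalDaemonConfig DaemonConfig

// DaemonConfigAlias is a true alias: just another name for DaemonConfig.
type DaemonConfigAlias = DaemonConfig

func main() {
	cfg := &DaemonConfig{EnableNodePort: true}

	// An alias is interchangeable with the original; no conversion needed.
	var a *DaemonConfigAlias = cfg
	fmt.Println(a.Describe()) // nodeport=true

	// A defined type needs an explicit conversion; f.Describe() would not compile.
	f := (*FinalDaemonConfig)(cfg)
	fmt.Println(f.EnableNodePort, (*DaemonConfig)(f).Describe()) // true nodeport=true

	// The conversion copies nothing: both pointers refer to the same struct.
	cfg.EnableNodePort = false
	fmt.Println(f.EnableNodePort) // false
}
```

This distinction is presumably what makes "re-providing" possible: Hive builds on uber's dig container, which keys dependencies by type, so a true alias would be the same key as *DaemonConfig and could not be provided a second time.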
This seems more complicated than it needs to be. Could we instead just use the DaemonConfig type and have the provider for it also do the KPR etc. init/validation? (daemon/cmd/cells.go:91).
I'm worried that things outside NewDaemon might also want to use this flag and they might not be ordered correctly if they don't use FinalDaemonConfig. So seems safer to just have one type whose provider correctly initializes it.
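A sketch of the single-type alternative suggested here, assuming a toy DaemonConfig and a hypothetical applyKPROverrides placeholder (neither is Cilium's code): the one provider for *DaemonConfig finishes KPR init/validation before returning, so no consumer can observe the pre-init state through the container.

```go
package main

import "fmt"

// DaemonConfig is a toy stand-in for option.DaemonConfig (hypothetical fields).
type DaemonConfig struct {
	KubeProxyReplacement bool
	EnableNodePort       bool
}

// applyKPROverrides is a hypothetical placeholder for the KPR init/validation
// work; the real logic is not shown, so it never returns an error here.
func applyKPROverrides(cfg *DaemonConfig) error {
	if cfg.KubeProxyReplacement {
		cfg.EnableNodePort = true
	}
	return nil
}

// newDaemonConfig is the single provider of *DaemonConfig: it completes KPR
// init before returning, so every consumer of the type sees finalized values.
func newDaemonConfig(raw *DaemonConfig) (*DaemonConfig, error) {
	if err := applyKPROverrides(raw); err != nil {
		return nil, fmt.Errorf("KPR config init: %w", err)
	}
	return raw, nil
}

func main() {
	cfg, err := newDaemonConfig(&DaemonConfig{KubeProxyReplacement: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.EnableNodePort) // true
}
```

One caveat with this shape: code that reads the global option.Config directly, rather than receiving *DaemonConfig from the container, still bypasses the provider and gets no ordering guarantee.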
@derailed The aliased type allows us to "re-provide" the same var (i.e. *option.DaemonConfig) but in a later state of initialization. That is, we Provide DaemonConfig to Hive for convenience, but the underlying data is a global variable in pkg/option, and this var is modified at various points during both init and runtime for things like overriding required config for KPR.
This breaks assumptions about Hive dependency ordering, and it also leaves us with configuration that cannot be finalized prior to runtime.
So by providing FinalDaemonConfig after the KPR init work, we give Hive a synchronization point: anything depending on it knows that KPR init has already occurred.
i.e.:
// Arbitrary ordering, KPR init may not be complete prior to this:
cell.Invoke(func(_ *option.DaemonConfig) {})
// This depends on KPR init, so using fields mutated by that routine is "safe":
cell.Invoke(func(_ *option.FinalDaemonConfig) {})
@joamaki this was the approach I tried initially, albeit with a "pre-init" DaemonConfig alias, but since this is global we might be able to just use the global var there and then provide DaemonConfig as a "post-kpr-init" state as the primary dependency.
The "Final" part of this is actually misleading, because it is only "final" for the KPR-related fields, so it might cause confusion. I think reworking this makes sense.
If these approaches prove unwieldy, it may make sense to step back and come up with a plan for fixing this in its entirety (i.e. getting rid of the global *option.DaemonConfig). In the meantime, we could also solve this by having the KPR init provider just provide some token dependency (e.g. postKPRInit{}), so that anything depending on it would be forced to wait for KPR init to complete.
Please see the commit message for a detailed explanation.