Allowing callers of tentacle client to drive orchestration themselves #799

sburmanoctopus · 2024-02-09T02:18:18Z

Background

In the ExecutionsV2 pipeline, all tentacle communications are done asynchronously via events.

Results

Before

The TentacleClient.ExecuteScript method does the entire communication stack, from starting, to getting status, to completing etc.

After

We now expose the required operations needed to interact with Tentacle.

Considerations

Script Ticket

Tentacle V1 creates the script ticket on the Tentacle itself, whereas Tentacle V2 onwards are given the script ticket. This concept was hidden in the TentacleClient.

But now that we are driving the orchestration in Octopus Server, we need to expose this to the caller.

Orchestration vs Execution

We had to split apart the execution away from orchestration. This is because our new methods on the client are purely execution. But we want to make sure we are executing in the same way we do during orchestration.

Cancelling/Completing During Start

One thing we had to maintain when splitting execution and orchestration is that we try to cancel/complete in orchestration if we started attempting to start during execution.

This is why the ScriptExecutionCancelledAfterPotentiallyStartingException was introduced.

We also need to respect this contract in the ExecutionsV2 pipeline.

Using Latest Command/Response

The code became far simpler when the orchestrators and executors respected the contract that parameters and return values were always using the latest version (in this case V3Alpha).

This prevented the need for many generic arguments between layers.

It also simplified mapping.

ClientOperationMetricsBuilder

This PR does not have a good solution for the metrics we are currently gathering.

ClientOperationMetricsBuilder gets all the metrics about an operation, and reports back on that. But the operation is now split up into separate calls. So we cannot do this.

We need to revisit this at some point.

Testing

We should write our own set of tests on the new methods of the TentacleClient if this draft is considered worth progressing.

How to review this PR

This PR moves logic around so that it can be exposed. It would be great if we could please see if they are still logically equivalent.

Quality ✔️

Pre-requisites

I have read How we use GitHub Issues for help deciding when and where it's appropriate to make an issue.
I have considered informing or consulting the right people, according to the ownership map.
I have considered appropriate testing for my change.

sburmanoctopus · 2024-02-09T02:39:15Z

source/Octopus.Tentacle.Client/Scripts/IScriptOrchestratorFactory.cs

-using System.Threading;
-using System.Threading.Tasks;
-
-namespace Octopus.Tentacle.Client.Scripts


I didn't see the use of this.

LukeButters · 2024-02-11T21:49:26Z

source/Octopus.Tentacle.Contracts/ScriptServiceV3Alpha/GetStatusCommandV3Alpha.cs

+
+namespace Octopus.Tentacle.Contracts.ScriptServiceV3Alpha
+{
+    public class GetStatusCommandV3Alpha


Do these classes need to be tied to what is going over the wire, maybe it could just be GetStatusCommand no v3.

They don't need to be. I wasn't sure what convention we were following.

I guess it depends whether we want the versioning of all commands to increment together (with StartScriptCommandV3Alpha etc.), or whether they can progress at their own cadence.

I leant towards keeping them in sync, as we tend to make orchestrators (and now executors) for a specific version.

But if this is wrong, then I am happy to change it.

We probably need more guidance.

But in general the tentacle client should be working as hard as it can be for you. Which I think should me trying to hide away details best it can, including new different wire command/responses.

LukeButters · 2024-02-11T21:51:13Z

source/Octopus.Tentacle.Client/TentacleClient.cs

@@ -191,11 +193,11 @@ async Task<DataStream> DownloadFileAction(CancellationToken ct)
                    clientOptions,
                    logger);

-                var orchestrator = await factory.CreateOrchestrator(scriptExecutionCancellationToken);
+                var orchestrator = factory.CreateOrchestrator(scriptServiceVersion);


@sburmanoctopus Could we make it so that the "in memory" version of execute a scripts calls the below methods. That way the client itself gives an example of how to call the methods below and we always test those methods.

I don't think I follow this one sorry.

The below methods are the publicly facing versions, which will have their own issues with metrics gathering etc. So I would not be inclined to call them from other methods.

They do call the same executors, so end up calling the same code.

It would be nice to have something show you how they should be called. But that's all done within the orchestrator.

Maybe this one is worth a chat?

LukeButters · 2024-02-11T21:57:19Z

source/Octopus.Tentacle.Client/TentacleClient.cs

+            //TODO: sast - think about metrics. This isn't right, so we have ignored it for now. But we need to ask what we want to report on in an async event based world.
+            var operationMetricsBuilder = ClientOperationMetricsBuilder.Start();
+
+            var scriptServiceVersion = await DetermineScriptServiceVersionToUse(operationMetricsBuilder, logger, scriptExecutionCancellationToken);


Could this be cached within that second object that could be returned by start script. Otherwise we are very much relying on halibut caching to work. If we don't cache this we will need to ensure we don't log too much since DetermineScriptServiceVersionToUse might be logging to task log.

Yeah, good points 🤔

The complication is a multi-node environment.

Returning the version in start script would be the most efficient, but it does expose internal concerns to the public 😞

Like you say, caching does help, but it does mean every server node that does a task (e.g. GetStatus) now has to determine the version again (and log it to the task log 😑 ).

Which then brings us to the idea of reducing what we log when doing this.

@jimmyp , do you have any thoughts on this one?

very much relying on halibut caching to work

I dont understand halibut enough to understand this

ensure we don't log too much since DetermineScriptServiceVersionToUse might be logging to task log.

I'm wondering if this is something that might need to change about tentacle client. The way I've been thinking about task logging in the new pipeline is that its not something most components should know about. The pipeline will raise events and task logging component somewhere will build up the task log as a read model.

Returning the version in start script would be the most efficient, but it does expose internal concerns to the public 😞

What exactly is the internal concern being exposed here sorry? In general our multi-node caching mechanism is going to be the DeploymentExecution so my first thought is to reach for that if we do need caching. But I'm keen to here more about why we need caching here and what dependancies this would create

Is there anything written down anywhere of the goals and concerns tentacle client is supposed to deal with?

What exactly is the internal concern being exposed here sorry?

Before any RPC call, we determine the capabilities of the Tentacle (i.e. is it V1, V2 etc.) so we can use the correct interface. We cache the result of this so that when we keep doing multiple "get status" checks etc., we don't have to keep asking the same question.

Right now, this is all done within the Tentacle Client itself. So the caller does not need to care about it.

So the idea we are exploring is returning this information about the tentacle back to Octopus Server, so then Octopus Server can use it for future calls to Tentacle Client (and therefore Tentacle Client would not have to ask the Tentacle again).

But in doing so, we are exposing some info to achieve this.

However, recently, @gb-8 and I were exploring the idempotency issue with V1 tentacles, and realized that we may need to make some changes to support that. The main change is to actually expose the results of "Get Capabilities" so that the caller can make choices about how to drive Tentacle Client if it's a V1 tentacle.

If we do that, then we would already have exposed this info. So maybe it becomes moot.

Is there anything written down anywhere of the goals and concerns tentacle client is supposed to deal with?

Not that I'm aware of. But I wasn't around in the early days. @LukeButters might know if there is?

LukeButters · 2024-02-11T21:58:13Z

source/Octopus.Tentacle.Client/TentacleClient.cs

@@ -209,6 +211,136 @@ async Task<DataStream> DownloadFileAction(CancellationToken ct)
            }
        }

+
+        public async Task<ScriptStatusResponseV3Alpha> StartScript(


Would it be possible to return something that is not the type that is received over the wire.

Perhaps two things distinct objects should be returned here

The interesting data returned e.g. logs.

An object which can be passed to GetStatus containing everything required to do the next GetStatus call. This object could additionally have some extension methods for serialization and deserialization, which is all taken care of in the client (and tested in the client).

Yeah, I like that. It helps as an anti-corruption layer, plus it helps to define responsibilities.

Thinking this through, I'd say we create:

New class CommandContext (or a better name?)

Has ScriptTicket and NextLogSequence

New class CommandResponse (or a better name?)

Has State, ExitCode, and Logs (where Logs is still a list of ProcessOutput?)

A new property that holds the CommandContext for the next command. E.g. NextCommandContext?

Then we could use CommandContext as the parameter to the other commands, to really highlight this is what you need to use these.

nice, in the future it would be nice if the client can tell us if the script is hung (which is after the script has started and it has not logged output for more than some amount of time). Could we stash the information needed to do that in the command context?

Yeah, probably.

This sounds like a nice separate piece of work to come afterwards. I'll create a card for it.

nice, that hung stuff is something we almost did (2nd highest voted bet). Also I think it has to be in the client since it is not obvious when a script actually started :D

Done.

Also, story created

LukeButters · 2024-02-11T22:00:03Z

source/Octopus.Tentacle.Client/TentacleClient.cs

+            return result;
+        }
+
+        public async Task<ScriptStatusResponseV3Alpha> Finish(


Pulling Finish out makes sense since cleaning up is destructive and we do need to let callers do stuff like collect artifices before we clean up

LukeButters · 2024-02-12T21:05:57Z

source/Octopus.Tentacle.Client/TentacleClient.cs

+            var result = await executor.GetStatus(getStatusCommand, scriptExecutionCancellationToken);
+
+            return result;
+        }


is the caller of this method now responsible for delays between calls?

Yes 🙂

The current ExecutionsV2 pipeline does that already

LukeButters · 2024-02-13T00:47:21Z

I think in general this is rather good.

gb-8 · 2024-02-22T23:12:17Z

Summary of discussion of PR:

What the PR achieves:

It separates orchestration and execution logic inside tentacle client.
It adds new methods to allow callers to manage orchestration, leaving client responsible for execution.

What the PR does not address:

It does not allow the determination of what features the tentacle supports, so does not allow us to detect when we are dealing with a V1 tentacle which does not support idempotent script execution, and generates a new script ticket on the tentacle.

Benefits of getting this PR merged:

It does separate orchestration and execution concerns in TentacleClient and moves us towards being able to fully orchestrate tentacle interactions (with the exclusion of handling V1 tentacle limitations).
It would allow us to stop using FudgedTentacleClient and reduce filesystem usage that that is involves.

Cost of getting this PR merged:

Still requires further review before we can consider merging (It's had some attention from Luke).
It does not move us towards our cycle goal, and could distract from that.

Risks management:

The PR would need to be re-worked if we want to roll it out via a feature toggle.
The changes in the PR should not change behaviour for existing usages of TentacleClient, and this behaviour is well covered by automated tests.
New methods (for external orchestration) are not covered by automated tests to the same level.
Even if we wanted to address all required changes to tentacle client as a contiguous piece of work (including the missing tentacle version management concerns), we would likely want to split it up into smaller PRs, and this PR would be a reasonable first step.

… if they wish

pawelpabich · 2024-02-27T04:04:06Z

A few thoughts:

ITentacleClient is used by the classic execution pipeline and Dynamic Workers. Would it make sense to create a different interface for the resilient pipeline so you can freely experiment with it in isolation?
The Modern Deployment Team will prepare a mini RFC re how to expose the new K8s feature in the Tentacle Client. I don't think this should affect this PR and it is more of FYI.

APErebus · 2024-02-28T23:44:49Z

source/Octopus.Tentacle.Client/ITentacleClient.cs

 using Octopus.Tentacle.Contracts.ScriptServiceV3Alpha;

 namespace Octopus.Tentacle.Client
 {
    public interface ITentacleClient : IDisposable
    {
+        Task<CommandResponseV3Alpha> StartScript(StartScriptCommandV3Alpha startScriptCommand, ILog logger, CancellationToken scriptExecutionCancellationToken);


Further to @LukeButters point around service types leaking out of tentacle client, The StartScriptCommandV3Alpha also really needs to get changed to a non-service type.

In my mini-RFC (as mentioned by Pawel), part of the proposal is to remove service types leaking into the Server codebase.

I'm happy for you to leave it for now, but just be aware that this is something I'm intending to address as part of the proposed solution in the RFC

Sounds good. When you're back, maybe we should get together to sync our two streams of work so we get a good path forward 🙂

sburmanoctopus commented Feb 9, 2024

View reviewed changes

LukeButters reviewed Feb 11, 2024

View reviewed changes

LukeButters reviewed Feb 12, 2024

View reviewed changes

sburmanoctopus force-pushed the sast/SpikeOutExecutionsV2TentacleClientChanges branch 2 times, most recently from d685d53 to 6494980 Compare February 19, 2024 05:43

Allowing callers of tentacle client to drive orchestration themselves…

568d071

… if they wish

sburmanoctopus force-pushed the sast/SpikeOutExecutionsV2TentacleClientChanges branch from 6494980 to 568d071 Compare February 27, 2024 03:29

sburmanoctopus marked this pull request as ready for review February 27, 2024 03:29

sburmanoctopus requested a review from a team as a code owner February 27, 2024 03:29

sburmanoctopus requested a review from pawelpabich February 27, 2024 03:30

LukeButters requested a review from APErebus February 27, 2024 04:01

APErebus reviewed Feb 28, 2024

View reviewed changes

rhysparry mentioned this pull request Mar 15, 2024

Add v2 ExecuteScript overload to ITentacleClient #835

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allowing callers of tentacle client to drive orchestration themselves #799

Allowing callers of tentacle client to drive orchestration themselves #799

sburmanoctopus commented Feb 9, 2024 •

edited

sburmanoctopus Feb 9, 2024

LukeButters Feb 11, 2024

sburmanoctopus Feb 13, 2024

LukeButters Feb 13, 2024

LukeButters Feb 11, 2024

sburmanoctopus Feb 13, 2024

LukeButters Feb 11, 2024

sburmanoctopus Feb 13, 2024

jimmyp Mar 3, 2024 •

edited

jimmyp Mar 3, 2024

sburmanoctopus Mar 7, 2024

LukeButters Feb 11, 2024

sburmanoctopus Feb 13, 2024

sburmanoctopus Feb 13, 2024

LukeButters Feb 13, 2024

sburmanoctopus Feb 13, 2024

LukeButters Feb 13, 2024

sburmanoctopus Feb 14, 2024

LukeButters Feb 11, 2024

LukeButters Feb 12, 2024

sburmanoctopus Feb 13, 2024 •

edited

LukeButters commented Feb 13, 2024

gb-8 commented Feb 22, 2024

pawelpabich commented Feb 27, 2024

APErebus Feb 28, 2024

sburmanoctopus Mar 7, 2024

Allowing callers of tentacle client to drive orchestration themselves #799

Are you sure you want to change the base?

Allowing callers of tentacle client to drive orchestration themselves #799

Conversation

sburmanoctopus commented Feb 9, 2024 • edited

Background

Results

Before

After

Considerations

Script Ticket

Orchestration vs Execution

Cancelling/Completing During Start

Using Latest Command/Response

ClientOperationMetricsBuilder

Testing

How to review this PR

Pre-requisites

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jimmyp Mar 3, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sburmanoctopus Feb 13, 2024 • edited

Choose a reason for hiding this comment

LukeButters commented Feb 13, 2024

gb-8 commented Feb 22, 2024

pawelpabich commented Feb 27, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sburmanoctopus commented Feb 9, 2024 •

edited

jimmyp Mar 3, 2024 •

edited

sburmanoctopus Feb 13, 2024 •

edited