New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allowing callers of tentacle client to drive orchestration themselves #799
base: main
Are you sure you want to change the base?
Allowing callers of tentacle client to drive orchestration themselves #799
Conversation
using System.Threading; | ||
using System.Threading.Tasks; | ||
|
||
namespace Octopus.Tentacle.Client.Scripts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the use of this.
|
||
namespace Octopus.Tentacle.Contracts.ScriptServiceV3Alpha | ||
{ | ||
public class GetStatusCommandV3Alpha |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these classes need to be tied to what is going over the wire, maybe it could just be GetStatusCommand
no v3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't need to be. I wasn't sure what convention we were following.
I guess it depends whether we want the versioning of all commands to increment together (with StartScriptCommandV3Alpha
etc.), or whether they can progress at their own cadence.
I leant towards keeping them in sync, as we tend to make orchestrators (and now executors) for a specific version.
But if this is wrong, then I am happy to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need more guidance.
But in general the tentacle client should be working as hard as it can be for you. Which I think should me trying to hide away details best it can, including new different wire command/responses.
@@ -191,11 +193,11 @@ async Task<DataStream> DownloadFileAction(CancellationToken ct) | |||
clientOptions, | |||
logger); | |||
|
|||
var orchestrator = await factory.CreateOrchestrator(scriptExecutionCancellationToken); | |||
var orchestrator = factory.CreateOrchestrator(scriptServiceVersion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sburmanoctopus Could we make it so that the "in memory" version of execute a scripts calls the below methods. That way the client itself gives an example of how to call the methods below and we always test those methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow this one sorry.
The below methods are the publicly facing versions, which will have their own issues with metrics gathering etc. So I would not be inclined to call them from other methods.
They do call the same executors, so end up calling the same code.
It would be nice to have something show you how they should be called. But that's all done within the orchestrator.
Maybe this one is worth a chat?
//TODO: sast - think about metrics. This isn't right, so we have ignored it for now. But we need to ask what we want to report on in an async event based world. | ||
var operationMetricsBuilder = ClientOperationMetricsBuilder.Start(); | ||
|
||
var scriptServiceVersion = await DetermineScriptServiceVersionToUse(operationMetricsBuilder, logger, scriptExecutionCancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be cached within that second object that could be returned by start script. Otherwise we are very much relying on halibut caching to work. If we don't cache this we will need to ensure we don't log too much since DetermineScriptServiceVersionToUse
might be logging to task log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good points 🤔
The complication is a multi-node environment.
Returning the version in start script would be the most efficient, but it does expose internal concerns to the public 😞
Like you say, caching does help, but it does mean every server node that does a task (e.g. GetStatus) now has to determine the version again (and log it to the task log 😑 ).
Which then brings us to the idea of reducing what we log when doing this.
@jimmyp , do you have any thoughts on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very much relying on halibut caching to work
I dont understand halibut enough to understand this
ensure we don't log too much since DetermineScriptServiceVersionToUse might be logging to task log.
I'm wondering if this is something that might need to change about tentacle client. The way I've been thinking about task logging in the new pipeline is that its not something most components should know about. The pipeline will raise events and task logging component somewhere will build up the task log as a read model.
Returning the version in start script would be the most efficient, but it does expose internal concerns to the public 😞
What exactly is the internal concern being exposed here sorry? In general our multi-node caching mechanism is going to be the DeploymentExecution
so my first thought is to reach for that if we do need caching. But I'm keen to here more about why we need caching here and what dependancies this would create
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything written down anywhere of the goals and concerns tentacle client is supposed to deal with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is the internal concern being exposed here sorry?
Before any RPC call, we determine the capabilities of the Tentacle (i.e. is it V1, V2 etc.) so we can use the correct interface. We cache the result of this so that when we keep doing multiple "get status" checks etc., we don't have to keep asking the same question.
Right now, this is all done within the Tentacle Client itself. So the caller does not need to care about it.
So the idea we are exploring is returning this information about the tentacle back to Octopus Server, so then Octopus Server can use it for future calls to Tentacle Client (and therefore Tentacle Client would not have to ask the Tentacle again).
But in doing so, we are exposing some info to achieve this.
However, recently, @gb-8 and I were exploring the idempotency issue with V1 tentacles, and realized that we may need to make some changes to support that. The main change is to actually expose the results of "Get Capabilities" so that the caller can make choices about how to drive Tentacle Client if it's a V1 tentacle.
If we do that, then we would already have exposed this info. So maybe it becomes moot.
Is there anything written down anywhere of the goals and concerns tentacle client is supposed to deal with?
Not that I'm aware of. But I wasn't around in the early days. @LukeButters might know if there is?
@@ -209,6 +211,136 @@ async Task<DataStream> DownloadFileAction(CancellationToken ct) | |||
} | |||
} | |||
|
|||
|
|||
public async Task<ScriptStatusResponseV3Alpha> StartScript( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to return something that is not the type that is received over the wire.
Perhaps two things distinct objects should be returned here
- The interesting data returned e.g. logs.
- An object which can be passed to
GetStatus
containing everything required to do the nextGetStatus
call. This object could additionally have some extension methods for serialization and deserialization, which is all taken care of in the client (and tested in the client).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like that. It helps as an anti-corruption layer, plus it helps to define responsibilities.
Thinking this through, I'd say we create:
- New class
CommandContext
(or a better name?)- Has
ScriptTicket
andNextLogSequence
- Has
- New class
CommandResponse
(or a better name?)- Has
State
,ExitCode
, andLogs
(whereLogs
is still a list ofProcessOutput
?) - A new property that holds the
CommandContext
for the next command. E.g.NextCommandContext
?
- Has
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we could use CommandContext
as the parameter to the other commands, to really highlight this is what you need to use these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, in the future it would be nice if the client can tell us if the script is hung (which is after the script has started and it has not logged output for more than some amount of time). Could we stash the information needed to do that in the command context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably.
This sounds like a nice separate piece of work to come afterwards. I'll create a card for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, that hung stuff is something we almost did (2nd highest voted bet). Also I think it has to be in the client since it is not obvious when a script actually started :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Also, story created
return result; | ||
} | ||
|
||
public async Task<ScriptStatusResponseV3Alpha> Finish( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling Finish out makes sense since cleaning up is destructive and we do need to let callers do stuff like collect artifices before we clean up
var result = await executor.GetStatus(getStatusCommand, scriptExecutionCancellationToken); | ||
|
||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the caller of this method now responsible for delays between calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 🙂
The current ExecutionsV2 pipeline does that already
I think in general this is rather good. |
d685d53
to
6494980
Compare
Summary of discussion of PR: What the PR achieves:
What the PR does not address:
Benefits of getting this PR merged:
Cost of getting this PR merged:
Risks management:
|
6494980
to
568d071
Compare
A few thoughts:
|
using Octopus.Tentacle.Contracts.ScriptServiceV3Alpha; | ||
|
||
namespace Octopus.Tentacle.Client | ||
{ | ||
public interface ITentacleClient : IDisposable | ||
{ | ||
Task<CommandResponseV3Alpha> StartScript(StartScriptCommandV3Alpha startScriptCommand, ILog logger, CancellationToken scriptExecutionCancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further to @LukeButters point around service types leaking out of tentacle client, The StartScriptCommandV3Alpha
also really needs to get changed to a non-service type.
In my mini-RFC (as mentioned by Pawel), part of the proposal is to remove service types leaking into the Server codebase.
I'm happy for you to leave it for now, but just be aware that this is something I'm intending to address as part of the proposed solution in the RFC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. When you're back, maybe we should get together to sync our two streams of work so we get a good path forward 🙂
Background
In the ExecutionsV2 pipeline, all tentacle communications are done asynchronously via events.
Results
Before
The
TentacleClient.ExecuteScript
method does the entire communication stack, from starting, to getting status, to completing etc.After
We now expose the required operations needed to interact with Tentacle.
Considerations
Script Ticket
Tentacle V1 creates the script ticket on the Tentacle itself, whereas Tentacle V2 onwards are given the script ticket. This concept was hidden in the
TentacleClient
.But now that we are driving the orchestration in Octopus Server, we need to expose this to the caller.
Orchestration vs Execution
We had to split apart the execution away from orchestration. This is because our new methods on the client are purely execution. But we want to make sure we are executing in the same way we do during orchestration.
Cancelling/Completing During Start
One thing we had to maintain when splitting execution and orchestration is that we try to cancel/complete in orchestration if we started attempting to start during execution.
This is why the
ScriptExecutionCancelledAfterPotentiallyStartingException
was introduced.We also need to respect this contract in the ExecutionsV2 pipeline.
Using Latest Command/Response
The code became far simpler when the orchestrators and executors respected the contract that parameters and return values were always using the latest version (in this case V3Alpha).
This prevented the need for many generic arguments between layers.
It also simplified mapping.
ClientOperationMetricsBuilder
This PR does not have a good solution for the metrics we are currently gathering.
ClientOperationMetricsBuilder
gets all the metrics about an operation, and reports back on that. But the operation is now split up into separate calls. So we cannot do this.We need to revisit this at some point.
Testing
We should write our own set of tests on the new methods of the
TentacleClient
if this draft is considered worth progressing.How to review this PR
This PR moves logic around so that it can be exposed. It would be great if we could please see if they are still logically equivalent.
Quality ✔️
Pre-requisites