
Use specific client set for operations #105

Open · wants to merge 1 commit into base: master

Conversation

machacekondra (Member)

This PR adds usage of a specific client set for each specific operation.
We now use the client set built from the kubeconfig parameter only for
creating and watching the experiment pod, and the client set built from
the litmuskubeconfig parameter for watching/updating the chaosengine pod,
which may run on a different cluster.

Signed-off-by: Ondra Machacek omachace@redhat.com
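The selection logic described above (fall back to the in-cluster config whenever a kubeconfig flag is empty, with one client set per cluster) can be sketched as follows. This is a simplified illustration, not the PR's actual code: the helper name `pickConfigSource` is hypothetical, and the real code would build `*rest.Config` values via `rest.InClusterConfig()` or `clientcmd.BuildConfigFromFlags` from k8s.io/client-go rather than return strings.

```go
package main

import (
	"flag"
	"fmt"
)

// pickConfigSource decides where a client set's config comes from: an
// explicit kubeconfig path when one is given, otherwise the in-cluster
// config (rest.InClusterConfig in the real code).
func pickConfigSource(path string) string {
	if path == "" {
		return "in-cluster"
	}
	return "kubeconfig:" + path
}

func main() {
	kubeconfig := flag.String("kubeconfig", "",
		"kubeconfig for the cluster running the experiment pod")
	litmuskubeconfig := flag.String("litmuskubeconfig", "",
		"kubeconfig for the cluster running the chaosengine")
	flag.Parse()

	// Two independent client sets, one per cluster, as the PR describes.
	fmt.Println("experiment client set from:", pickConfigSource(*kubeconfig))
	fmt.Println("chaosengine client set from:", pickConfigSource(*litmuskubeconfig))
}
```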

What this PR does / why we need it:

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Checklist:

  • Fixes #
  • Labelled this PR & related issue with documentation tag
  • PR message has documentation-related information
  • Labelled this PR & related issue with breaking-changes tag
  • PR message has breaking-changes-related information
  • Labelled this PR & related issue with requires-upgrade tag
  • PR message has upgrade-related information
  • Commit has unit tests
  • Commit has integration tests

flag.Parse()
// Use in-cluster config if kubeconfig path is not specified
if *litmuskubeconfig == "" {
	configLitmus, err := rest.InClusterConfig()
Member

Should this not be configExperiment?

Member Author

Actually no. Whatever is specified by the litmuskubeconfig parameter ends up in the configLitmus variable, and whatever is specified by the kubeconfig parameter ends up in the configExperiment variable.

@ksatchit (Member), Oct 23, 2020

Ok, got it. Could we name these better? Say litmusControlPlaneKubeConfig for the one that deals with the chaos-operator/runner/experiment CR/engine CR, and targetKubeConfig for the one that deals with the target application as well as the experiment job/helper pods and chaosresult resources (note that the chaosresult CR is created by the experiment and will therefore reside in the target cluster). The logic that reads chaosresults and patches them to the engine may need to be looked at.

Member Author

Will do.

 // Use in-cluster config if kubeconfig path is not specified
 if *kubeconfig == "" {
-	config, err := rest.InClusterConfig()
+	configExperiment, err := rest.InClusterConfig()
Member

And this, the configLitmus?

kubeconfig := flag.String("kubeconfig", "", "absolute path to the kubeconfig file")
litmuskubeconfig := flag.String("litmuskubeconfig", "", "absolute path to the kubeconfig file")
@ksatchit (Member), Oct 21, 2020

I suppose we can also distinguish the usage/help message to specify:

litmuskubeconfig - "absolute path to the kubeconfig file of target cluster where experiment job is launched"
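Applying this suggestion would just mean distinguishing the two usage strings at registration time, roughly as below. Only the litmuskubeconfig wording was proposed above; the plain kubeconfig text is left as it was in the PR. The `newFlagSet` helper is an illustration, not the actual code layout:

```go
package main

import (
	"flag"
	"fmt"
)

// newFlagSet registers the two kubeconfig flags with distinguished help
// strings, per the review suggestion.
func newFlagSet() *flag.FlagSet {
	fs := flag.NewFlagSet("chaos-runner", flag.ContinueOnError)
	fs.String("kubeconfig", "",
		"absolute path to the kubeconfig file")
	fs.String("litmuskubeconfig", "",
		"absolute path to the kubeconfig file of target cluster where experiment job is launched")
	return fs
}

func main() {
	fs := newFlagSet()
	fs.Parse([]string{})
	// The distinguished help text now shows up in -h output.
	fmt.Println(fs.Lookup("litmuskubeconfig").Usage)
}
```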

Member Author

Will do.


@@ -116,7 +116,7 @@ func BuildingAndLaunchJob(experiment *ExperimentDetails, clients ClientSets) err

 // launchJob spawn a kubernetes Job using the job Object received.
 func (experiment *ExperimentDetails) launchJob(job *batchv1.Job, clients ClientSets) error {
-	_, err := clients.KubeClient.BatchV1().Jobs(experiment.Namespace).Create(job)
+	_, err := clients.KubeClientExperiment.BatchV1().Jobs(experiment.Namespace).Create(job)
Member

Just trying to understand which config is used for which purpose:

  • kubeconfig & KubeClientExperiment for ChaosEngine, ChaosExperiment, ChaosRunner, ChaosOperator
  • litmuskubeconfig & k8sClientSetLitmus for TargetApplication, ExperimentJob/Pod, ChaosResult

Right? If this is the case, would you like to use clients.KubeClientExperiment to get the chaosresult verdict here?

Member Author

Well, I was thinking of having the chaosresult handled via kubeconfig. Do you think it should be handled via litmuskubeconfig?

@ksatchit (Member) commented Oct 23, 2020

Also, just thinking out loud about the impacts/benefits (use cases) of this change. Today, the chaos resources (CRs) and the associated pod/job resources are tightly tied together.

  • The engine details are referenced inside the experiments, and the chaos k8s events are generated against the chaosengine resource
  • The abort logic today works by patching the engine resource, with the operator itself removing the runner/experiment jobs/helpers etc.
  • The chaosresults are generated by the experiment and viewed by the runner in order to patch the engine (the result being a more involved descriptor of test details than enginestatus, in which just a summary is maintained)

This would mean passing multiple configs around across components. While feasible, I was wondering if this is really needed.

@machacekondra, there is a feature in the upcoming litmus-portal which allows you to connect an "external cluster" and execute chaos against it via an agent (auto-deployed when this 'cluster connect' happens). The target/external cluster also has the entire set of litmus components installed into it. The only difference is that in the portal we run chaos workflows (Argo workflows that embed the engine spec) rather than the engine directly. Do you think this satisfies your use case? Here is a ref to the user guide: https://docs.google.com/document/d/1fiN25BrZpvqg0UkBCuqQBE7Mx8BwDGC8ss2j2oXkZNA/edit#heading=h.solcj2aoqyws

@ispeakc0de (Member)

Adding to @ksatchit's comments: a few more complexities can arise if we use different kubeconfigs for the runner & experiment pod. In the case of node attributes/operations (node-selector, node-taint, etc.), we bind those pods to the same node so that they remain unaffected in case of node-level chaos. A separate cluster/kubeconfig may have different node names/labels, which can lead to failures in those cases.

@machacekondra (Member Author)

Just went quickly through the targeted-cluster doc, and it should solve the use case. I will try it.

@machacekondra (Member Author)

@ispeakc0de @ksatchit Is there any plan to have some API for creating the external cluster, rather than doing it only via the UI app?

@ksatchit (Member) commented Oct 29, 2020

> @ispeakc0de @ksatchit Is there any plan to have some API for the create external cluster, rather then doing it only via UI app?

@machacekondra that's a really good point. IMO, we should definitely have a well-structured and documented API for this, and I hope this is already being actively thought of. Tagging @gdsoumya and @rajdas98 to respond.

@gdsoumya (Member)

> @ispeakc0de @ksatchit Is there any plan to have some API for the create external cluster, rather then doing it only via UI app?

@machacekondra we have GraphQL APIs exposed, though they are currently undocumented for public use. It should be possible to interact with the server directly using those. We can give you a quick walkthrough if you want to try it immediately; we will document the APIs soon so they can be used directly.

@machacekondra (Member Author)

Well, I find it not easy to use multiple clusters from the portal, so I would welcome it if this could somehow be possible via multiple kubeconfig files. Another approach would be to have a special kubeconfig file only for the helper pod, if that could somehow close the gaps. What do you think?

@ksatchit (Member) commented Nov 3, 2020

Hey @machacekondra, sorry to hear you are not finding the portal convenient; hopefully this is addressed in the upcoming builds. Having said that, launching helpers alone (and maybe any other specific actions on the target clusters, via a different kubeconfig) on a case-by-case basis does seem like a better option than changing it in chaos-runner. It makes things simpler/less complicated by one level. I would like to hear @ispeakc0de's views on this too.

@gdsoumya (Member) commented Nov 3, 2020

Hi @machacekondra, can you share the concerns or issues you might be facing while trying to use the portal? Once we make the API doc public, it should be possible to automate the external cluster connects and also the chaos workflow deployments. If you want to try the API way of doing things, we can give you rough documentation for now.

@machacekondra (Member Author) commented Nov 4, 2020

@gdsoumya Well, I think you are then locked into using the litmus-portal only with chaos workflows. Users can't then simply use ChaosEngine etc., am I correct?

@machacekondra (Member Author)

> If you want to try the api way of doing things we can give you a rough documentation for now.

What I would need is the following:

  1. Create the external cluster.
  2. Obtain the kubectl command I should run on the external cluster to set it up.
  3. Run the experiment on the external cluster, while the management cluster would be the self cluster.

@gdsoumya (Member) commented Nov 5, 2020

Hi @machacekondra, yes, you are right: you cannot directly use chaosengines, but you can surely use them inside your chaos-workflow. If your goal is just to run an experiment, you can run a workflow with only one step that executes a chaosengine.

Yes, the above flow should be possible with the APIs. As I mentioned, our APIs are GraphQL APIs, so you can use any GraphQL client of your choice in your programming language, or even craft HTTP requests to perform the GQL operations directly.

Once you set up your portal (log in and set up the admin project) and have an exposed address for the backend server, you need to first get an auth token from the auth server using an API call, and then use that token as an auth header in your GQL queries. That's the only extra step you need to perform. After that, there are basically only two operations or API calls you need to make: the first is to get the manifest to connect the external cluster, and the second is to create the workflow that will be executed.

If you want, we can write a sample script/program in Go to show how you can do the above using the APIs through HTTP requests.
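The flow described above (fetch an auth token, then make two GraphQL calls over HTTP) might look roughly like this in Go. Everything specific here is an assumption, since the portal APIs were undocumented at the time: the `/query` path, the auth-header format, and the query texts are placeholders, and `gqlRequest`/`callPortal` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// gqlRequest builds the standard JSON body for a GraphQL-over-HTTP request.
func gqlRequest(query string, variables map[string]interface{}) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"query":     query,
		"variables": variables,
	})
}

// callPortal posts a GraphQL operation to the portal backend, attaching the
// token previously obtained from the auth server.
func callPortal(server, token, query string, vars map[string]interface{}) (*http.Response, error) {
	body, err := gqlRequest(query, vars)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", server+"/query", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", token) // token from the auth server
	return http.DefaultClient.Do(req)
}

func main() {
	// Step 1 would be an HTTP call to the auth server to obtain the token.
	// Steps 2 and 3 are then the two GraphQL calls described above, e.g.:
	//   callPortal(server, token, "<query: cluster-connect manifest>", nil)
	//   callPortal(server, token, "<mutation: create chaos workflow>", nil)
	fmt.Println("see callPortal for the request shape")
}
```

The two "real" calls differ only in the query text, which is why a single helper covers the whole flow.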
