13 Oct 14:15

ksatchit

1.9.0-RC1

183ff31

1.9.0-RC1 Pre-release


adding create configmap permission in subscriber manifest and few refactors in subscriber (#2248)

* adding create configmap permission and few refactors in subscriber

Signed-off-by: Raj Babu Das <raj.das@mayadata.io>

* adding create configmap permission and few refactors in subscriber

Signed-off-by: Raj Babu Das <raj.das@mayadata.io>


15 Sep 17:18

ksatchit

1.8.0

b8b4ade

1.8.0

New Features & Enhancements

Introduces the alpha-0 version of Litmus Portal. The portal helps you to execute & visualize chaos workflows, amongst many other things. Learn more about it here
Extends Litmus Probes with a “Continuous” mode that validates the hypothesis about application behavior throughout chaos execution, rather than only at specific points/phases (start & end of chaos)
Adds Node & Pod level I/O stress chaos experiments with the ability to tune worker threads and filesystem usage, to the generic experiment suite.
Supports network chaos on Containerd & CRI-O runtimes, in addition to Docker.
Supports network chaos between distinct microservices (in addition to total interface level egress traffic chaos) specified by their IPs or hostnames/service FQDNs
Enhances the ChaosSchedule schema for repeat mode by adding IncludedHours & IncludedDays. The StartTime/EndTime definitions have been made optional to allow flexibility in being able to run from the point of creation of schedule CR or indefinitely until removal.
Migrates Cassandra ring disruption experiment to go-based chaoslib
Adds the ability to specify a target pod (env: TARGET_POD) or node (env: APP_NODE) as the application/resource under test, apart from randomized selections based on labels.
Enables the definition of blast radius for an application as a percentage value (PODS_AFFECTED_PERCENTAGE), by which an appropriate number of replicas undergo the specified chaos in parallel.
Improves the litmus chaoslib to take container fs & runtime socket file paths as tunables to support different Kubernetes platforms
Includes an additional pumba-based chaoslib for cpu/memory stress that uses external chaos containers (non-pod exec mode)
Adds chaos command tunables (for chaos injection & revert) for cpu/memory chaoslib (in pod exec mode) - in order to cover different base images & distros.
Supports broader filtering of pods within a namespace when no application labels are provided in .spec.appInfo. Users can also choose to skip the specification of application namespace explicitly, in which case the target pods are selected randomly from the ChaosEngine resource namespace.
Modifies the litmus chaos containers (operator, runner) to run with non-root users
Allows the definition of an INSTANCE_ID in the ChaosEngine to provide additional context or metadata to an experiment run. This also aids the creation of newer ChaosResult resources instead of patching/overwriting existing ones in case of repeated executions.
Improves the experiment code standards by fixing the issues listed in the GoGitOps report card for the litmus-go repository.
Generates events against the ChaosResult resource to indicate the experiment verdict (Pass, Fail, Stopped). These are useful in annotating monitoring dashboards with experiment results.
Enhances the Chaos Exporter to push chaos metrics to AWS CloudWatch
Improves the kubernetes-chaos helm chart by including options in the values.yaml to selectively install experiments via a whitelist/blacklist. Also maps the experiment names to reflect those on the ChaosHub.
Enhances the litmus-e2e with increased reporting around component-tests, the addition of e2e tests for new experiments, and Docker-based Gitlab runner for litmus-portal pipelines
Provides additional documentation based on experiment enhancements. Updates the get started documentation for general Kubernetes/OpenShift/Rancher platforms.
Enhances the litmus-demo scripts to generate a pdf report for the chaos experiments executed
Operationalizes the Litmus community Special Interest Groups (SIGs) for Documentation, Observability & Integrations.
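
The PODS_AFFECTED_PERCENTAGE tunable in the list above maps a percentage to a number of replicas. A minimal Python sketch of that arithmetic follows. The function name, the round-down rule, and the floor of one pod are assumptions made for illustration, not confirmed behavior of the litmus chaoslib.

```python
def pods_to_target(total_pods: int, affected_percentage: int) -> int:
    """Return how many replicas to target for a given blast radius.

    Hypothetical helper: rounding down and the minimum of one pod are
    assumptions, not behavior confirmed by the release notes.
    """
    if total_pods < 0:
        raise ValueError("total_pods must be non-negative")
    if not 0 <= affected_percentage <= 100:
        raise ValueError("affected_percentage must be between 0 and 100")
    if total_pods == 0:
        return 0
    # Integer arithmetic avoids floating-point rounding surprises.
    count = (total_pods * affected_percentage) // 100
    # Assumed floor: target at least one pod whenever any pods exist.
    return max(1, count)
```

Under these assumptions, 10 replicas at 50 percent yield 5 targets, while 3 replicas at 50 percent round down to 1.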

Major Bug Fixes

Constructs ChaosResult name using experiment names passed from the ChaosExperiment resource instead of hardcoded experiment names
Fixes the chaos verification (whether chaos injection has occurred) steps in the container-kill experiment & retains the helper containers in case of errors for further debugging
Fixes the chaos event messages to be meaningful & include probe information only when the probes are defined
Removes the need for privileged containers to execute disk-fill chaos experiment
Handles the case where cpu/memory hog chaos processes are terminated or the target containers are OOM-Killed (this typically occurs when the memory hog/injection value exceeds resource limits set against the pods/containers). The error code 137 is handled appropriately with warning logs and the experiment proceeds with verification steps instead of erroring out/failing (the OOM-Kill is an expected behavior based on inputs provided)
Fixes the behavior in node-memory hog experiments where the provided input (percentage of node memory) is measured against the available memory instead of the total system memory
Propagates the custom chaos experiment annotations provided in the ChaosExperiment to the helper pods, if any. This is especially useful in cases where annotations decide scheduling or are mapped to certain IAM roles/accounts, etc.
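
The error code 137 mentioned in the cpu/memory hog fix above follows the common convention of 128 plus the signal number, here SIGKILL (9), which is what an OOM-killed process reports. The Python sketch below shows one way such a status could be classified. The function name and the returned labels are hypothetical and do not reflect the litmus-go implementation.

```python
SIGKILL_NUMBER = 9  # signal number of SIGKILL on Linux


def classify_exit_code(code: int) -> str:
    """Classify a chaos process exit status (hypothetical labels)."""
    if code == 0:
        return "ok"
    # Death by signal is conventionally reported as 128 + signal number.
    if code == 128 + SIGKILL_NUMBER:  # 137
        # Expected when the hog process or target container is OOM-killed:
        # log a warning and proceed to the verification steps.
        return "warn-and-continue"
    return "fail"
```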

Deprecations & Breaking Changes

The instance count (.spec.schedule.instanceCount) property on the chaosSchedule has been deprecated in favor of maintaining just the minChaosInterval as a means of defining chaos cadence.

Major Known Issues & Limitations

Issue

The network chaos experiments (especially on the docker runtime, using the litmus pumba lib) can end up with a Failed ChaosResult and the app stuck in a CrashLoopBackOff state when application deployments are configured with liveness probes (that are set up to access health/service endpoints). Typically, this lib injects the tc netem rule against the interface by running a “chaos container” that attaches to the network namespace of the target container via the target’s container ID. The same ID is used in a subsequent container launched to revert the rule/chaos. However, with liveness probes, the container is restarted several times during the chaos duration, causing the ID to change. The revert then fails and the network rule persists (courtesy of the Kubernetes pause container for this app pod), leading to the app entering a CrashLoopBackOff state.

Current Workaround

Delete/reschedule the target pod manually to recreate the pause container/network namespace.
Use Target IPs or Hosts to inject the chaos between specific microservices while keeping the probe alive.

Note: This is expected to be fixed in a 1.8.x patch release

Issue

The kubelet-service-kill experiment currently uses systemctl to stop/start the service. Running this experiment without an external LIB_IMAGE, leveraging the experiment image instead, can throw the error “Failed to connect to bus: No data available” because the experiment runs with a non-root user.

Current Workaround

A standard Ubuntu image that runs as root can be used in a “helper” pod that injects this chaos. However, user-discretion is advised in terms of providing this access.

Issue

The pod-cpu-hog & pod-memory-hog experiments that run in the pod-exec mode (typically used when users don’t want to mount the runtime’s socket files on their pods) using the default lib can fail, in spite of chaos being injected successfully, because certain default utils (used for detecting the chaos processes and killing them/reverting chaos at the end of the chaos duration) are unavailable in the target’s image.

Workaround

Users can identify the necessary commands to derive and kill the chaos PIDs and pass them to the experiment via env variable CHAOS_KILL_COMMAND
Alternatively, they can make use of the chaos lib that uses external containers with SYS_ADMIN docker capability to inject/revert the chaos, while mounting the runtime socket file. Note that this is supported only on docker at this point.

Note: This is expected to be fixed in a 1.8.x patch release

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.8.0.yaml

Verify your installation

Verify if the chaos operator is running
kubectl get pods -n litmus
Verify if chaos CRDs are installed
kubectl get crds | grep chaos

For more details refer to the documentation at Docs


15 Sep 04:21

ksatchit

1.8.0-RC2

3c34d21

1.8.0-RC2 Pre-release


Merge pull request #2071 from rajdas98/cherry-pick-1.8.0-rc2

Cherry pick 1.8.0 rc2


10 Sep 09:49

ksatchit

1.8.0-RC1

8623ee3

1.8.0-RC1 Pre-release


chore: (litmus-portal) Refactoring and bug fixes (#2027)

This commit has the following changes:
- folder structure change for models and useEffect fixes
- user redux fixed
- graphql documents re-organised

Signed-off-by: arkajyotiMukherjee <arkajyoti.mukherjee@mayadata.io>


15 Aug 16:19

ksatchit

1.7.0

ae0b913

1.7.0

New Features & Enhancements

Introduces experiment probes to enable declarative specification of entry/exit (success) criteria via the chaosengine. This release supports the Command, Kubernetes & HTTP probe types that can be configured in SoT (Start of Test), EoT (End of Test) & Edge execution modes. With this, users can reuse generic experiments to test a variety of app-specific/context-specific chaos scenarios.
Enhances the chaosresult status schema to include the ProbeSuccessPercentage score that gives an overview of the app/infra resilience to a specific chaos experiment run
Refines operational modes of litmus: Introduces namespaced operator support in helm charts to support multi-developer/shared cluster use-case with dedicated namespaces, such as in the Okteto Cloud, while updating the admin & standard mode functionality to watch engine resources in litmus & across namespaces respectively
Adds functionality to look for target applications in the chaosengine resource namespace if the target namespace is not explicitly specified.
Validates/prevents malformed application labels in the chaosengine
Improves the ChaosEngine status schema to hold more info (experiment pod names, runner names) that can aid other tools/abstractions running the experiment to derive/parse useful info for further reuse (logs extraction, for ex.)
Adds Microsoft Azure Kubernetes Service (AKS) as a supported platform for the generic experiment suite.
Adds a new chaos experiment to scale pods/test node autoscale functionality
Adds the libraries for the execution of AWS chaos using chaostoolkit, orchestrated by Litmus.
Adds support for the specification of host file mounts in chaos experiments
Allows setting polling intervals and timeouts for status checks via chaosengine to aid tuning execution for slower environments
Removes dependencies on multiple experiment “helper” (auxiliary) images and makes the litmus go-runner self-sufficient in handling the required chaos business logic. This eases maintenance, especially in the case of air-gapped environments / downstream projects that build the litmus components in their respective CI/CD pipelines.
Enhances the experiment to “fail fast” upon failed app checks in cases where containers are terminated
Upgrades the ansible-runner to use python3
Enhances the developer experience for litmus chaos experiments by using Okteto CLI to develop & test experiment business logic in-cluster over repeating image-build-job-run cycles
Updates the scaffold utils to generate the experiment bootstrap code based on the latest developments in the experiment structure.
Adds chaos-instrumented grafana dashboards for the sock-shop application along with details on setting up monitoring for chaos experiment runs.
Adds pre-defined/usable workflows for repeatable execution of node resource chaos in the chaos-charts repo
Pushes the technical preview / pre-alpha version of the litmus-portal (available on the master branch).
Refactors the litmus-e2e repo/code-structure to simplify the addition of new BDD tests (modularization, removal of bash utils, formatted errors, klog usage, scenario coverage parameters)
Adds additional stages in litmus-e2e GitLab pipelines to execute both the go-based & ansible-based chaos experiments
Improves github-actions based comment-triggered e2e runs with log details
Features a completely revamped & improved ChaosHub
Improves the project wiki with more information for users and developers (architecture docs, video tutorials, charters for the Litmus Special Interest Groups)
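
The ProbeSuccessPercentage score introduced in the list above summarizes probe outcomes for an experiment run. One plausible reading is the share of probes that passed, sketched below in Python. The formula, the function name, and the handling of an empty probe list are all assumptions for illustration; the release notes do not specify how the score is computed.

```python
from typing import Sequence


def probe_success_percentage(probe_results: Sequence[bool]) -> float:
    """Percentage of probes that passed (assumed formula)."""
    if not probe_results:
        # Assumption: with no probes defined, nothing failed.
        return 100.0
    passed = sum(1 for result in probe_results if result)
    return 100.0 * passed / len(probe_results)
```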

Major Bug Fixes

Patches the chaosengine with the right status (‘stopped’) and fixes the event to provide the right reason in cases where app filtering is unsuccessful. This will allow a re-apply of the engine to re-trigger the application.
Adds a check to factor-in cordoned (SchedulingDisabled) status of nodes in kubelet & docker-service kill experiments.
Provides the tc_image used in network chaos experiments as an experiment tunable over hardcoding in order to support users with internal image registries
Decides experiment termination based on chaos container status over that of chaos pod objects to support operations in a service-mesh environment (istio, linkerd) where all pods (including chaos resources) are injected with sidecars. Without this, the experiment runs forever due to the proxy sidecars.
Sets the restart policy of the experiments jobs to Never over OnFailure to prevent repeated re-execution for certain experiment failure conditions.
Fixes the incorrect eventType for chaos events in cases of failures & skipped executions.
Fixes the go-based pod-cpu-hog & pod-memory-hog experiments to execute the chaos processes (commands) in the target container by passing them as args to a shell instance (/bin/sh -c) to account for targets which may run with different entrypoints.
Fixes permission issues on the infra helm chart resulting in failed metrics collection

Breaking Changes

Stops support for the ansible-runner/executor (EoL) (Not to be confused with the ansible-based chaos experiments)
Removes the following repositories:
- litmuschaos/pages: The operator manifests are available over gh-pages sourced out of litmuschaos/litmus
- litmuschaos/chaos-helm: The experiments helm chart is now also in the litmus-helm repo.
- litmuschaos/community: The demo procedures & community info are now available within the litmus-demo & the litmus repo respectively.

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.7.0.yaml

Verify your installation

Verify if the chaos operator is running
kubectl get pods -n litmus
Verify if chaos CRDs are installed
kubectl get crds | grep chaos

For more details refer to the documentation at Docs


15 Jul 15:13

ksatchit

1.6.0

38b701c

1.6.0

New Features and Enhancements

Specification of pod and container security context for the experiment resources via chaosexperiment spec
Introduces pod scheduling policy support via NodeSelector specification on the chaosengine (instance-specific attribute)
Ability to override experiment images from the chaosengine
Pushes an experiment execution summary event on the chaosresult resource
Adds the network chaos experiment to induce packet duplication
Adds node chaos experiment to force pod evictions via taints
Adds service chaos experiment to kill docker service on the node
Extends the golang chaoslib support for all existing chaos experiments in the generic suite
Validation webhook enhancements to verify if application labels specified in the chaosengine are propagated to pod templates of the applications under test (AUT)
Adds examples to illustrate litmus chaos-workflows, using an nginx benchmark driven by the apache benchmark tool with parallel pod-kills
Migrates the ansible-based chaos experiments to the litmus-ansible repo from litmuschaos/litmus in line with the litmus-go, litmus-python repo structure
Improves the unit-test based coverage for chaos operator by 30%
Extends the capability to trigger on-demand e2e runs for PRs via GitHub comments to the chaos operator
Adds framework to determine e2e coverage percentage based on comparison of executed tests in the pipeline versus test plan
Introduces an e2e portal to view e2e pipeline data and coverage
Improves the Travis-based CI pipeline of the test-tools repo to build images only if the respective Dockerfile or scripts are modified, instead of building all images irrespective of the nature of the commit.
Increases sources for (helm-based) litmus installation to include helm hub & jfrog chartcenter artifact repositories
Adds betterci integration to charthub to obtain UI/UX previews for PRs
Enhances individual experiment documentation with abort procedure & troubleshooting references
Enhances the experiment failure and uninstall troubleshooting sections to include more conditions
Includes steps to run chaos experiments on rancher platform
Includes missing video links/examples for chaos experiments in the generic suite
Updates all the litmuschaos websites (docs, charthub, project website) based on CNCF guidelines
Enhances the release guidelines doc with an enhanced release checklist

Major Bug Fixes

Fixes invalid Jinja template for chaos injection (helper) pod in the kubelet-service-kill experiment
Specifies an upper limit for the memory hog experiment docs based on the current resource exhaustion approach via dd
Adds instructions in infra (node) chaos experiments to cordon the AUT before the execution of chaos to prevent the restart of litmus pods
Fixes a race condition in the pod-delete experiment where the verdict is flagged as fail despite successful execution
Fixes Kafka experiment failure while trying to derive leader broker for the test topic (partition) due to missing ns and improper regex
Fixes coredns experiment regression (caused due to introduction of helper pods logic for the pod-delete experiment) due to missing lib_image in experiment CR

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.6.0.yaml

Verify your installation

Verify if the chaos operator is running
kubectl get pods -n litmus
Verify if chaos CRDs are installed
kubectl get crds | grep chaos

For more details refer to the documentation at Docs


09 Jul 05:16

ksatchit

1.5.1

8ba20bc

1.5.1

[Cherry Pick to 1.5.1] Inhibit experiment image creation from branches of litmus repo (#1682)

* (chore)releases: updated release artefacts (#1552)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* (chore)roadmap: add item for litmus portal (#1553)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* Add merge label and auto-merge feature in gihtub actions (#1556)

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* refactor(readme): Add more details in pod network corruption readme (#1555)

* refactor(readme): Add more details in ood network corruption readme

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* Update experiments/generic/pod_network_corruption/README.md

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>

* refactor(experiment): Add pod memory hog default memory consumption (#1576)

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* bug(indentation): Fix the indentation in kubelet service kill experiment (#1581)

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* (chore)roadmap: update availability of scaffold scripts to generate experiment code (#1584)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* (chore): update schematic representation of litmus arch (#1589)

* (chore): update schematic representation of litmus arch

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* (refactor)demo: add an updated demo video

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* (chore)governance: update maintainer email IDs (#1599)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* (chore)content: add folder to discuss chaos engg (#1619)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* Update the backlog in Roadmap with IO-Chaos

* Stopped CircleCi Build for master branch (#1625)

* Stopped CircleCi Build for master

Signed-off-by: gdsoumya <gdsoumya@gmail.com>

* Update config.yml

* Update config.yml

* Update config.yml

* Update config.yml

* Update config.yml

* (chore)roadmap: add backlog item on chaos workflows for application benchmarks (#1626)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* (chore)ci: inhibit push of ansible-runner image from litmus (#1660)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

Co-authored-by: UDIT GAURAV <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Uma Mukkara <uma@mayadata.io>
Co-authored-by: Soumya Ghosh Dastidar <44349253+gdsoumya@users.noreply.github.com>


15 Jun 17:41

ksatchit

1.5.0

993d2e7

1.5.0

New Features and Enhancements

Features a revamped chaos charthub with a more resilient design and improved user experience
Introduces ability (github workflows) to trigger individual/multiple e2e tests or complete e2e test-suite for litmus PRs via GitHub comments
Adds a new repo litmuschaos/litmus-demo to provide a fully packaged demo environment to run chaos under 10 min
Adds node service kill chaos libraries (& kubelet kill chaos experiment on specified nodes)
Improves the pod cpu hog experiment by adding go chaoslib to support containerd/crio runtime
Introduces chaoslib pattern to choose blast radius / percentage (target) pods and abort chaos on target containers
Improves the chaos-scheduler controller to halt/resume chaos
Enhances the chaos-schedule CR schema to provide dedicated attributes for the schedule modes (now, once, repeat) over mutually-exclusive fields with enhanced OpenAPI schema validation
Introduces ImagePullPolicy as a chaosexperiment CR attribute (.spec.definition.imagePullPolicy) to support use cases where the experiments need to be run with locally built images, as with PR-triggered e2e
Enhances the container-kill experiment to repeat the chaos per an interval over a total duration with support for containerd/crio runtime.
Adds go-based helper pods for pod-delete and container-kill chaos libraries
Improves the litmus-go scaffold tool to use lighter base images & improved default events
Improves the validating webhook-based admission controller to call out missed annotations on target applications
Improves unit-test coverage for chaos-operator
Enhances the getting started (chaosengine construction) & troubleshooting docs (uninstallation steps)

Major Bug Fixes

Fixes the missing/clustered event generation on litmus-go chaos experiment
Fixes operator behavior of triggering chaos disregarding annotation status on the target application
Fixes the cluster level running experiment count metric from chaos-exporter
Adds concurrent updating of the event counter for each iteration of chaos injection
Fixes chaos experiment failures (securitycontext additions) on OpenShift 4.3

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.5.0.yaml

Verify your installation

Verify if the chaos operator is running
kubectl get pods -n litmus
Verify if chaos CRDs are installed
kubectl get crds | grep chaos

For more details refer to the documentation at Docs


03 Jun 05:42

ksatchit

1.4.1

961c7fa

1.4.1

[Cherry-Pick for 1.4.1]  (#1535)

* (chore)roadmap: update roadmap status (#1530)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* update(helper-pod): Wait till the helper pod come into running state (#1533)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>


29 May 20:11

ksatchit

1.4.1-RC1

af7ce00

1.4.1-RC1 Pre-release

Pre-release

fix(pod-delete): Fixing pod-delete chaolib (#1526) (#1528)

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

Co-authored-by: UDIT GAURAV <35391335+uditgaurav@users.noreply.github.com>


Releases: litmuschaos/litmus