Releases: litmuschaos/litmus

2.0.0-Beta6

15 May 18:34
8c3d20a

Major Updates

  • Added MongoDB go-interface and refactored the database operations and structure to accommodate the test cases easily.
  • Support for adding custom container image registry to chaos workflow manifest.
  • Enhanced the performance of the analytics APIs with memory caching and added APIs to fetch labels and values for a Prometheus series.
  • Added support for changing the sequence of the workflow steps by drag and drop, with the changes reflected live in the DAG.
  • Enhanced the workflow graph to show other node phases such as Omitted, Skipped, and Error for a better user experience.
  • Enhanced the verify and commit page to allow users to do a final review and edit their workflow details before scheduling the workflow.
  • Fixed bugs in some user management operations and refactored the teaming APIs to improve performance.
  • Enhanced the litmusportal user interface to speed up the onboarding process.

Minor Updates

  • Added support for a liveness check of the dependent applications in the agent plane before going active.
  • Air-gapped support for the pre-defined workflows by moving the fetching logic to the backend.
  • Added instance-id label in the chaos workflow manifest to avoid multiple scheduling in the multi-Argo server cluster.
  • Added validations for workflow name, GitHub URL, and different probe inputs.
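
As an illustration of the workflow name validation listed above: chaos workflows are created as Kubernetes resources, so one plausible rule is the Kubernetes DNS-1123 subdomain format. That rule is an assumption made for this sketch, not taken from the portal source, and the helper name `is_valid_workflow_name` is hypothetical.

```python
import re

# DNS-1123 label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
# DNS-1123 subdomain: one or more labels joined by dots.
_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")
_MAX_LEN = 253  # Kubernetes length limit for names of this form


def is_valid_workflow_name(name: str) -> bool:
    """Hypothetical check: True if `name` is a valid DNS-1123 subdomain."""
    return 0 < len(name) <= _MAX_LEN and _SUBDOMAIN.fullmatch(name) is not None
```

Under this rule a name such as `pod-delete-workflow` is accepted, while `Pod_Delete` or a name with a leading dash is rejected.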

2.0.0-Beta5

30 Apr 21:30
59904a4
Pre-release
Minor SA fix in eventtracker (namespace) (#2760)

Signed-off-by: Raj Das <mail.rajdas@gmail.com>

2.0.0-Beta4

20 Apr 19:49
7494b0b

Major Updates

  • Fixes the inability to successfully register the agents/targets when litmus portal server is brought up with loadbalancer/nodeport service type
  • Makes MyHub source configurable by branch so that latest stable versions of experiments are pulled for custom & predefined workflows
  • Updates the chaos operator dependencies on the subscriber to make use of the latest api changes for chaos resources
  • Updates the chaos operator, runner & exporter image tunables/ENVs in the subscriber so that the latest stable versions are installed on the targets
  • Updates Okteto dev setup instructions to reflect latest image versions and changes in specification (env) as well as instructions
  • Updates the chaosengine CRD validation schema for annotation injection in the manifests maintained & installed by the subscriber

Minor Updates

  • Improves the icons for revert chaos and workflow scheduling
  • Optimizes the teaming code to remove redundant conditions
  • Improved styling & background adopted from litmus-ui

2.0.0-Beta3

15 Apr 18:48
aff0fef

Litmus 2.0.0-Beta3

Major Updates

  • Support for policy-based control of the event tracker: users can define their own policy as a JMESPath query, and the event-tracker reacts to application changes based on that policy.
  • Enhanced UI for workflow scheduling that lets users tune annotations, target application details (application namespace, labels, and kind), and probe data through the user interface.
  • New UI for workflow visualization that presents workflow and node information more clearly.
  • Made the onboarding process easier for users through the new UI.
  • Enhanced the homepage to show information like Recent workflow runs, Agent details, and Project details.
  • Shifting project switching from using Redux-based technique to URL-based technique to avoid caching problems.
  • Migrated CircleCI to GitHub workflow and enhanced the continuous integration of the project.
  • Enhanced the analytics module in terms of UI and computation
  • Enhanced the browse workflows table to show resilience score and the total number of experiments passed for the listed workflows.
  • Support role-based access control in the backend for handling authorization for all requests.
  • Support for storing scheduled workflow templates and adding some new podtato-head predefined workflow templates

Minor Updates

  • Improved the Better Code Hub (BCH) score
  • Optimized the frontend by shifting the resiliency score calculation to the backend.
  • Restructured the directory structure for settings in the frontend to modularise the code.
  • Support for a reinstall of litmus agents by moving the litmus-portal-config configmap independent of the subscriber.
  • Support for Ingress and Load balancer network type for connecting external agents with Litmus Portal. Based on the server service type, it will generate the endpoint for the external agent.
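
To make the backend resiliency score calculation mentioned above more concrete, here is a minimal sketch that treats the score as a weighted average of per-experiment success percentages. The formula, the input shape, and the function name `resiliency_score` are assumptions made for illustration; the release notes do not spell out the computation.

```python
from typing import Iterable, Tuple


def resiliency_score(results: Iterable[Tuple[int, float]]) -> float:
    """Weighted average of per-experiment success percentages.

    Each item is (weight, success_percentage). Both the input shape and
    the weighting scheme are assumptions made for this illustration.
    """
    results = list(results)
    total_weight = sum(weight for weight, _ in results)
    if total_weight == 0:
        return 0.0  # no weighted experiments, nothing to score
    weighted = sum(weight * pct for weight, pct in results)
    return weighted / total_weight
```

With two equally weighted experiments where one passes fully and the other fails fully, this sketch yields 50.0.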

2.0.0-Beta2

30 Mar 08:35
b9fa74c
Pre-release
Added beta2 fixes for auth and teaming (#2612)

Signed-off-by: Saranya-jena <saranya.jena@mayadata.io>

2.0.0-Beta1

15 Mar 18:17
ae16761

Major Updates

  • Support for in-built analytics, where users can connect their data sources and generate dashboard panels.
  • Support for Git as a single source of truth for workflow artifacts. This enables users to have their workflows synced between the portal and Git source.
  • Introduces the event-tracker microservice to trigger chaos workflows automatically upon change to application images. This feature works in tandem with GitOps frameworks that rollout changes to applications upon manual changes in the Git source or upon image push to registries.
  • Support for re-running of existing chaos workflow from the litmus portal.
  • Adding a command-line tool called litmusctl to manage litmus portal services. The key role of litmusctl is to connect the external cluster with the litmus server and install the external agents.
  • Redesigning the teaming user interface and adding some significant features such as leave project, decline invitation.
  • Recreating litmus docs for litmus 2.0.x. For more information, visit https://litmusdocs-beta.netlify.app/
  • Integration of Litmus-UI with litmus portal components
  • Major directory restructuring of litmus portal’s server for database handlers

Minor Updates

  • Changing MongoDB kind from deployment to statefulsets
  • Adding chaos-exporter as a default external cluster agent for litmusportal
  • Refactoring authentication server to accommodate new teaming integration
  • Removing some unnecessary inputs from the welcome modal and predefined chaos workflow

2.0.0-Beta0

05 Mar 15:14
7cd9a5a
Fixed default error state for password fields and fixed modal padding (#2505)

* Fixed default error state for password fields and fixed modal padding

Signed-off-by: SarthakJain26 <sarthak.jain@mayadata.io>

* added text to translation

Signed-off-by: SarthakJain26 <sarthak.jain@mayadata.io>

1.13.8

15 Jul 21:43
dc086b3

New Features & Enhancements

  • Introduces upgraded pod-cpu-hog & pod-memory-hog experiments that inject stress-ng based chaos stressors into target containers pid namespace (non-exec model).

  • Supports multi-arch images for chaos-scheduler controller

  • Supports CIDR apart from destination IPs/hostnames in the network chaos experiments

  • Refactors the litmus-python repository structure to match the litmus-go & litmus-ansible repos. Introduces a sample python-based pod-delete experiment with the same flow/constructs as its go-equivalent to help establish a common flow for future additions. Also adds a BYOC folder/category to hold non-litmus native experiment patterns.

  • Refactors the litmus-ansible repo to remove the stale experiments (which have been migrated and improved in litmus-go). Retains (improves) samples to help establish a common flow for future additions

  • Adds GCP chaos experiments (GCP VM stop, GPD detach) in technical-preview mode
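
The CIDR support for network chaos destinations noted above implies telling single IPs, CIDR ranges, and hostnames apart. The following is a minimal sketch of such a classification using Python's standard `ipaddress` module; the function name `classify_destinations` and the returned structure are hypothetical and do not reflect how litmus-go implements it.

```python
import ipaddress
from typing import Dict, List


def classify_destinations(destinations: List[str]) -> Dict[str, List[str]]:
    """Split targets into single IPs, CIDR ranges, and hostnames."""
    groups: Dict[str, List[str]] = {"ips": [], "cidrs": [], "hostnames": []}
    for dest in destinations:
        dest = dest.strip()
        if not dest:
            continue  # ignore empty entries
        try:
            ipaddress.ip_address(dest)
            groups["ips"].append(dest)
            continue
        except ValueError:
            pass  # not a plain IP address
        if "/" in dest:
            try:
                # strict=False tolerates host bits, e.g. 10.0.0.5/24
                net = ipaddress.ip_network(dest, strict=False)
                groups["cidrs"].append(str(net))
                continue
            except ValueError:
                pass  # not a valid CIDR either
        # Anything else is treated as a hostname in this sketch.
        groups["hostnames"].append(dest)
    return groups
```

Normalizing each range with `ip_network(..., strict=False)` means an entry like `10.0.0.5/24` is stored as its network address, `10.0.0.0/24`.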

Major Bug Fixes

  • Fixes erroneous logs in the chaos-operator seen while attempting to remove finalizer on chaosengine

  • Fixes a condition where the chaos revert information is present in both annotations as well as the status of chaosresult CR (the inject/revert status is typically maintained/updated as an annotation on the chaosresult before it is updated into the status and cleared/removed from annotations)

  • Removes the hardcoded experiment job entrypoint and instead picks it from the ChaosExperiment CR’s .spec.definition.command

  • Fixes a scheduler bug that interprets a minChaosInterval mentioned in hours (ex: 1h) in minutes

  • Improves the scheduler reconcile to stop flooding/logging every “reconcile” seconds irrespective of the minChaosInterval

  • Enables the scheduler to start off with the chaos injection immediately upon application of the ChaosSchedule CR without waiting for the first installment of minChaosInterval period - in repeat mode with only the minChaosInterval specified

  • Handles edge/boundary conditions where chaos StartTime is behind CreationTimeStamp of ChaosSchedule OR next iteration of chaos as per minChaosInterval is beyond the EndTime

  • Adds a check to ignore chaos pods (operator, runner, experiment/helper/probe pods) and blacklist them from being chaos candidates (esp. needed when appinfo.applabel is configured with exclusion patterns such as: !keys OR <key> notin <value>)

  • Removes hostIPC, hostNetwork permissions for pod stress chaos experiments

  • Fixes an incorrect env key for TOTAL_CHAOS_DURATION in pod-dns experiments

  • Fixes a regression introduced in 1.13.6 wherein the experiment expected the parent workloads (deployment, statefulset et al) to carry labels specified in appinfo.applabel, apart from just the pods even when .spec.annotationCheck was set to false in the ChaosEngine. Prior to this, the parent workloads needed to have the label only when .spec.annotationCheck was set to true. This has been re-corrected as per earlier expectations.
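
The scheduler fixes above (honouring the unit of minChaosInterval, starting the first injection immediately, and handling the StartTime/EndTime boundaries) can be sketched roughly as follows. This is an illustrative model only: the function names are hypothetical, the interval parser accepts just a single `s`, `m`, or `h` suffix, and clamping an early start time to the creation time is an assumption about how the boundary case might be handled.

```python
import re
from datetime import datetime, timedelta
from typing import List

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_interval(value: str) -> timedelta:
    """Parse a duration such as '30m' or '1h', honouring the unit suffix."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def chaos_run_times(created: datetime, start: datetime,
                    end: datetime, interval: str) -> List[datetime]:
    """List injection times: first run immediately, then every interval."""
    step = parse_interval(interval)
    if step <= timedelta(0):
        raise ValueError("interval must be positive")
    # Assumption: a start time before the resource's creation is clamped
    # to the creation time.
    current = max(start, created)
    runs: List[datetime] = []
    # An iteration that would fall beyond the end time is never scheduled.
    while current <= end:
        runs.append(current)
        current += step
    return runs
```

In this model an interval of `1h` advances the schedule by sixty minutes rather than one, which is the distinction the scheduler bug fix is about.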

Limitations

  • Chaos abort (via .spec.engineState set to stop OR via chaosengine deletion) operation is known to have an issue with the namespace scoped chaos-operator in 1.13.8, i.e., an operator running with WATCH_NAMESPACE env set to a specific value and using role permissions. In such cases, the finalizer on the ChaosEngine needs to be removed manually and the resource deleted to ensure the operator functions properly.

    This is not necessary for cluster scoped operators, the default mode of usage, where the WATCH_NAMESPACE env is set to an empty string to cover all namespaces and the operator leverages clusterrole permissions.

    The fix for correcting the behavior of namespace scoped operators will be added in the next patch.

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.8.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs

1.13.6

15 Jun 21:35
dc086b3

New Features & Enhancements

  • Supports automated rollback/abort of chaos depending upon predefined conditions (defined in the probes). The probes can now be configured with a StopOnFailure property set to true or false to control the execution flow of the experiment.

  • Enhances the ChaosResult status schema to provide details of (a) the target resource impacted (b) success of the chaos revert operation.

  • Introduces additional labels for the “interleaved” chaos metrics (litmus_awaited_experiments & litmus_experiment_verdict) to indicate workflow name & chaos injection timestamp. This is expected to help in the construction of more meaningful dashboards to track app behavior under chaos.

  • Adds the golang chaoslib and experiment logic for docker-service-kill (from ansible)

  • Introduces the tech-preview of a new category (aws-ssm) of chaos experiments that can inject common resource and network chaos in EC2 instances (which may be part of a Kubernetes cluster or standalone/vanilla instances).

  • Introduces the tech-preview of refactored pod-cpu-hog & pod-memory-hog chaos experiments that can inject resource chaos on target apps externally (non-exec mode) via cgroup operations.

  • Improves/dockerizes the build process for most components (removes vendor packages stored on the repo and migrates to github workflows)

  • Reduces the size of the experiment (go-runner) image by creating a single chaos helper component that takes specific chaos operations as flags

  • Extends the StatusCheckTimeout property to the helper pods (earlier releases had this only for pre/post chaos checks), thereby helping the flexible evaluation of application availability/readiness during the chaos

  • Adds a new event for “Abort” on the ChaosResult

  • Increases coverage in the commit-based e2e runs on the litmus-go repo with the addition of node chaos tests

  • Adds a new helm chart for kube-aws (chaos experiment bundle) in the litmus-helm repository.

  • Enhances the litmus-sdk to (a) create a highly generic experiment scaffolding that can trigger and kill chaos via shell commands passed as environment variables (change from an earlier sample of pod-delete) and (b) push all non-code files (CR yamls) into a dedicated directory that can be directly copied/committed to the chaos-charts repo.

  • Cuts the first tagged release on the test-tools repository and sets up downloadable artifacts for the dependent chaos utils (nsutil, pauseutil, promql, dns-interceptor).
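
The StopOnFailure behavior described in the first item of this list can be pictured with a small control-flow sketch. The `Probe` type and `run_probes` function below are hypothetical and only model the idea that a failing probe with the property set halts the flow so the chaos can be aborted; they are not the litmus-go implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Probe:
    name: str
    check: Callable[[], bool]      # returns True when the probe passes
    stop_on_failure: bool = False  # models the StopOnFailure property


def run_probes(probes: List[Probe]) -> Tuple[bool, List[str]]:
    """Run probes in order; return (continue_experiment, failed_names)."""
    failed: List[str] = []
    for probe in probes:
        if probe.check():
            continue
        failed.append(probe.name)
        if probe.stop_on_failure:
            # Halt here so the caller can abort and roll back the chaos.
            return False, failed
    # Failures without stop_on_failure are recorded, but the flow goes on.
    return True, failed
```

A caller would abort the experiment when the first element of the result is False, and otherwise carry on while still recording the failed probes.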

Major Bug Fixes

  • Adds missing environment variables for kill sequence and pod affected percentage in the kafka-broker-pod-failure experiment

  • Fixes the missing environment variable for defining the spoof map within the dns-spoof experiment.

  • Fixes the ChaosScheduler to work with the latest versions of the chaos-operator and updates documentation with missing mandatory properties in the .spec.engineTemplate

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.6.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs

1.13.5

15 May 18:30
dc086b3

New Features & Enhancements

  • Introduces category for VMWare chaos with VM power-off experiment (supported for vCenter 6.x)

  • Adds chaos experiments for simulating DNS errors (inability to resolve hosts) and redirection to incorrect/faulty services (using a spoof map that can redirect specific requests)

  • Makes the chaos annotationCheck against applications “false” by default, making it simpler for users to get started with chaos without any instrumentation step for the application targets.

  • Updates the CRD version to v1; the minimum supported Kubernetes version moves to 1.15

  • Enhances the disk fill experiment with a tunable to specify write block size for quicker capacity use and fs aligned writes.

  • Supports label-based selection of node targets for (node-level) chaos injection.

  • Adds chaos abort routines for AWS chaos experiments

  • Adds the ability to target EBS volumes by tag, with a sequential and parallel injection of chaos, with support for both simple as well as EKS persistent volumes.

  • Places non-litmus core images (dependencies such as argo, MongoDB for portal driven chaos) into litmuschaos image registry, while maintaining image names and release tags to simplify the user experience for those who need to set up local mirrors or are in air-gapped environments

  • Adds support for Openshift Route in the litmus helm charts

  • Refactors and optimizes chaos libraries for code reuse and simplified flow. Updates the litmus-sdk to generate refactored experiment templates

  • Adds GitHub actions based workflow/pipeline for node-level chaos experiments in e2e suite
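
For the label-based selection of node targets mentioned above, a simplified sketch of equality-based label matching looks like this. The function names and the node data shape are hypothetical, and only `key=value` terms joined by commas are handled; the full Kubernetes selector syntax is richer.

```python
from typing import Dict, List


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse an equality-based selector such as 'role=worker,zone=a'."""
    wanted: Dict[str, str] = {}
    for term in selector.split(","):
        term = term.strip()
        if not term:
            continue
        key, sep, value = term.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid selector term: {term!r}")
        wanted[key.strip()] = value.strip()
    return wanted


def select_nodes(nodes: Dict[str, Dict[str, str]], selector: str) -> List[str]:
    """Return sorted names of nodes whose labels satisfy every term."""
    wanted = parse_selector(selector)
    return sorted(
        name for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in wanted.items())
    )
```

An empty selector matches every node in this sketch, which mirrors how an empty label selector behaves in Kubernetes.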

Major Bug Fixes

  • Fixes the inability to define certain attributes within the ChaosEngines, for which the OpenAPI validation was missing (due to migration of CRD version to v1) using the “preserve-unknown-fields” option. Also adds the validations for a number of properties/attributes.

  • Fixes a panic encountered in the chaos-runner upon the inability to access the ChaosEngine resource

  • Fixes the node restart experiment to perform the right verification checks on helper pods executing the chaos

  • Fixes behavior where helper pods that complete quickly (run for short durations) are treated as failed by verifying for “succeeded” state.

  • Removes ambiguity in filtering/accessing helper pods by assigning standard label format

  • Fixes an erroneous decision in pod-cpu & memory hog experiments which considered a non-zero response (137) upon chaos process kill (SIGKILL) as failure to revert/rollback

  • Adds a check to verify the status of application target containers before attempting an exec operation to perform the desired chaos action

  • Fixes the ec2-terminate-by-tag experiment to consider only the running instances for stop/termination

  • Adds the missing PORTAL_ENDPOINT environment to facilitate namespaced mode of execution of the litmus-portal
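
The exit status 137 in the pod-cpu/memory hog fix above follows the common shell convention of reporting 128 plus the signal number for a process terminated by a signal, and SIGKILL is signal 9. A minimal sketch of a revert check that accepts this status is shown below; the function names are hypothetical and the code assumes a POSIX platform where `signal.SIGKILL` is defined.

```python
import signal


def killed_by_signal(exit_code: int, sig: signal.Signals) -> bool:
    """Return True if a shell-style exit code means 'terminated by sig'."""
    return exit_code == 128 + int(sig)


def revert_succeeded(exit_code: int) -> bool:
    """Treat a clean exit or the expected SIGKILL status as a successful revert."""
    return exit_code == 0 or killed_by_signal(exit_code, signal.SIGKILL)
```

Any other non-zero status is still reported as a failed revert, so genuine errors are not masked.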

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.5.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs