
Releases: oracle/coherence-operator

v3.1.2

27 Jan 13:59

Coherence Operator Release 3.1.2

This is a minor bug fix release.

Changes

  • The Operator now looks at multiple Node labels to determine the site and rack values for a Coherence member.
    Previously a single label was used, which may not work on older k8s versions that still use the deprecated topology labels.
    This could cause Coherence services to fail to reach site-safe status. (See the example Node labels after this list.)

  • Cleaned up confusion around the multiple different Grafana dashboards available.

  • Added Grafana dashboards that support the Coherence Micrometer Prometheus metric name format.
    These dashboards are for use with applications that use the Coherence Micrometer integration released with Coherence CE 20.12.

  • Added a troubleshooting FAQ to the documentation. This is intended to be a growing guide to troubleshooting deploying Coherence clusters in k8s.

  • The default readiness probe no longer waits for DefaultCacheServer to start. This can optionally be re-enabled with a system property.
    This feature was originally added for a customer whose first readiness probe was executing too early, but it is not required by most applications; it is simpler to adjust the readiness probe timings. Waiting for DefaultCacheServer will not always work,
    especially when using the new bootstrap API released with Coherence CE 20.12.

  • Cleaned up some documentation errors.
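
For reference, the snippet below shows the standard Kubernetes topology labels a Node may carry, together with the deprecated equivalents still found on older k8s versions. It is illustrative only; the documentation describes exactly which labels the Operator consults for site and rack.

apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    topology.kubernetes.io/region: us-east-1
    topology.kubernetes.io/zone: us-east-1a
    # deprecated equivalents still used by older k8s versions
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1a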

v3.1.1

13 Oct 09:19

Coherence Operator Release 3.1.1

Today we have released version 3.1.1 of the Coherence Operator. It contains a single bug fix on top of 3.1.0, albeit a very important one.

⚠️ Deprecation of 3.1.0 ⚠️

We are deprecating 3.1.0, which should not be used due to a break in compatibility with previous 3.0.x versions.

Changes

An issue came to light soon after the release of v3.1.0: the CRD name had changed slightly and subtly from coherence.coherence.oracle.com to coherences.coherence.oracle.com, which was enough to break transparent upgrades from previous 3.0.x versions to 3.1.0. Initially we thought that the work-around of manually deleting the previous 3.0.x CRD would be sufficient. It soon became clear that this was totally impractical, as deleting a CRD causes all of the Coherence deployments created from that CRD to also be deleted, again breaking transparent upgrades.

For that reason we have changed the CRD name back to coherence.coherence.oracle.com in version 3.1.1, making it incompatible with 3.1.0 but compatible with 3.0.x. We recommend customers skip 3.1.0 completely and upgrade to 3.1.1. If you have installed 3.1.0 then you must manually delete the coherences.coherence.oracle.com CRD before installing 3.1.1, which will again delete any clusters that are running.

Version 3.1.1 is backwards compatible with 3.0.x, and installing 3.1.1 will not affect clusters already running from a previous 3.0.x release; just uninstall 3.0.x and install 3.1.1. We now have tests in the CI build to verify this, which should stop this sort of issue occurring in the future.

v3.1.0

28 Sep 09:57

Coherence Operator v3.1.0

🚫 THIS RELEASE IS DEPRECATED - DO NOT USE 🚫

⚠️ It appears that the upgrade to Operator SDK 1.0.0, which now uses Kubebuilder to generate the CRDs, caused the
name of the CRD to change slightly from coherence.coherence.oracle.com to coherences.coherence.oracle.com (the first coherence is now plural). The work-around for this was to delete the existing CRD, but that would also cause all Coherence clusters
that had been deployed with the previous CRD to be deleted. This is obviously totally impractical.

This version of the Coherence Operator is compatible with previous 3.0.* versions; there should be no breaking changes, and Coherence yaml used with 3.0.* versions should work with 3.1.0.

Changes in Operator 3.1.0

Project Restructure

The biggest change from our perspective was the move to the final 1.0.0 release of the Operator SDK. Just before that release the Operator SDK team made big changes to their project, removing a lot of things and essentially switching to Kubebuilder for much of the code generation and configuration. This meant that we had to do some reorganization of the code and project layout. The Operator SDK also removed its test framework, which we had made extensive use of in our suite of end-to-end integration tests. Some things became simpler with Kubebuilder, but we still had to do work to refactor our tests. All of this is of course transparent to Coherence Operator users, but it was a sizeable piece of work for us.

Deployment

The change to using Kubebuilder, and using the features it provides, has meant that we have changed the default deployment options of the Coherence Operator. The recommended way to deploy the Coherence Operator with 3.1 is to deploy a single instance of the operator into a Kubernetes cluster; that instance monitors and manages Coherence resources in all namespaces. This is a change from previous versions, where an instance of the operator was deployed into a namespace and only monitored that single namespace, meaning multiple instances of the operator could be deployed into a Kubernetes cluster.

There are various reasons why the new model is a better approach. The Coherence CRDs are deployed (or updated) by the Operator when it starts. In Kubernetes a CRD is a cluster scoped resource, so there can only be a single instance of any version of a CRD. We do not update the version of our CRD with every Operator release - we are currently at v1. This means that if two different versions of the Coherence Operator were deployed into a Kubernetes cluster, the CRD actually deployed would only match one of the operators (typically the last one deployed), which could lead to subtle bugs or issues due to version mismatches. The second reason is that version 3.1 of the operator introduces admission web-hooks (more on that below). Like CRDs, admission web-hooks are effectively a cluster scoped resource, so having multiple web-hooks deployed for a single CRD may cause issues.

It is possible to deploy the Coherence Operator with a list of namespaces to monitor instead of monitoring all namespaces, and hence it is possible to deploy multiple operators monitoring different namespaces; we just would not advise this.
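
As a sketch only, restricting the namespaces that a single operator instance watches is done at install time. The value name below is hypothetical and should be checked against the installation documentation for this release:

# hypothetical install values snippet - the real value name may differ
watchNamespaces: "coherence-test,coherence-prod"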

Admission Web-Hooks

Version 3.1 of the operator introduced the use of admission web-hooks. In Kubernetes an admission web-hook can be used for mutating a resource (typically applying defaults) and for validating a resource. The Coherence Operator uses both: we apply default values to some fields, and we validate fields when a Coherence resource is created or updated. In previous versions of the operator it was possible to see issues caused by creating a Coherence resource with invalid values in some fields, for example altering a persistent volume when updating, setting invalid NodePort values, and so on. These errors were not detected until after the Coherence resource had been accepted by Kubernetes and a StatefulSet or Service was created and subsequently rejected by Kubernetes, causing errors in the operator's reconcile loop. With a validation web-hook, a Coherence resource with invalid values will not even make it into Kubernetes.
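
As an illustration of the kind of mistake the validating web-hook now rejects up front, the resource below asks for a NodePort outside the range Kubernetes allows by default (30000-32767), so it is refused at admission time rather than failing later when the Service is created. The field layout is a sketch; check the CRD documentation for the exact structure.

apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  replicas: 3
  ports:
    - name: extend
      port: 20000
      nodePort: 80   # invalid - outside the default NodePort range, rejected at admission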

Kubernetes Autoscaler

Back in version 3.0 of the operator we supported the scale sub-resource, which allowed scaling of a Coherence deployment using built-in Kubernetes scale commands such as kubectl scale. In version 3.1 we have taken this further with a full end-to-end example of integrating a Coherence cluster with the Kubernetes Horizontal Pod Autoscaler, showing how to scale a cluster based on metrics produced by Coherence. This allows a Coherence cluster to grow as its resource requirements increase, for example as heap use increases. This is by no means an excuse not to do any capacity planning for your applications, but it does offer a useful way to use your Kubernetes resources on demand.
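
Because the Coherence resource exposes the scale sub-resource, a standard HorizontalPodAutoscaler can target it directly. The metric name below is a placeholder; the end-to-end example in the documentation shows how to surface real Coherence metrics (such as heap usage) through a metrics adapter.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: storage-hpa
spec:
  scaleTargetRef:
    apiVersion: coherence.oracle.com/v1
    kind: Coherence
    name: storage
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Pods
      pods:
        metric:
          name: heap_after_gc_percentage   # placeholder metric name
        target:
          type: AverageValue
          averageValue: "80"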

Graceful Cluster Shutdown

As a resilient data store, Coherence handles Pods leaving the cluster by recovering the lost data from backup and re-balancing the cluster. This is exactly what we need, but not necessarily when we just want to stop the whole cluster at once: the Pods will not all die together, and those that remain will be working hard to recover data as other Pods leave the cluster. If a Coherence resource is deleted from Kubernetes (or if it is scaled down to a replica count of zero) the Coherence Operator will now suspend all storage enabled cache services in that deployment before the Pods are stopped. This allows for a more controlled cluster shut-down and subsequent recovery when the cluster is brought back up.
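
For example, scaling a Coherence deployment down to zero replicas (a minimal sketch of the yaml change only) now causes the Operator to suspend the storage enabled cache services before the Pods are removed.

apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  replicas: 0   # the Operator suspends cache services before stopping the Pods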

Spring Boot Image Support

Spring Boot is a popular framework that we have big plans for in upcoming Coherence CE releases. One feature of Spring Boot is the way it packages an application into a jar, and how Spring Boot then builds images from the application jar. This could lead to problems trying to deploy those types of application using the Coherence Operator. The simplest way to package a Spring Boot application into an image for use by the Coherence Operator is to use JIB. The JIB Gradle or Maven plugins will properly package a Spring Boot application into an image that just works out of the box with the Coherence Operator.

Spring Boot images built using the latest Spring Boot Gradle or Maven plugins use Cloud Native Buildpacks to produce an image. The structure of these images, and how they are run, is quite different from that of a simple Java application. There are pros and cons to this, but as a popular framework and tooling it is important that the Coherence Operator can manage Coherence applications built and packaged this way. With version 3.1 of the operator these images can be managed with the addition of one or two extra fields in the Coherence resource yaml.

Finally, if you really wish to put your Spring Boot fat-jar into an image (and there are reasons why this is not recommended) then the Coherence resource has configuration options that will allow this to work too.
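
As a sketch, the extra fields in question tell the Operator how the image was built. The field names below are assumptions and should be confirmed against the application packaging documentation for this release.

apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: spring-app
spec:
  image: my-spring-app:1.0.0   # hypothetical image built with the Spring Boot plugins
  application:
    type: spring               # assumed field marking this as a Spring Boot application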

Tested on Kubernetes 1.19

With the recent release of Kubernetes 1.19 we have added this to our certification test suite. We now test the Coherence Operator on all Kubernetes versions from 1.12 to 1.19 inclusive.

v3.0.2

12 Aug 09:30

This is a minor point release of the Coherence Operator.

Fixes

  • Fixed an issue where the Operator continually re-reconciled the StatefulSet for a Coherence deployment if persistence was enabled using PVCs.

Notes When Using Persistence or Configuring VolumeClaimTemplates

One of the limitations of a StatefulSet (which is used to control the Pods of a Coherence deployment) is that certain fields are effectively read-only once the StatefulSet has been created. One of these is the VolumeClaimTemplates array. This means that the Coherence Operator will not attempt to change a VolumeClaimTemplate for a StatefulSet once the StatefulSet has been created, even if a change to a Coherence deployment yaml should have caused a change. For example, enabling and then later disabling persistence will not cause the persistence VolumeClaimTemplate to be removed from the StatefulSet; and vice versa, enabling persistence as an update to a running deployment will fail to add the VolumeClaimTemplate.
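
In practice this means the decision to use persistence is best made before the deployment is first created, for example (a sketch only; the field layout below is an assumption and the persistence documentation has the exact structure):

apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  coherence:
    persistence:
      enabled: true            # assumed field layout - decide this before the StatefulSet is first created
      persistentVolumeClaim:
        resources:
          requests:
            storage: 10Gi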

Images

Images can be pulled from Oracle Container Registry (credentials are not required to pull the Operator images).

docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.2
docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.2-utils

v3.0.1

20 Jul 16:08

This is a minor point release of the Coherence Operator.

Fixes

  • Fixed an issue where the Operator continually re-reconciled the StatefulSet for a Coherence deployment if persistence was enabled using PVCs.

Changes

Note: As of this release the Docker images for the Operator are no longer published to Docker Hub.
Images can now be pulled from Oracle Container Registry (credentials are not required to pull the Operator images).

docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.1
docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.1-utils

v3.0.0

22 Jun 17:13

Operator 3.0.0 is a significant new version of the Coherence Operator.

Docs: https://oracle.github.io/coherence-operator/docs/3.0.0

Changes

This release is very different from the previous 2.x releases, with a new, simpler CRD, and is not backwards compatible.
Version 3.0.0 and 2.x can co-exist in the same K8s cluster.

The concept of Clusters and Roles has gone, replaced by a single Coherence CRD. When a Coherence cluster is made up of multiple roles, each of these is now deployed and managed as a separate Coherence resource in k8s.

The reason for a new major release was so that we could completely remove the internal use of the Operator SDK Helm controller and instead reconcile all of the k8s resources in our own controller. This gives us full control over what gets reconciled and how we perform updates and merges, and it makes maintaining backwards compatibility in future releases simpler.

There is a converter utility in the assets section of the release below that can convert v2 yaml to v3 yaml. The only caveat is that Operator 3.0.0 expects only a single image to be specified that contains both Coherence and any application code.
See the docs on creating applications https://oracle.github.io/coherence-operator/docs/3.0.0/#/applications/010_overview

The converter takes a single command line parameter, which is the name of the file to convert, and outputs the converted yaml to stdout.
For example:

converter my-cluster.yaml

v2.1.1

05 May 16:25

Changes

NOTE

If upgrading from earlier releases of the Operator into a k8s namespace where there are already existing Coherence clusters configured with Fluentd enabled, then due to limitations in the way that the Operator uses Helm internally, this release will cause a rolling upgrade of those existing clusters. Existing Coherence clusters that do not have Fluentd enabled will not be affected.

v2.1.0

14 Feb 16:31

NOTE

We are aware that this version of the Operator contains changes to the CRD that make it incompatible with previous versions. This means that if version 2.1.0 of the Operator is installed into a k8s namespace that already contains CoherenceCluster instances deployed with a previous version, errors will be reported in the Operator and the existing clusters can no longer be controlled by the Operator. The only solution is to remove and re-create the affected clusters.

New Features

v2.0.5

20 Dec 13:42

Fixes:

  • Disable Flight Recorder if the Coherence image is not running a HotSpot JVM

  • Issue #379 Coherence fails to start in OpenShift.
    Whilst this issue has been fixed, there are still cases where the existing Coherence images fail on OpenShift due to file permissions in the image. The images have an oracle user that owns the Coherence installation directories, but OpenShift runs the containers with a random user. This will likely be fixed in future Coherence images, but in order to make existing images work:

  1. Ensure the anyuid security context constraint is granted
  2. Ensure that the Coherence Pods are annotated with openshift.io/scc: anyuid

For example, to update the OpenShift policy, use:

$ oc adm policy add-scc-to-user anyuid -z default

and add the openshift.io/scc annotation to the CoherenceCluster. For example:

apiVersion: coherence.oracle.com/v1
kind: CoherenceCluster
metadata:
  name: test-cluster
spec:
  annotations:
    openshift.io/scc: anyuid
  roles:
    - role: storage

v2.0.3

13 Dec 16:33

Potential fix for issue #371