Releases: digital-asset/canton
canton v2.8.5
Release of Canton 2.8.5
Canton 2.8.5 was released on April 26, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release, containing a bugfix. Users are advised to upgrade during their maintenance window.
Bugfixes
(24-008, Major): Deadlock in Topology Dispatcher
Issue Description
When a topology change is sent to the sequencer and there is a problem with the transmission or sequencing, the topology dispatcher waits forever to receive back the corresponding topology transaction. Later topology changes queue behind this outstanding topology change until the node is restarted.
Affected Deployments
Participant, Domain and Domain Topology Manager nodes
Affected Versions
All 2.3-2.7, 2.8.0-2.8.4
Impact
The node cannot issue topology changes anymore.
Symptom
Varies with the affected topology transaction: party allocations time out, uploaded packages cannot be used in transactions (NO_DOMAIN_FOR_SUBMISSION), cryptographic keys cannot be changed, etc.
The failure of sequencing can be seen in DEBUG logging on the node:
DEBUG c.d.c.t.StoreBasedDomainOutbox:... tid:1da8f7fff488dad2fc4c9c0177633a7e - Attempting to push .. topology transactions to Domain ...
DEBUG c.d.c.s.c.t.GrpcSequencerClientTransport:... tid:1da8f7fff488dad2fc4c9c0177633a7e - Sending request send-async-versioned/f77dd135-9c6a-4bd8-a6ed-a9a9f3ef43ca to sequencer.
DEBUG c.d.c.s.c.SendTracker:... tid:1da8f7fff488dad2fc4c9c0177633a7e - Sequencer send [f77dd135-9c6a-4bd8-a6ed-a9a9f3ef43ca] has timed out at ...
where the last log line by default comes 5 minutes after the first two.
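To spot this symptom in a captured log, the send request and the later timeout can be correlated by their message id. The following is a minimal sketch, assuming DEBUG logging is enabled and that log lines look like the samples above; the regular expressions and the function name `timed_out_sends` are illustrative assumptions, not part of Canton.

```python
import re

# Patterns derived from the sample DEBUG lines above; real log layouts may differ.
SEND_RE = re.compile(
    r"GrpcSequencerClientTransport.*Sending request send-async-versioned/(\S+) to sequencer"
)
TIMEOUT_RE = re.compile(r"SendTracker.*Sequencer send \[([^\]]+)\] has timed out")


def timed_out_sends(lines):
    """Return ids of sequencer sends that were requested and later timed out."""
    sent = set()
    timed_out = []
    for line in lines:
        send = SEND_RE.search(line)
        if send:
            sent.add(send.group(1))
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout and timeout.group(1) in sent:
            timed_out.append(timeout.group(1))
    return timed_out


sample = [
    "DEBUG c.d.c.t.StoreBasedDomainOutbox:... tid:1da8f7fff488dad2fc4c9c0177633a7e - Attempting to push .. topology transactions to Domain ...",
    "DEBUG c.d.c.s.c.t.GrpcSequencerClientTransport:... tid:1da8f7fff488dad2fc4c9c0177633a7e - Sending request send-async-versioned/f77dd135-9c6a-4bd8-a6ed-a9a9f3ef43ca to sequencer.",
    "DEBUG c.d.c.s.c.SendTracker:... tid:1da8f7fff488dad2fc4c9c0177633a7e - Sequencer send [f77dd135-9c6a-4bd8-a6ed-a9a9f3ef43ca] has timed out at ...",
]
print(timed_out_sends(sample))
```

A timed-out send is not necessarily a topology transaction. Checking that the same tid also appears on a StoreBasedDomainOutbox line, as in the sample, narrows the match to the case described here.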
Workaround
Restart the node.
Likeliness
Occurs for unstable network conditions between the node and the sequencer (e.g., frequent termination of a subset of the connections by firewalls) and when the sequencer silently drops submission requests.
Recommendation
Upgrade to 2.8.5
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.70+15-CA (build 11.0.22+7-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.18 (Debian 12.18-1.pgdg120+2) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.14 (Debian 13.14-1.pgdg120+2), PostgreSQL 14.11 (Debian 14.11-1.pgdg120+2), PostgreSQL 15.6 (Debian 15.6-1.pgdg120+2) |
Oracle | 19.20.0 |
canton v2.8.4
Release of Canton 2.8.4
Canton 2.8.4 was released on April 16, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release containing small improvements and bug fixes.
What’s New
Node's Exit on Fatal Failures
When a node encounters a fatal failure that Canton unexpectedly cannot handle gracefully,
the new default behavior is that the node exits (stops its process) and relies on an external process or service monitor to restart the node's process. Without this exit, the
node would remain in a half-broken state, requiring a manual restart.
Failures that are currently logged as an error but not automatically recovered from are:
- Unhandled exceptions when processing events from a domain, which currently leads to a disconnect from that domain.
- Failed transition from an active replica to a passive replica, which may result in an invalid state of the node.
The new behavior can be reverted by setting canton.parameters.exit-on-fatal-failures = false in the configuration, but we encourage users to adopt the new behavior.
Minor Improvements
Error code changes
- When an access token expires and the ledger api stream is terminated, an ABORTED(ACCESS_TOKEN_EXPIRED) error is returned.
Fixed error cause truncation
Previously, the error causes of the documented errors were truncated after 512 characters. This behaviour is necessary when transporting errors through gRPC,
as gRPC imposes size limits, but the limit was also applied to errors in logs, which caused relevant information to be truncated.
Now, this limit is no longer applied to errors written to the logs.
Configuration Changes
Token Expiry Grace Period for Streams
When a token used in the ledger api request to open a stream expires, the stream is terminated. This normally happens
several minutes or hours after the stream initiation.
Users can now configure a grace period that keeps the stream open beyond the token expiry:
canton.participants.participant1.parameters.ledger-api-server-parameters.token-expiry-grace-period-for-streams=600.seconds
The grace period can be any non-negative duration where both the value and the unit are specified, e.g. "600.seconds" or "10.minutes".
When the parameter is omitted, the grace period defaults to zero. When the configured value is Inf,
the stream is never terminated.
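The three cases (omitted, finite, and Inf) can be modelled with a few lines of code. This is only a sketch of the semantics as described above, assuming the stream is terminated at token expiry plus the grace period; the function `stream_deadline` and its arguments are hypothetical and do not correspond to any Canton API.

```python
import math
from datetime import datetime, timedelta, timezone


def stream_deadline(token_expiry, grace_seconds=0.0):
    """Return when a stream would be terminated, or None if it never is."""
    if grace_seconds < 0:
        raise ValueError("grace period must be non-negative")
    if math.isinf(grace_seconds):
        return None  # models the Inf setting: no termination due to token expiry
    return token_expiry + timedelta(seconds=grace_seconds)


expiry = datetime(2024, 4, 16, 12, 0, tzinfo=timezone.utc)
print(stream_deadline(expiry))            # parameter omitted: terminate at token expiry
print(stream_deadline(expiry, 600))       # 600.seconds grace period
print(stream_deadline(expiry, math.inf))  # Inf: never terminated
```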
(24-007, Major): Domain reconnect does not complete timely during participant failover
Issue Description
During participant failover, the newly active participant does not complete reconnecting to the domain and fails to declare itself as active in a timely manner.
This can happen when the passive replica that has become active was idle for a long time while the former active replica processed many transactions.
Affected Deployments
Participant
Affected Versions
2.3-2.6, 2.7.0-2.7.7, 2.8.0-2.8.3
Impact
During participant failover the newly active participant does not become active and does not serve commands and transactions in a timely manner.
Symptom
One of the last log entries before the participant only outputs storage health checks is similar to:
INFO c.d.c.p.s.d.DbMultiDomainEventLog:participant=participant1 Fetch unpublished from domain::122345... up to Some(1234)
The participant reports itself as not active and not connected to any domains in its health status.
Workaround
Restart the participant node.
Likeliness
It depends on how long the passive replica has been idle and on the number and size of transactions that the active replica has processed in the meantime.
These factors influence how quickly the failover, and in particular the reconnect to the domain, completes.
Recommendation
Upgrade to 2.8.4
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.70+15-CA (build 11.0.22+7-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.18 (Debian 12.18-1.pgdg120+2) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.14 (Debian 13.14-1.pgdg120+2), PostgreSQL 14.11 (Debian 14.11-1.pgdg120+2), PostgreSQL 15.6 (Debian 15.6-1.pgdg120+2) |
Oracle | 19.20.0 |
canton v2.3.19
Release of Canton 2.3.19
Canton 2.3.19 was released on March 28, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
Canton 2.3.19 has been released as part of Daml 2.3.19 with no additional change on Canton.
Compatibility
The following Canton protocol and Ethereum sequencer contract versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 2.0.0, 3.0.0 |
Ethereum contract versions | 1.0.0, 1.0.1 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing) |
Postgres | postgres (PostgreSQL) 14.11 (Debian 14.11-1.pgdg120+2) |
Oracle | 19.15.0 |
Besu | besu/v21.10.9/linux-x86_64/openjdk-java-11 |
Fabric | 2.2.2 |
canton v2.7.9
Release of Canton 2.7.9
Canton 2.7.9 was released on March 20, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release, providing a workaround for a participant crash recovery issue on protocol version 4.
Bugfixes
This release provides a workaround for a specific participant crash recovery issue under load on protocol version 4 triggered by the bug 23-021 (fixed in protocol version 5).
The workaround can be enabled for a participant through the following configuration, but this should only be done when advised to do so by Digital Asset:
canton.participants.XXX.parameters.disable-duplicate-contract-check = yes
Compatibility
The following Canton protocol and Ethereum sequencer contract versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.70+15-CA (build 11.0.22+7-LTS, mixed mode) |
Postgres | postgres (PostgreSQL) 14.11 (Debian 14.11-1.pgdg120+2) |
Oracle | 19.18.0 |
canton v2.7.8
Update 2024-03-12.23 (#144) Reference commit: 3f22ca358d Co-authored-by: Canton <canton@digitalasset.com>
canton v2.3.18
Release of Canton 2.3.18
Canton 2.3.18 was released on March 15, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
Canton 2.3.18 has been released as part of Daml 2.3.18 with no additional change on Canton.
Compatibility
The following Canton protocol and Ethereum sequencer contract versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 2.0.0, 3.0.0 |
Ethereum contract versions | 1.0.0, 1.0.1 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing) |
Postgres | postgres (PostgreSQL) 14.11 (Debian 14.11-1.pgdg120+2) |
Oracle | 19.15.0 |
Besu | besu/v21.10.9/linux-x86_64/openjdk-java-11 |
Fabric | 2.2.2 |
canton v2.8.3
Release of Canton 2.8.3
Canton 2.8.3 was released on February 21, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release, fixing a critical issue that can occur if overly large transactions are submitted to the participant.
The bootstrap domain command has been slightly improved, but it may now throw an error if you try to bootstrap an already
initialized domain while the domain manager node is still starting up.
Console changes
Console commands that download an ACS snapshot now take a new mandatory argument to indicate whether
the snapshot will be used in the context of a party offboarding (party replication or not). This allows Canton to
perform additional checks and makes party offboarding safer.
Affected console commands:
- participant.repair.export_acs
- participant.repair_download_acs (deprecated method)
New argument: partiesOffboarding: Boolean.
Bugfixes
(24-003, Moderate): Cannot exercise keyed contract after party replication
Issue Description
When replicating a party who is a maintainer on a contract key, the contract can no longer be exercised.
Affected Deployments
Participant
Affected Versions
- 2.7
- 2.8.0-2.8.1
Impact
Contracts become unusable.
Symptom
The following error is emitted when trying to exercise a choice on the contract:
java.lang.IllegalStateException:
Unknown keys are to be reassigned. Either the persisted ledger state corrupted
or this is a malformed transaction. Unknown keys:
Workaround
None.
Likeliness
Deterministic.
Recommendation
Do not use the party migration macros with contract keys in version 2.8.1. Upgrade to 2.8.2 if you want to use them.
(24-002, Critical): Looping DB errors for transactions with many (32k) key updates
Issue Description
If a transaction with a very large number of key updates is submitted to the participant, the SQL query to update the contract keys table will fail, leading to a database retry loop, stopping transaction processing. The issue is caused by a limit of 32k rows in a prepared statement in the JDBC Postgres driver.
Affected Deployments
Participant on Postgres
Affected Versions
2.3-2.6, 2.7.0-2.7.6, 2.8.0-2.8.1
Impact
Transactions can be submitted to the participant, but the transaction stream stops emitting transactions and commands don't appear to succeed.
Symptom
A stuck participant, continuously logging about "Now retrying operation 'com.digitalasset.canton.participant.store.db.DbContractKeyJournal.addKeyStateUpdates'" and "Tried to send an out-of-range integer as a 2-byte value: 40225"
Workaround
Upgrade to a version containing the fix. Alternatively, such a transaction may be ignored, but this must be done with care to avoid a ledger fork in case of multiple involved participants. Contact support.
Likeliness
Deterministic with very large (32k key updates) transactions.
Recommendation
Upgrade at your convenience or if experiencing the error.
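The "2-byte value" in the logged error points at a signed 16-bit field, which is why a count such as 40225 is rejected. The sketch below illustrates that arithmetic and shows generic batching as one way to stay under such a limit. It assumes one bind parameter per row and is not a description of how Canton fixed the issue; the names `fits_in_int16` and `chunks` are hypothetical.

```python
import struct

INT16_MAX = 2**15 - 1  # 32767, the largest value a signed 2-byte integer can hold


def fits_in_int16(value):
    """Check whether value can be encoded as a signed big-endian 2-byte integer."""
    try:
        struct.pack(">h", value)
        return True
    except struct.error:
        return False


def chunks(items, max_size=INT16_MAX):
    """Split items into consecutive batches of at most max_size elements."""
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]


key_updates = list(range(40225))  # count taken from the logged error message
print(fits_in_int16(len(key_updates)))
print([len(batch) for batch in chunks(key_updates)])
```

With 40225 updates, the count does not fit in two bytes, and splitting at 32767 yields two batches of 32767 and 7458 elements.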
(24-004, Major): DB write operations fail due to read-only connections as part of a DB failover
Issue Description
In a Postgres HA setup with a write and a read replica on AWS RDS, Canton may connect and remain connected to the former write replica during a DB failover, which results in the DB connections being read-only. All DB write operations then fail and are retried indefinitely.
Affected Deployments
All nodes with Postgres HA. Only observed with AWS RDS so far, not on Azure.
Affected Versions
All 2.3-2.6, 2.7.0-2.7.6, 2.8.0-2.8.1
Impact
A node becomes unusable due to all DB write operations failing and being retried indefinitely.
Symptom
The following kind of exceptions are logged:
org.postgresql.util.PSQLException: ERROR: cannot execute UPDATE in a read-only transaction
Workaround
Restart the node to reconnect to the current write replica. Setting the JVM DNS TTL via networkaddress.cache.ttl=0 reduces the likelihood of this problem, but disables DNS caching entirely.
Likeliness
Probable in case of DB failover, only observed in AWS RDS so far.
Recommendation
Upgrade to 2.8.2 if you are using AWS RDS
(24-005, Major): Race condition during domain manager startup may prevent startup after upgrade
Issue Description
A race condition in the domain bootstrap macro and domain manager node initialisation may result in a domain manager overriding the previously stored domain parameters. This can cause the domain manager to be unable to reconnect to the domain after upgrading to version 2.7 or beyond without a hard domain migration to pv=5, as with 2.7, the default protocol version changed to pv=5.
Affected Deployments
Domain Manager Nodes
Impact
The domain manager node is unable to connect to the sequencer. Topology transactions such as party additions have no effect on the domain, as they are not verified and forwarded by the domain manager.
Symptom
Parties added to the participant do not appear on the domain. Transactions referring to such parties are rejected with UNKNOWN_INFORMEE. The domain manager logs complain about "The unversioned subscribe endpoints must be used with protocol version 4"
Workaround
If you don't run bootstrap_domain after the domain was already initialised, the issue won't happen. If a node is affected, you can recover the node explicitly by setting the protocol version to v=4 in the configuration and using the reset-stored-static-config parameter to correct the wrongly stored parameters.
Likeliness
Race condition which can happen if you run the bootstrap_domain command repeatedly on an already bootstrapped domain during startup of the domain manager node. As there is no need to run the bootstrap_domain command repeatedly, this issue can easily be avoided.
Recommendation
Do not run the bootstrap_domain command after the domain was initialised.
A fix has been added to prevent future mistakes.
(24-006, Minor): Large error returned from ACS query cannot be stuffed into HTTP2 headers
Issue Description
A bad ledger api client request results in an error that contains a large metadata resource list and fails serialization in the netty layer. As a consequence, the client receives an INTERNAL error that doesn't correspond to the actual problem.
Affected Deployments
Participant Nodes
Impact
Incorrect error returned to the ledger client which may complicate trouble-shooting
Symptom
The ledger client observes the following error:
INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
At the same time, the participant node reports the following error:
io.grpc.netty.NettyServerHandler - Stream Error
io.netty.handler.codec.http2.Http2Exception$HeaderListSizeException: Header size exceeded max allowed size (8192)
Workaround
Rely on participant logs when investigating ledger client issues with logs containing the message
INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
Likeliness
It is rather difficult for ledger api clients to create bad requests that end up in errors with a long list of resources. One example is when a client requests a long list of non-existent templates in a filter of an ACS ledger api request.
There was an unrelated bug in the trigger service that could produce a long list of templates when the AllInDar construction was used in the trigger installation invocation.
Recommendation
Upgrade at your convenience.
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.66+15-CA (build 11.0.20+8-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.18 (Debian 12.18-1.pgdg120+2) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.14 (Debian 13.14-1.pgdg120+2), PostgreSQL 14.11 (Debian 14.11-1.pgdg120+2), PostgreSQL 15.6 (Debian 15.6-1.pgdg120+2) |
Oracle | 19.20.0 |
canton v2.7.7
Update 2024-02-14.08 (#133) Reference commit: bb64a1d2e9 Co-authored-by: Canton <canton@digitalasset.com>
canton v2.8.1
Release of Canton 2.8.1
Canton 2.8.1 was released on January 31, 2024. You can download the Daml Open Source edition from the Daml Connect GitHub Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release of Canton including reliability fixes and user experience improvements.
Users of the Fabric driver are encouraged to upgrade at their convenience.
What’s New
Minor Improvements
Improved reference configuration example
We have reworked the reference configuration example. The examples/03_advanced_configuration example has been replaced
by a set of reference configuration files which can be found in the config directory. While the previous example
contained pieces which could be assembled into a full configuration, the new examples now contain the full configuration
which can be simplified by removing parts which are not needed.
The installation guide <https://docs.daml.com/canton/usermanual/installation.html> has been updated accordingly.
Improved Party Replication Macros
The enterprise version now supports replicating a party from one participant node to another for either migration or
to have multiple participants hosting the same party. Please consult the documentation <https://docs.daml.com/2.9.0/canton/usermanual/identity_management.html#replicate-party-to-another-participant-node>
on how to use this feature.
Reduced Background Journal Cleaning Load
We have improved the background journal cleaning to produce less load on the database by using smaller transactions to clean up
the journal.
Executor Service Metrics removed
The metrics for the execution services have been removed:
- daml.executor.runtime.completed*
- daml.executor.runtime.duration*
- daml.executor.runtime.idle*
- daml.executor.runtime.running*
- daml.executor.runtime.submitted*
- daml_executor_pool_size
- daml_executor_pool_core
- daml_executor_pool_max
- daml_executor_pool_largest
- daml_executor_threads_active
- daml_executor_threads_running
- daml_executor_tasks_queued
- daml_executor_tasks_executing_queued
- daml_executor_tasks_stolen
- daml_executor_tasks_submitted
- daml_executor_tasks_completed
- daml_executor_tasks_queue_remaining
Bugfixes
(24-001, Major): Fabric Block Sequencer may deadlock when catching up
Issue Description
Fabric Ledger block processing runs into an unintended shutdown and fails to process blocks when the
number of in-memory blocks exceeds 5000. This happens when catching up after downtime if the Fabric Ledger
size has increased substantially in the meantime.
Affected Deployments
Fabric Sequencer Node
Affected Versions
2.3 - 2.7 and 2.8.0
Impact
The sequencer stops feeding new blocks.
Symptom
Participant gets stuck in an old state and does not visibly catch up against a Fabric Ledger.
Therefore, a domains.reconnect call on the participant may appear as if it is hanging.
On the sequencer node, each processed message is logged using "Observed Send with messageId" including a timestamp. Once the emission of these log lines stops, the sequencer is stuck.
Workaround
Restart the sequencer node if it is stuck.
Likeliness
Rarely, only occurs when catching up to a Fabric Ledger that the sequencer node has been an active part of before.
Recommendation
Upgrade to 2.8.1 at your convenience.
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.66+15-CA (build 11.0.20+8-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.17 (Debian 12.17-1.pgdg120+1) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.13 (Debian 13.13-1.pgdg120+1), PostgreSQL 14.10 (Debian 14.10-1.pgdg120+1), PostgreSQL 15.5 (Debian 15.5-1.pgdg120+1) |
Oracle | 19.20.0 |
canton v2.8.0-rc4
Release candidates such as 2.8.0-rc4 don't come with release notes