
Local schema cache weighted invalidation based on schema size #2598

Open · wants to merge 2 commits into master

Conversation

mattsewall-stripe

Summary

  • While operating Schema Registry we have observed scenarios where large schemas with many versions are not invalidated quickly enough from the local schema cache. (Like large objects rising to the surface of an avalanche, large schemas with many references stay in the cache for the maximum time period.)
  • Rather than increasing the total heap space of our Schema Registry nodes or decreasing the total size of the cache, we solved this issue with weight-based local schema cache invalidation. Our schemas vary greatly in size, and some are very large; it is not practical for us to size the heap to accommodate a full cache of very large schemas without OOMing.
  • We have observed no performance degradation in our Schema Registry instances, but we would be glad to work through mainline benchmark strategies as needed. Supporting this, our cache hit rate has stayed constant between the size-based cache and the weight-based cache.
  • This PR also includes metrics to track the local schema cache, both its size and its hits and misses.
  • This feature is disabled by default.
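As a rough illustration of the weight-based invalidation described above, here is a minimal weight-bounded LRU cache built on `LinkedHashMap`. This is only a sketch of the eviction policy, not the Guava cache the PR actually uses; the class and method names are hypothetical, and an entry's weight is approximated by the length of its schema string:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a weight-bounded LRU where each entry's weight is the length of
// the cached canonical schema string. Once the summed weights exceed maxWeight,
// least-recently-used entries are evicted until the cache is back under budget.
class WeightedSchemaCache extends LinkedHashMap<String, String> {
    private final long maxWeight;
    private long currentWeight = 0;

    WeightedSchemaCache(long maxWeight) {
        super(16, 0.75f, true); // access-order iteration, i.e. LRU
        this.maxWeight = maxWeight;
    }

    @Override
    public String put(String id, String schema) {
        String previous = super.put(id, schema);
        currentWeight += schema.length() - (previous == null ? 0 : previous.length());
        // Evict least-recently-used entries until we are back under budget.
        Iterator<Map.Entry<String, String>> it = entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            currentWeight -= it.next().getValue().length();
            it.remove();
        }
        return previous;
    }

    long weight() {
        return currentWeight;
    }
}
```

With a weight budget of 10 and three 5-character schemas inserted, the least-recently-used one is evicted, whereas a bound on entry count alone would have kept all three regardless of how large they were.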

Motivation

  • Operating Schema Registry, we have observed scenarios where, during periods of heavy write traffic, the leader node OOMs because the local cache grows too large. This write traffic is extremely spiky and contains large schemas with many versions (important for checking compatibility).
  • This change aims to provide operational clarity into managing the local schema cache and to fix this failure scenario.

mattsewall-stripe requested a review from a team as a code owner April 6, 2023 21:32
/**
 * <code>schema.cache.maximum.weight</code>
 */
public static final String SCHEMA_CACHE_MAXIMUM_WEIGHT_CONFIG = "schema.cache.maximum.weight";
public static final int SCHEMA_CACHE_MAXIMUM_WEIGHT_DEFAULT = 1000000;
mattsewall-stripe (Author):

This variable means that the sum of the canonical schema lengths must stay under 1,000,000 characters. The Guava cache docs leave a lot of fuzziness about how entries get invalidated within the cache itself (based on usage and percentage contribution to the total weight?), but we have set this limit so that large schemas should be invalidated over time. In practice this means more small schemas will be cached (and that's good!), since we should be able to cache quite a few of them. No significant performance degradation has materialized from making this change.
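For concreteness, the Guava configuration this comment describes would look roughly like the following sketch. It assumes Guava on the classpath; the class name, the `String` key and value types, and the method name are simplified placeholders, while 1,000,000 is the default from this PR:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class WeightedCacheSketch {
    static Cache<String, String> buildCache() {
        // Bound the cache by the summed canonical schema length rather than by
        // entry count; Guava evicts entries (approximately LRU) as the total
        // weight approaches maximumWeight.
        return CacheBuilder.newBuilder()
                .maximumWeight(1_000_000)
                .weigher((String id, String canonicalSchema) -> canonicalSchema.length())
                .build();
    }
}
```

Note that Guava's weigher is consulted once at insertion time, so the weight of an entry is fixed for its lifetime in the cache.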

rayokota (Member) left a comment, previously approving these changes Apr 6, 2023:

Thanks for the PR @mattsewall-stripe!

rayokota dismissed their stale review April 6, 2023 23:01:

The build failed with a few checkstyle errors

/**
 * <code>schema.cache.use.weight</code>
 */
public static final String SCHEMA_CACHE_USE_WEIGHT_CONFIG = "schema.cache.use.weight";
public static final boolean SCHEMA_CACHE_USE_WEIGHT_DEFAULT = false;
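Putting the two configs together, the startup wiring might look like the following sketch. It again assumes Guava; the factory class, method, and parameter names are illustrative, and the real PR's wiring may differ:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class SchemaCacheFactory {
    // Sketch: choose between the existing size bound and the new weight bound
    // based on the schema.cache.use.weight flag. Names here are hypothetical.
    static Cache<String, String> build(boolean useWeight, long maxWeight, long maxSize) {
        if (useWeight) {
            return CacheBuilder.newBuilder()
                    .maximumWeight(maxWeight)
                    .weigher((String id, String schema) -> schema.length())
                    .build();
        }
        return CacheBuilder.newBuilder()
                .maximumSize(maxSize)
                .build();
    }
}
```

Guava forbids setting both `maximumSize` and `maximumWeight` on the same builder, which is why the flag has to select one bound or the other rather than layering them.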
mattsewall-stripe (Author):
This is turned off by default, but in my opinion it could likely be turned on by default.

mattsewall-stripe (Author):

Hey @rayokota! Apologies for the delay, but I have built the project locally and run the tests :)

mattsewall-stripe (Author):

Hi @rayokota, can you take another look? I'm coming around to this again.


cla-assistant bot commented Sep 25, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
