[Docs] provide clarity on the cache that flyte propeller uses #4611

Closed
2 tasks done
samhita-alla opened this issue Dec 18, 2023 · 1 comment · Fixed by #5278
Assignees
Labels
backlogged For internal use. Reserved for contributor team workflow. documentation Improvements or additions to documentation pri3

Comments

@samhita-alla
Contributor

Description

reference slack thread: https://discuss.flyte.org/t/16095263/please-could-someone-explain-which-cache-the-docs-are-referr#a02e379e-89dd-4fd9-ab49-df6b29bbcb65

doc page to update: https://docs.flyte.org/en/latest/deployment/configuration/performance.html#optimize-flytepropeller-configuration

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@samhita-alla samhita-alla added documentation Improvements or additions to documentation pri3 labels Dec 18, 2023
dosubot bot commented Dec 18, 2023

🤖

Hey @samhita-alla! Nice to meet you! I'm Dosu, a friendly bot here to assist you while we wait for a human maintainer to step in. I'm designed to help you solve bugs, answer questions, and generally make your experience smoother. Don't worry, we'll look into this caching issue together. Let's get started!

The cache that FlytePropeller uses is a write-through cache that sits on top of the metastore. It is used to optimize read and write latency for all metadata IO operations. Metadata here refers to the input and output pointers. The cache can be configured in the storage.cache section of the FlytePropeller configuration.

The cache is also used to minimize the number of redundant loops in FlytePropeller, which increases the number of free slots. This is done by using a trick in etcd: for every write, FlytePropeller records the last known version number in the database and then waits for the change to propagate to the informer cache.

Another area of slowdown could be the size of the input-output cache that FlytePropeller maintains in-memory. This can be configured while configuring the storage for FlytePropeller. The rule of thumb is to allocate half of the memory limit of FlytePropeller to the cache.

Here is the relevant excerpt from the documentation:

   * - ``storage.cache``
     - propeller
     - This config is used to configure the write-through cache used by FlytePropeller on top of the metastore
     - FlytePropeller uses the configure blob-store (can be changed to something more performant in the future) to optimize read and write latency, for all metadata IO operations. Metadata refers to the input and output pointers

And:

Another area of slowdown could be the size of the input-output cache that FlytePropeller maintains in-memory. This can be configured, while configuring
the storage for FlytePropeller. Rule of thumb, for FlytePropeller with x memory limit, allocate x/2 to the cache

The write-through cache mechanism in FlytePropeller is implemented in the WriteRaw function of the cachedRawStore struct. Here's how it works:

  1. The function first creates a TeeReader from the io.Reader input. As the raw input is read through this TeeReader, every byte read is also copied into a bytes.Buffer named buf.

      var buf bytes.Buffer
      teeReader := io.TeeReader(raw, &buf)

  2. The function then writes the data from the TeeReader to the underlying raw store. This is the "write-through" part of the cache: the data goes to the underlying storage, and the TeeReader copies the same bytes into buf for the cache. If the write to the raw store fails, the function returns the error and nothing is cached.

      err := s.RawStore.WriteRaw(ctx, reference, size, opts, teeReader)
      if err != nil {
          return err
      }

  3. After the data is written to the raw store, it is also written to the cache using the Set method. The Set method takes the reference as the key and the bytes from buf as the value. The neverExpire constant is used to indicate that the cache entry should never expire.

      err = s.cache.Set([]byte(reference), buf.Bytes(), neverExpire)
      if err != nil {
          s.metrics.CacheWriteError.Inc()
          err = errors.Wrapf(ErrFailedToWriteCache, err, "Failed to Cache the metadata")
      }

  4. If there is an error while writing to the cache, the CacheWriteError metric is incremented and the error is wrapped with ErrFailedToWriteCache.

This mechanism ensures that data successfully written to the underlying storage is also placed in the cache as part of the same call, which can improve read performance for frequently accessed data.

I hope this provides the clarity you were looking for. If you have any more questions, feel free to ask.

Best,
Dosu

@neverett neverett added the backlogged For internal use. Reserved for contributor team workflow. label Jan 11, 2024
@davidmirror-ops davidmirror-ops self-assigned this May 3, 2024
3 participants