
Feature Request: Consider supporting size management for OCI Store #472

Open
akashsinghal opened this issue Mar 24, 2023 · 15 comments
@akashsinghal

Currently, there is no built-in functionality for deleting blobs or managing the size of an OCI Store. It would be nice to have garbage-collection-like functionality that evicts least-recently-used (LRU) blobs from the index and deletes the corresponding blobs from the OCI store.

@akashsinghal
Author

cc: @Wwwsylvia

@shizhMSFT shizhMSFT added the duplicate This issue or pull request already exists label Mar 27, 2023
@shizhMSFT
Contributor

Duplicate of #454

@shizhMSFT shizhMSFT marked this as a duplicate of #454 Mar 27, 2023
@akashsinghal
Author

Is there a plan to implement higher-level functions beyond Delete? For example, a way to add a last-accessed-time annotation in the index.json, with an accompanying method that reports when a descriptor was last used? Basically, an API that makes it easier to implement an LRU eviction strategy.
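For illustration, a minimal sketch of what such an annotation could look like on a descriptor; the annotation key below is hypothetical, not something defined by the OCI image spec or oras-go:

```go
import (
    "time"

    ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// hypothetical annotation key, purely for illustration
const annotationLastAccessed = "vnd.example.last-accessed"

// touch stamps the descriptor with the current time; a store could persist
// this into index.json and read it back to drive LRU eviction.
func touch(desc *ocispec.Descriptor) {
    if desc.Annotations == nil {
        desc.Annotations = map[string]string{}
    }
    desc.Annotations[annotationLastAccessed] = time.Now().UTC().Format(time.RFC3339)
}
```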

@shizhMSFT
Contributor

> a way to add a last-accessed-time annotation in the index.json, with an accompanying method that reports when a descriptor was last used?

That sounds like auto-purge functionality and is currently out of scope, since index.json might be immutable or intended to be read-only.

@sparr
Contributor

sparr commented Jun 20, 2023

While considering similar concerns in our code base, we looked at containerd's content store package, which offers garbage collection controlled by labels.

Having similar functionality in oras would be great, but at the moment I'm more interested in getting #470 merged so we can delete at all. Once it supports delete and updates the predecessor/successor graph, we can at least identify unreferenced/orphaned resources.
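For reference, a minimal sketch of containerd's label-driven approach, assuming a connected containerd (v1) client; setting the containerd.io/gc.root label pins a blob so the garbage collector skips it:

```go
import (
    "context"

    "github.com/containerd/containerd"
    "github.com/containerd/containerd/content"
    "github.com/opencontainers/go-digest"
)

// protectFromGC marks a blob as a GC root so containerd's garbage collector
// will not remove it; clearing the label makes it collectible again once
// nothing references it.
func protectFromGC(ctx context.Context, client *containerd.Client, dgst digest.Digest) error {
    cs := client.ContentStore()
    info := content.Info{
        Digest: dgst,
        Labels: map[string]string{"containerd.io/gc.root": "true"},
    }
    // The fieldpath restricts the update to this single label.
    _, err := cs.Update(ctx, info, "labels.containerd.io/gc.root")
    return err
}
```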

@shizhMSFT shizhMSFT added this to the v2.3.0 milestone Jun 27, 2023
@yizha1 yizha1 modified the milestones: v2.3.0, v2.4.0 Aug 1, 2023
@shizhMSFT shizhMSFT added enhancement New feature or request and removed duplicate This issue or pull request already exists labels Oct 26, 2023
@shizhMSFT
Contributor

shizhMSFT commented Oct 26, 2023

Removing the duplicate label, as this issue requests garbage collection in addition to #454.

@shizhMSFT shizhMSFT marked this as not a duplicate of #454 Oct 26, 2023
@shizhMSFT
Contributor

cc: @wangxiaoxuan273 @eiffel-fl

@wangxiaoxuan273
Contributor

wangxiaoxuan273 commented Nov 29, 2023

Let's have some discussion on the following general questions: which choices work better with actual needs?

  1. Do we want to implement auto-garbage collection in Delete(), or provide a public GC function and let the user call it manually?
  2. Do we need to clean garbage recursively? For example, given the graph A -> B -> C -> D, when A is deleted, should we delete B, C, and D, or only B?

You are welcome to take a look at the draft PRs #652 and #653 and give feedback on the design. These two PRs collectively implement automatic recursive garbage collection upon oci.Store.Delete().

@eiffel-fl
Contributor

Hi!

> Let's have some discussion on the following general questions: which choices work better with actual needs?
>
> 1. Do we want to implement auto-garbage collection in `Delete()`, or provide a public GC function and let the user call it manually?

Automatic would be welcome, but having a dedicated function as a first step is OK.

> 2. Do we need to clean garbage recursively? For example, given the graph `A -> B -> C -> D`, when `A` is deleted, should we delete `B, C, D` or only `B`?

The garbage collection should be recursive; otherwise we will end up with wasted nodes (`C` and `D` in this example).

> You are welcome to take a look at the draft PRs #652 and #653 and give feedback on the design. These two PRs collectively implement automatic recursive garbage collection upon oci.Store.Delete().

I took a look at your PRs and they look pretty good.
I will test #653 in Inspektor Gadget.

Best regards.

@Wwwsylvia
Member

> 1. Do we want to implement auto-garbage collection in Delete(), or provide a public GC function and let the user call it manually?

I think we can start with providing a public GC function, and then consider a store option for enabling auto-GC. It could be somewhat similar to SaveIndex() and AutoSaveIndex.

> 2. Do we need to clean garbage recursively? For example, given the graph A -> B -> C -> D, when A is deleted, should we delete B, C, D or only B?

Yes, the idea of GC is to clean up all garbage (dangling nodes). If we delete only the direct successors (B), there will still be garbage remaining (C and D), and that would just be another implementation of delete, not GC.
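For illustration, a sketch of what that option surface could look like. This assumes the API lands roughly as proposed in the PRs and commit referenced in this thread (oci.New, an AutoGarbageCollection field defaulting to true, and a GC method); the store path is a placeholder:

```go
import (
    "context"

    "oras.land/oras-go/v2/content/oci"
)

func manualGC(ctx context.Context) error {
    store, err := oci.New("/tmp/oci-layout") // placeholder path
    if err != nil {
        return err
    }
    // Mirror the SaveIndex()/AutoSaveIndex pattern: turn off automatic
    // garbage collection and sweep dangling blobs on our own schedule.
    store.AutoGarbageCollection = false
    // ... push, tag, and delete content here ...
    return store.GC(ctx)
}
```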

@wangxiaoxuan273
Contributor

wangxiaoxuan273 commented Nov 30, 2023

> 1. Do we want to implement auto-garbage collection in Delete(), or provide a public GC function and let the user call it manually?
>
> I think we can start with providing a public GC function, and then consider a store option for enabling auto-GC. It could be somewhat similar to SaveIndex() and AutoSaveIndex.

Then should the default value of GCEnabled be true or false? Which choice is better? @Wwwsylvia

@shizhMSFT
Contributor

shizhMSFT commented Dec 29, 2023

#656 will enable oci.Store to remove any untraceable / dangling blobs, including manifest blobs and layer blobs.

However, this issue requests a higher level of garbage collection: predicate-based garbage collection, where the predicate is based on access metadata (e.g. the LRU policy mentioned in the issue description). In summary, fulfilling the request requires the following sub-features:

  1. A metadata database to track the access metadata
  2. A GC scheduler to trigger the GC
  3. The GC feature itself, which is being implemented by feat: Public GC function of oci.Store #656

For 1, the metadata database can be in memory if it is used within a single process, a file if it is accessed by one process at a time, or a real DB if multi-process access is required. Here is an example implementation:

```go
import (
    "context"
    "io"
    "sync"
    "time"

    ocispec "github.com/opencontainers/image-spec/specs-go/v1"
    "oras.land/oras-go/v2/content/oci"
)

// MetadataStore wraps oci.Store and tracks when each manifest was last accessed.
type MetadataStore struct {
    *oci.Store
    LastAccessTime sync.Map // map[digest.Digest]time.Time
}

// isManifest reports whether the descriptor refers to a manifest or an index.
func isManifest(desc ocispec.Descriptor) bool {
    return desc.MediaType == ocispec.MediaTypeImageManifest ||
        desc.MediaType == ocispec.MediaTypeImageIndex
}

func (s *MetadataStore) Push(ctx context.Context, expected ocispec.Descriptor, reader io.Reader) error {
    if err := s.Store.Push(ctx, expected, reader); err != nil {
        return err
    }
    if isManifest(expected) {
        s.LastAccessTime.Store(expected.Digest, time.Now())
    }
    return nil
}

func (s *MetadataStore) Fetch(ctx context.Context, target ocispec.Descriptor) (io.ReadCloser, error) {
    rc, err := s.Store.Fetch(ctx, target)
    if err != nil {
        return nil, err
    }
    if isManifest(target) {
        s.LastAccessTime.Store(target.Digest, time.Now())
    }
    return rc, nil
}

func (s *MetadataStore) Delete(ctx context.Context, target ocispec.Descriptor) error {
    if err := s.Store.Delete(ctx, target); err != nil {
        return err
    }
    s.LastAccessTime.Delete(target.Digest)
    return nil
}
```

For 2, a GC scheduler can be implemented using a goroutine with a predicate. For example,

```go
import (
    "context"
    "time"

    "github.com/opencontainers/go-digest"
    ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

ms := GetMetadataStore() // illustrative accessor for the MetadataStore above
ctx := context.TODO()
ticker := time.NewTicker(10 * time.Minute)
defer ticker.Stop()
for range ticker.C {
    func() {
        // stop the world for at most 10s per GC run
        ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
        defer cancel()
        // remove those images which are not accessed in 1 day
        ms.LastAccessTime.Range(func(key, value any) bool {
            dgst := key.(digest.Digest)
            accessTime := value.(time.Time)
            if accessTime.Add(24 * time.Hour).Before(time.Now()) {
                _ = ms.Delete(ctx, ocispec.Descriptor{Digest: dgst})
            }
            return true
        })
        _ = ms.GC(ctx) // optional when automatic GC is enabled on the store
    }()
}
```

@akashsinghal We need your input / comments on this feature request, as we are not sure whether it is still in sync with your needs, so that we can either continue working on it or close it when #656 is merged.

/cc @wangxiaoxuan273 @FeynmanZhou @yizha1

Wwwsylvia pushed a commit that referenced this issue Dec 29, 2023
Part of #472, this PR implements recursive GC for `oci.Store`. A field `AutoGarbageCollection` is added to `oci.Store`, whose default value is true.

Signed-off-by: Xiaoxuan Wang <wangxiaoxuan119@gmail.com>
@akashsinghal
Author

@shizhMSFT Thanks for laying out the functionality required here. We are on the same page: LRU eviction with scheduled GC is exactly what the Ratify project is looking for. Our specific use case is multi-process, so the metadata store implementation would need to handle concurrent reads and writes. We already have requirements on oci.Store for multi-process support when reading/writing the index file (#286).

Adding GC support in #656 is a great start for us. However, we will also need the metadata store and the scheduling support to use GC effectively. If oras-go can support these, that would be great; I'm just not sure what level of functionality the ORAS folks would like to expose versus require consumers (like Ratify) to build and manage on top of ORAS themselves. Do you have any thoughts on this boundary?

If the metadata store were built into oras-go and exposed, would it still be the consumer's responsibility to provide a metadata store implementation that fits their needs (single process vs. multiple concurrent processes with a DB)?

@shizhMSFT
Contributor

Since it is non-trivial to define a generic metadata store, it would be better if Ratify implemented its own metadata store for its specific scenarios.

Multi-process support is also non-trivial work, and it is not on the roadmap of oras-go, as it requires an atomicity source such as a database or a file system with O_EXCL support.
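For illustration, a minimal sketch of the file-system atomicity primitive mentioned above: O_EXCL guarantees that exactly one process succeeds in creating the lock file, so it can serve as a cross-process mutex. The lock-file path and helper name are illustrative.

```go
import (
    "errors"
    "os"
)

// tryLock attempts to acquire a cross-process lock by creating the lock file
// exclusively: O_EXCL makes the create fail if the file already exists, and
// the kernel guarantees only one process can win the race.
func tryLock(path string) (release func() error, err error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
    if err != nil {
        if errors.Is(err, os.ErrExist) {
            return nil, errors.New("store is locked by another process")
        }
        return nil, err
    }
    f.Close()
    return func() error { return os.Remove(path) }, nil
}
```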

Wwwsylvia pushed a commit that referenced this issue Jan 11, 2024
Part of #472
Signed-off-by: Xiaoxuan Wang <wangxiaoxuan119@gmail.com>
@shizhMSFT
Contributor

As #653 and #656 are merged, the basic delete features are done. Thanks to @wangxiaoxuan273, @eiffel-fl, and @carabasdaniel for your contributions to this effort. Since this issue requests more features, which are out of the scope of v2.4.0, I will keep it open and move it to a future milestone.
