Proposal: Image Acceleration(Apparate) #165

Open (wants to merge 1 commit into main)

kofj (Contributor) commented May 24, 2021

WIP. Update later.

Signed-off-by: fanjiankong <fanjiankong@tencent.com>
kofj changed the title from "[WIP] Proposal: Image Acceleration(Apparate)" to "Proposal: Image Acceleration(Apparate)" on Jun 1, 2021

tianon (Member) commented Jun 2, 2021

This proposal sounds conceptually very similar to https://github.com/containerd/stargz-snapshotter 🤔

bergwolf (Contributor) commented Jun 3, 2021

And even more similar to the nydus image acceleration service: https://github.com/dragonflyoss/image-service

We've been discussing with Harbor team to create a pluggable image conversion mechanism that works for different image formats (currently nydus and estargz included). Maybe Apparate can join the force as well ;)

/cc @ktock

xujihui1985 commented

What's the difference between https://github.com/dragonflyoss/image-service and this one?

ktock commented Jun 3, 2021

> We've been discussing with Harbor team to create a pluggable image conversion mechanism that works for different image formats (currently nydus and estargz included). Maybe Apparate can join the force as well ;)

👍

Recently, a variety of image formats have been discussed in the community (e.g. nydus, estargz, zstd:chunked...), not only Apparate, so it would be great to have a generic (and pluggable) conversion mechanism that works for all of them.

imeoer commented Jun 3, 2021

A pluggable image conversion mechanism has also been proposed here: #167
We can participate in the discussion together. :)

ghost commented Jun 3, 2021

It seems like another image-service (https://github.com/dragonflyoss/image-service), and another stargz (https://github.com/containerd/stargz-snapshotter).
I think Apparate, image-service, and some other new image formats are based on, or are extensions of, stargz, since they look quite similar. It would be better to make stargz a standard and have other implementations stay compatible with stargz while developing their own features.

ktock commented Jun 4, 2021

@lovecontainers Standardization of lazy pulling in the current version of the OCI Image Spec (v1) is being discussed in opencontainers/image-spec#815.
nydus has been proposed for the next version of the OCI Image Spec (a.k.a. OCIv2); cf. https://www.cncf.io/blog/2020/10/20/introducing-nydus-dragonfly-container-image-service/

ghost commented Jun 4, 2021

@ktock Yeah, I have hopes for the next OCI spec. But nydus looks quite similar to stargz; its docs describe nydus as an improvement on stargz. In fact, almost all newer remote image formats look the same. So I think a better way may be to bring up a stargz v2 rather than so many stargz-like formats. At this moment, wide discussion is necessary, but repeated discussions are meaningless.

lihuiba commented Jun 4, 2021

ghost commented Jun 4, 2021

@lihuiba Thank you, this is my first time learning about overlaybd, as I am a beginner with containers. It looks like a traditional VM image and is natively friendly to remote access. The most interesting point for me is that your implementation does not depend on FUSE.

jiangliu commented Jun 4, 2021

> It seems like another image-service (https://github.com/dragonflyoss/image-service), and another stargz (https://github.com/containerd/stargz-snapshotter).
> I think Apparate, image-service, and some other new image formats are based on, or are extensions of, stargz, since they look quite similar. It would be better to make stargz a standard and have other implementations stay compatible with stargz while developing their own features.

There's a fundamental difference between stargz and nydus :)
Nydus can be thought of as a file system over object storage with a split fs metadata/data design, so different images can share data blob objects.
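A toy Go model of what a split metadata/data design buys: each image's metadata stores only chunk digests, while chunk data lives once in a shared content-addressed store, so identical chunks in different images are stored a single time. The chunk size, type names and in-memory maps are assumptions for illustration; this is not the RAFS on-disk format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkSize is deliberately tiny so the example is easy to follow.
const chunkSize = 4

// BlobStore is shared by all images: each distinct chunk is stored once,
// keyed by the hex SHA-256 digest of its contents.
type BlobStore map[string][]byte

// Metadata is one image's file system view: path -> ordered chunk digests.
// It references data in the BlobStore but holds no file data itself.
type Metadata map[string][]string

func digestOf(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// AddFile splits data into chunks, stores any chunk not already present,
// and records only the chunk digests in the image's metadata.
func AddFile(store BlobStore, meta Metadata, path string, data []byte) {
	var digests []string
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		d := digestOf(chunk)
		if _, ok := store[d]; !ok {
			store[d] = append([]byte(nil), chunk...)
		}
		digests = append(digests, d)
	}
	meta[path] = digests
}

// ReadFile reassembles one image's file from the shared store.
func ReadFile(store BlobStore, meta Metadata, path string) ([]byte, error) {
	digests, ok := meta[path]
	if !ok {
		return nil, fmt.Errorf("%s: no such file", path)
	}
	var out []byte
	for _, d := range digests {
		chunk, ok := store[d]
		if !ok {
			return nil, fmt.Errorf("%s: missing chunk %s", path, d)
		}
		out = append(out, chunk...)
	}
	return out, nil
}

func main() {
	store := BlobStore{}
	imageA, imageB := Metadata{}, Metadata{}
	AddFile(store, imageA, "/bin/app", []byte("AAAABBBBCCCC"))
	AddFile(store, imageB, "/bin/app", []byte("AAAABBBBDDDD"))
	fmt.Println(len(store)) // 4: AAAA and BBBB are shared by both images
	data, err := ReadFile(store, imageB, "/bin/app")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // AAAABBBBDDDD
}
```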

kofj (Contributor, Author) commented Jun 4, 2021

@malc0lm Please subscribe and discuss here; we need to answer questions from the community.

ghost commented Jun 4, 2021

> There's a fundamental difference between stargz and nydus :)
> Nydus can be thought of as a file system over object storage with a split fs metadata/data design, so different images can share data blob objects.

It is really a great improvement, but it is hard to call it a fundamental difference, and the same goes for Apparate. These similar proposals may compete for business, since they stand for different companies, but that does nothing to help the community reach an agreement on the next OCI spec.

lihuiba commented Jun 5, 2021

@lovecontainers @tianon Overlaybd is a combination of a container image and a VM image. It is a layered image in the form of a block device. It doesn't depend on FUSE / virtio-fs. I believe this design gathers the best of both worlds (container and VM), and it is applicable to both.
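As a rough illustration of "a layered image in the form of a block device", the Go sketch below resolves each block read from the topmost layer that wrote that block, falling back to zeros. It is a toy in-memory model with hypothetical names and a 4-byte block size; it does not reflect overlaybd's actual on-disk index format or its TCM-based kernel integration.

```go
package main

import "fmt"

// blockSize is a toy value; real block devices use 512 B or 4 KiB blocks.
const blockSize = 4

// Layer holds only the blocks written by one image layer, keyed by block index.
type Layer map[int64][]byte

// LayeredDevice stacks layers; index 0 is the base and later layers win.
type LayeredDevice struct {
	layers []Layer
}

// ReadBlock returns the block from the topmost layer that wrote it, or a
// zero-filled block if no layer has written that index.
func (d *LayeredDevice) ReadBlock(idx int64) []byte {
	for i := len(d.layers) - 1; i >= 0; i-- {
		if b, ok := d.layers[i][idx]; ok {
			return b
		}
	}
	return make([]byte, blockSize)
}

func main() {
	base := Layer{0: []byte("base"), 1: []byte("libc")}
	app := Layer{1: []byte("libX")} // the upper layer rewrites block 1
	dev := &LayeredDevice{layers: []Layer{base, app}}
	fmt.Println(string(dev.ReadBlock(0)), string(dev.ReadBlock(1))) // base libX
	fmt.Println(dev.ReadBlock(2))                                   // [0 0 0 0]
}
```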

xujihui1985 commented

Interesting, good for you, you are so funny.

xujihui1985 commented

Obviously a filesystem has a higher abstraction level than a block device, which means more business value can be added on top of it. I don't understand what makes you think this overlaybd thing is the best of both worlds just because it does not depend on FUSE and virtiofs. But you depend on TCM, which is another kernel module (.ko), so what is the advantage? You are welcome to identify the pros and cons of the different approaches; instead, you keep saying yours is the best and others are meaningless, which I find disgusting.

lihuiba commented Jun 7, 2021

@xujihui1985 Hi, jihui. "It doesn't depend on FUSE / virtio-fs" is just a statement of fact, and a confirmation for lovecontainers. The reasons why I believe overlaybd is the best are complicated, and I suggest you read the papers mentioned above. There are paragraphs discussing this topic. Thanks!

lihuiba commented Jun 7, 2021

@xujihui1985 A higher abstraction level doesn't necessarily mean a better solution. For example, Python is a higher-level language than Java or C/C++, but Python is not necessarily better in every aspect. The best-fit abstraction varies across scenarios. The block device abstraction doesn't preclude a file system abstraction on top of it. In fact, we have built an internal solution that includes an enhanced file system, called rofs, atop overlaybd. This solution opens up everything a file system abstraction makes possible, while retaining the advantages of a block device, e.g. simplicity and efficiency.

xujihui1985 commented

@lihuiba I don't get this metaphor; what's the matter with Python? 😂 And I'm pleased to know you are working on a filesystem solution. Welcome to join forces. :)

ghost commented Jun 7, 2021

@xujihui1985 I did some research based on stargz, and really felt the bottleneck of FUSE in both performance and stability. Does FUSE have any alternatives? Or does nydus have any improvements there? (I found no related statement in the nydus docs.)

lihuiba commented Jun 7, 2021

@lovecontainers My team is also trying to improve FUSE's performance, and we have an upcoming paper on this topic: https://www.usenix.org/conference/atc21/presentation/hsu .

But there's one more thing to solve: failure recovery. If the FUSE server process crashes or gets killed, the file system instance may not recover.

These problems (performance, fault tolerance, etc.) do not exist in overlaybd.

jiangliu commented Jun 7, 2021

> @xujihui1985 I did some research based on stargz, and really felt the bottleneck of FUSE in both performance and stability. Does FUSE have any alternatives? Or does nydus have any improvements there? (I found no related statement in the nydus docs.)

In the early stage of developing fs-based image acceleration technologies, FUSE is a good choice. Once the technology matures, an in-kernel read-only fs may be a better solution.
And nydus aims to become an in-kernel fs :)

xujihui1985 commented Jun 7, 2021

> @xujihui1985 I did some research based on stargz, and really felt the bottleneck of FUSE in both performance and stability. Does FUSE have any alternatives? Or does nydus have any improvements there? (I found no related statement in the nydus docs.)

@lovecontainers FUSE itself is not the bottleneck; the problem is how FUSE is used. The advantage of stargz is its compatibility with targz, which is a really good one. The problems, IMO, are:

  1. Each layer of a stargz image is mounted as a separate FUSE mountpoint, and these layers are then combined with overlayfs.
  2. The TOC must be fully loaded into RSS to index inodes, even inodes that may never be read, which causes a high memory footprint.
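Item 1 means the union of layers is computed at runtime by overlayfs across per-layer mounts. The Go sketch below instead flattens the layers ahead of time into a single view, applying the OCI whiteout convention (a `.wh.<name>` entry in an upper layer deletes `<name>` and everything under it from lower layers). It is a simplified illustration with hypothetical names: opaque whiteouts (`.wh..wh..opq`), directory entries and file metadata are omitted, and it is not the nydus builder.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

const whiteoutPrefix = ".wh."

// Layer maps absolute file paths to contents (directories omitted for brevity).
type Layer map[string]string

// Flatten applies layers bottom-to-top, resolving them once into a merged view
// the way overlayfs would at lookup time.
func Flatten(layers []Layer) map[string]string {
	merged := map[string]string{}
	for _, layer := range layers {
		// Whiteouts in a layer only affect lower layers, so apply them first.
		for p := range layer {
			dir, base := path.Split(p)
			if !strings.HasPrefix(base, whiteoutPrefix) {
				continue
			}
			target := path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
			for existing := range merged {
				if existing == target || strings.HasPrefix(existing, target+"/") {
					delete(merged, existing)
				}
			}
		}
		// Regular entries in this layer shadow anything below them.
		for p, content := range layer {
			if !strings.HasPrefix(path.Base(p), whiteoutPrefix) {
				merged[p] = content
			}
		}
	}
	return merged
}

func main() {
	base := Layer{"/etc/os-release": "debian", "/usr/lib/a.so": "a", "/usr/lib/b.so": "b"}
	app := Layer{"/etc/os-release": "custom", "/usr/.wh.lib": "", "/app/main": "bin"}
	merged := Flatten([]Layer{base, app})
	paths := make([]string, 0, len(merged))
	for p := range merged {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths)                     // [/app/main /etc/os-release]
	fmt.Println(merged["/etc/os-release"]) // custom
}
```

Doing this work once when the image is built means the runtime needs only one mount of the already-merged view instead of stacking one mount per layer.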

What nydus does to improve on this is to do the "overlay" at build time, building the final view of the root fs in the metadata, so there is one FUSE mountpoint per image and the underlying blob files are shared.
Instead of loading the entire TOC index into RSS, nydus builds an inode table in the header of the metadata, so only a small amount of memory is needed during startup. You can refer to the detailed design doc here: https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md
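To illustrate why an inode table in the metadata header keeps startup memory small, here is a Go sketch that binary-searches a sorted table of fixed-size records in place and decodes only the records it touches, instead of parsing every entry into memory up front. The layout (a `uint32` count followed by 24-byte records) is a hypothetical one chosen for this example, not the RAFS format; in practice such a table could be memory-mapped rather than held in a `[]byte`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// Hypothetical layout: [count uint32][record 0][record 1]...
// Each record is inode | dataOffset | size (three little-endian uint64s),
// and records are sorted by inode number.
const (
	headerSize = 4
	recordSize = 24
)

type Inode struct {
	Ino, Offset, Size uint64
}

// BuildTable encodes inodes (which must already be sorted by Ino).
func BuildTable(inodes []Inode) []byte {
	buf := make([]byte, headerSize+recordSize*len(inodes))
	binary.LittleEndian.PutUint32(buf, uint32(len(inodes)))
	for i, in := range inodes {
		rec := buf[headerSize+i*recordSize:]
		binary.LittleEndian.PutUint64(rec[0:], in.Ino)
		binary.LittleEndian.PutUint64(rec[8:], in.Offset)
		binary.LittleEndian.PutUint64(rec[16:], in.Size)
	}
	return buf
}

// Lookup binary-searches the encoded table without decoding all of it.
func Lookup(table []byte, ino uint64) (Inode, error) {
	if len(table) < headerSize {
		return Inode{}, errors.New("truncated header")
	}
	n := int(binary.LittleEndian.Uint32(table))
	if len(table) < headerSize+n*recordSize {
		return Inode{}, errors.New("truncated table")
	}
	rec := func(i int) []byte { return table[headerSize+i*recordSize:] }
	i := sort.Search(n, func(i int) bool {
		return binary.LittleEndian.Uint64(rec(i)) >= ino
	})
	if i == n || binary.LittleEndian.Uint64(rec(i)) != ino {
		return Inode{}, fmt.Errorf("inode %d not found", ino)
	}
	r := rec(i)
	return Inode{
		Ino:    binary.LittleEndian.Uint64(r[0:]),
		Offset: binary.LittleEndian.Uint64(r[8:]),
		Size:   binary.LittleEndian.Uint64(r[16:]),
	}, nil
}

func main() {
	table := BuildTable([]Inode{{1, 0, 4096}, {7, 4096, 100}, {42, 4196, 9}})
	in, err := Lookup(table, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(in.Offset, in.Size) // 4096 100
	_, err = Lookup(table, 8)
	fmt.Println(err) // inode 8 not found
}
```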

ghost commented Jun 7, 2021

Yeah, I have already tried something similar to your solutions. Thank you.

ghost commented Jun 7, 2021

@kofj Where is the git repository of Apparate? I am curious about Apparate's solution for recovering the FUSE process :)

kofj (Contributor, Author) commented Jun 7, 2021

An important goal of this proposal is to create a vendor-neutral sub-project in the goharbor community.

malc0lm commented Jun 18, 2021

@lovecontainers Sorry, there is currently no Apparate repository on GitHub. Recovering the FUSE process is a core capability of Apparate. First, the userspace FUSE process and the kernel fuse module communicate through the /dev/fuse fd, so the FUSE process must be separated from the process that holds the fd. We also need FUSE request tracing in case I/O hangs during recovery. Finally, for a read/write FUSE filesystem, we also need to record opened fds.

ghost commented Jun 21, 2021

Looking forward to seeing your implementation on GitHub.

jiangliu commented May 27, 2022

> There's a fundamental difference between stargz and nydus :) Nydus can be thought of as a file system over object storage with a split fs metadata/data design, so different images can share data blob objects.

A status update on the nydus image service project (https://github.com/dragonflyoss/image-service): we recently published nydus v2.0, which includes an experimental rafsv6 image format. The rafsv6 image format is compatible with the in-kernel EROFS filesystem, so a rafsv6 image can be mounted directly by EROFS. And a patchset integrating EROFS with the fscache subsystem has been merged into Linux 5.19-rc1.

With all this, a rafsv6 image can be used in two ways:

  1. fuse + nydusd, for backward compatibility and extensibility
  2. erofs + fscache + nydusd, for high performance and reliability

We are preparing articles to give more information about this topic too.

Thanks!

jiangliu commented

FYI: with the latest Linux 5.19-rc1, a nydus image can be mounted by the in-kernel EROFS :) https://d7y.io/blog/2022/06/06/evolution-of-nydus/

OrlinVasilev (Member) commented

@kofj Can you check https://github.com/goharbor/acceleration-service and decide whether to close this or rework it? Thank you!

OrlinVasilev (Member) commented

#167
