
[GraphBolt] CPU RAM Feature Cache for DiskBasedFeature #7339

Open
mfbalin opened this issue Apr 22, 2024 · 6 comments

mfbalin commented Apr 22, 2024

🚀 Feature

When we use a DiskBasedFeature, we will need to cache frequently accessed items in CPU RAM so that the disk read bandwidth requirements are reduced.
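
A minimal sketch of the intended read path, using a plain LRU policy purely for illustration (names such as `CachedDiskFeature` and `ReadRowFromDisk` are hypothetical, not the GraphBolt API; a better eviction policy than LRU is discussed later in this thread):

```cpp
// Hypothetical sketch: serve hot feature rows from a CPU RAM cache and fall
// back to the on-disk feature store only on misses.
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::vector<float>;

class CachedDiskFeature {
 public:
  explicit CachedDiskFeature(size_t capacity) : capacity_(capacity) {}

  Row Read(int64_t id) {
    auto it = map_.find(id);
    if (it != map_.end()) {                 // hit: served from RAM, no disk I/O
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    Row row = ReadRowFromDisk(id);          // miss: pay the SSD read
    lru_.emplace_front(id, row);
    map_[id] = lru_.begin();
    if (lru_.size() > capacity_) {          // evict the least recently used row
      map_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return row;
  }

 private:
  // Stub standing in for the real disk read (e.g. an io_uring-backed read).
  Row ReadRowFromDisk(int64_t /*id*/) { return Row(128, 0.0f); }

  size_t capacity_;
  std::list<std::pair<int64_t, Row>> lru_;  // most recent at the front
  std::unordered_map<int64_t, std::list<std::pair<int64_t, Row>>::iterator> map_;
};
```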

Motivation

It will improve performance immensely on large datasets whose features do not fit in CPU RAM.

Community

If anyone from the community is interested in helping out, we would appreciate it. We are going to write a GraphBolt paper, and significant contributors can become co-authors of that paper if their contribution happens before the paper is finalized and published.

mfbalin added the feature request label on Apr 22, 2024
mfbalin changed the title from [GraphBolt] CPU RAM Feature Cache for DiskBasedFeatureStore to [GraphBolt] CPU RAM Feature Cache for DiskBasedFeature on Apr 22, 2024
Rhett-Ying commented

@mfbalin
What is the difference between manually caching frequently accessed items with DiskBasedFeature and using TorchBasedFeature with in_memory=False, where the cache is applied automatically by the OS?
Actually, this raises a more basic question for me: in what kind of scenario would we prefer DiskBasedFeature over TorchBasedFeature with in_memory=False? What are the advantages of DiskBasedFeature?

mfbalin commented Apr 26, 2024

@Rhett-Ying io_uring is more efficient and faster than mmap. With io_uring, you need fewer threads to saturate the SSD bandwidth. When it comes to caching, the OS caches whole pages, usually 4 KB in size, while feature_dimension * dtype_bytes is usually smaller than that. So when the OS caches a page, it also caches the unnecessary vertex features that happen to share the page, which makes the cache less effective.
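
To make the granularity mismatch concrete (the numbers here are illustrative, not from a specific dataset): with 128-dimensional float32 features, one vertex's row is 128 × 4 B = 512 B, so each 4 KB page the kernel caches holds 8 rows. Under random access, a page cached to serve one requested row keeps up to 7 unrequested rows resident, so a page-granularity cache can waste up to 8x the RAM of a cache that stores individual rows.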

mfbalin commented Apr 26, 2024

And I believe we can use a better caching strategy than the one used inside the Linux kernel. For example, see this paper on a state-of-the-art simple caching policy: https://dl.acm.org/doi/10.1145/3600006.3613147
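
That link is to the S3-FIFO paper (SOSP '23), and the policy is simple enough to sketch. The following is an approximation reconstructed from the paper's description, not the authors' code: the small/main queue split, ghost queue, and frequency cap of 3 follow the paper, while details such as the exact promotion threshold and the ghost-size bound are simplified here.

```cpp
// Rough sketch of S3-FIFO: a small FIFO admission queue, a main FIFO queue,
// and a "ghost" set remembering recently evicted keys. Only keys are tracked
// here; a real feature cache would also store the rows.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>

class S3FifoCache {
 public:
  explicit S3FifoCache(size_t capacity)
      : small_cap_(capacity / 10),            // paper: ~10% for the small queue
        main_cap_(capacity - capacity / 10) {}

  // Returns true on a hit; on a miss, admits the key.
  bool Access(int64_t key) {
    auto it = freq_.find(key);
    if (it != freq_.end()) {                  // hit: bump frequency, capped at 3
      if (it->second < 3) ++it->second;
      return true;
    }
    if (ghost_.erase(key) > 0) {              // recently evicted: go straight to main
      InsertMain(key);
    } else {                                  // brand new: enter via the small queue
      if (small_.size() >= small_cap_) EvictSmall();
      small_.push_back(key);
    }
    freq_[key] = 0;
    return false;
  }

 private:
  void InsertMain(int64_t key) {
    if (main_.size() >= main_cap_) EvictMain();
    main_.push_back(key);
  }

  void EvictSmall() {
    int64_t key = small_.front();
    small_.pop_front();
    if (freq_[key] > 0) {                     // reused while in small: promote
      freq_[key] = 0;
      InsertMain(key);
    } else {                                  // one-hit wonder: evict, remember in ghost
      freq_.erase(key);
      ghost_.insert(key);
      if (ghost_.size() > main_cap_) ghost_.clear();  // crude bound; paper uses a FIFO
    }
  }

  void EvictMain() {
    // Lazy promotion: pop until a key with zero frequency is evicted; reused
    // keys are reinserted with decremented frequency, so this terminates.
    while (!main_.empty()) {
      int64_t key = main_.front();
      main_.pop_front();
      if (freq_[key] > 0) {
        --freq_[key];
        main_.push_back(key);
      } else {
        freq_.erase(key);
        return;
      }
    }
  }

  size_t small_cap_, main_cap_;
  std::deque<int64_t> small_, main_;
  std::unordered_set<int64_t> ghost_;         // keys only, no data
  std::unordered_map<int64_t, int> freq_;     // presence + access count
};
```

One appealing property for a feature cache is that all three structures are plain FIFO queues, so hits do not require the list reordering that LRU does, which the paper argues makes the policy cheaper and easier to scale across threads.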

Rhett-Ying commented

Since the indices of the feature data are random and scattered, each read requires a separate I/O request to be submitted to the submission queue unless there is explicit optimization at the application level. As for the cache, application-level work is also required for io_uring to perform comparably to mmap, which gets caching from the OS automatically.

With io_uring, you need fewer threads to saturate the SSD bandwidth.

Is this achieved by submitting many I/O requests to the submission queue and then waiting for completion?

mfbalin commented Apr 28, 2024

Is this achieved by submitting many I/O requests to the submission queue and then waiting for completion?

Yes, that is how io_uring works: you batch your requests and submit them with a single Linux system call. When we also have a cache, it will significantly outperform the mmap approach.
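
For anyone curious, here is what the batched submission looks like with liburing. This is a sketch under assumptions: a hypothetical features.bin laid out as one fixed-size 512-byte row per vertex id, error handling trimmed, compile with -luring.

```cpp
// Batched random reads with io_uring: queue one read per requested row,
// then submit the whole batch with a single system call.
#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  constexpr unsigned kQueueDepth = 64;        // must cover the batch size below
  constexpr size_t kRowBytes = 512;           // e.g. 128-dim float32 features
  std::vector<int64_t> ids = {3, 17, 42, 8};  // scattered vertex ids

  int fd = open("features.bin", O_RDONLY);
  if (fd < 0) { perror("open"); return 1; }

  io_uring ring;
  io_uring_queue_init(kQueueDepth, &ring, 0);

  std::vector<std::vector<char>> bufs(ids.size(), std::vector<char>(kRowBytes));

  // Queue all reads; one io_uring_submit() is one syscall for the whole batch.
  for (size_t i = 0; i < ids.size(); ++i) {
    io_uring_sqe* sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, bufs[i].data(), kRowBytes,
                       static_cast<uint64_t>(ids[i]) * kRowBytes);
    io_uring_sqe_set_data(sqe, reinterpret_cast<void*>(i));  // tag with index
  }
  io_uring_submit(&ring);

  // Reap completions; they can arrive in any order, hence the tags.
  for (size_t done = 0; done < ids.size(); ++done) {
    io_uring_cqe* cqe;
    io_uring_wait_cqe(&ring, &cqe);
    auto idx = reinterpret_cast<size_t>(io_uring_cqe_get_data(cqe));
    if (cqe->res != static_cast<int>(kRowBytes))
      fprintf(stderr, "short read for request %zu\n", idx);
    io_uring_cqe_seen(&ring, cqe);
  }

  io_uring_queue_exit(&ring);
  close(fd);
  return 0;
}
```

A single thread driving a deep queue this way can keep an NVMe SSD busy, which is the "fewer threads" point above; with mmap, each faulting thread typically blocks on one 4 KB page fault at a time.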

Rhett-Ying commented

I am not sure it is easy and clean to implement a caching policy at the application level. The trade-off between the performance improvement and the added code complexity needs to be taken into consideration.

@pyynb Please read the paper @mfbalin suggested for the caching policy: https://dl.acm.org/doi/10.1145/3600006.3613147.
