
Remove remnants of libfabric parcelport #6474

Open · wants to merge 1 commit into base: master

Conversation

hkaiser (Member) commented Apr 17, 2024

No description provided.


Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.02%    Diff coverage: 100.00%

Coverage variation details
                                     Coverable lines    Covered lines      Coverage
Common ancestor commit (eb2bf57)     217975             185524             85.11%
Head commit (a32228d)                190892 (-27083)    162508 (-23016)    85.13% (+0.02%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
                        Coverable lines    Covered lines    Diff coverage
Pull request (#6474)    1                  1                100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

biddisco (Contributor) commented

Before you do away with the libfabric parcelport, be aware that I have a branch with a lot of improvements that I had hoped to submit a PR for. We had an intern here last year who worked on a new allocator, but unfortunately the allocator didn't quite fulfill all of our requirements, so I haven't yet proceeded with integrating it into the parcelport.

The parcelport needs to allocate pinned memory and cache it. The allocator provides this capability by creating a custom mimalloc arena (i.e., it requires mimalloc); however, the arena cannot be resized after creation, so it isn't ideal (though it would work). Thus a pre-allocated slab of memory must be requested when the allocator starts.
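The fixed-size-arena limitation described above can be sketched as follows. This is not the allocator from the branch in question; it is a minimal illustration, and the pinning step is only indicated in comments (real code would use something like mlock or a provider registration call):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: a fixed-size "pinned" slab with bump allocation.
// In a real parcelport the slab would be pinned/registered at
// construction (e.g. mlock or fi_mr_reg); here that step is a comment.
class pinned_slab
{
public:
    explicit pinned_slab(std::size_t bytes)
      : storage_(bytes), offset_(0)
    {
        // real code: mlock(storage_.data(), bytes) and/or register
        // the region with the fabric provider to obtain an MR handle
    }

    // The arena cannot grow after creation (the limitation noted
    // above): return nullptr once the slab is exhausted.
    void* allocate(std::size_t bytes,
        std::size_t align = alignof(std::max_align_t))
    {
        std::size_t aligned = (offset_ + align - 1) / align * align;
        if (aligned + bytes > storage_.size())
            return nullptr;    // slab full; cannot be resized
        offset_ = aligned + bytes;
        return storage_.data() + aligned;
    }

    std::size_t used() const { return offset_; }

private:
    std::vector<std::byte> storage_;
    std::size_t offset_;
};
```

A caller would size the slab once at startup (e.g. `pinned_slab slab(1 << 20);`) and hand out message buffers from it; once full, allocation fails rather than growing, which is exactly why a resizable arena would be preferable.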

Apart from this allocator issue, the parcelport is actually in good shape, though it is about a year out of sync with HPX main, and I am aware that a lot of other parcelport-related changes have gone in during that time, so breakage is almost inevitable. Perhaps one of the other parcelports has addressed the memory-pinning issue and could provide a drop-in replacement?

Note that the parcelport also has some well-optimized polling and dispatching routines, though not as nice as the sender polling for MPI/CUDA that is now in pika, which also handles the transfer of continuations to the "right place" so that the custom polling pool remains uncluttered with extraneous work.

Given that the mimalloc allocator isn't perfect: would a PR to improve the libfabric parcelport in HPX (tested on many machines, including Amazon AWS) still be of interest, or do all the other parcelports now provide good performance, so that libfabric is no longer needed?

hkaiser (Member, Author) commented Apr 22, 2024

@biddisco I'd be more than happy to accept a PR that makes the current libfabric parcelport usable.

biddisco (Contributor) commented

> @biddisco I'd be more than happy to accept a PR that makes the current libfabric parcelport usable.

Do any of the other parcelports have code that manages memory pinning etc? Has anyone else worked on this recently?

hkaiser (Member, Author) commented Apr 22, 2024

> @biddisco I'd be more than happy to accept a PR that makes the current libfabric parcelport usable.
>
> Do any of the other parcelports have code that manages memory pinning etc? Has anyone else worked on this recently?

IIUC, @JiakunYan pins memory in the LCI parcelport, but that could be part of LCI proper, not the parcelport.

JiakunYan (Contributor) commented Apr 22, 2024

Yes, the memory registration caching code is implemented in LCI. However, it is disabled by default when LCI uses the libfabric backend. I believe libfabric has its own memory registration cache, so I just register and deregister memory buffers plainly. I haven't encountered performance issues related to memory registration.
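The idea of a registration cache like the one mentioned above can be sketched roughly as follows. This is not LCI's actual implementation; `register_region` is a hypothetical stand-in for a provider call such as fi_mr_reg, and the cache here never evicts, which a real cache would have to handle:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hedged sketch of a memory-registration cache (not LCI's code).
struct mr_handle { std::uint64_t key; };

class registration_cache
{
public:
    // Look up (addr, len); register and cache the region on a miss,
    // so repeated sends from the same buffer skip re-registration.
    mr_handle get(void* addr, std::size_t len)
    {
        auto it = cache_.find({addr, len});
        if (it != cache_.end())
        {
            ++hits_;
            return it->second;
        }
        mr_handle h = register_region(addr, len);
        cache_.emplace(std::make_pair(addr, len), h);
        return h;
    }

    std::size_t hits() const { return hits_; }
    std::size_t size() const { return cache_.size(); }

private:
    mr_handle register_region(void*, std::size_t)
    {
        // real code would call the provider here, e.g. libfabric's
        // fi_mr_reg on the domain, and keep the resulting fid_mr
        return mr_handle{next_key_++};
    }

    std::map<std::pair<void*, std::size_t>, mr_handle> cache_;
    std::uint64_t next_key_ = 0;
    std::size_t hits_ = 0;
};
```

As noted in the comment above, when the provider (e.g. libfabric) already maintains its own registration cache, this layer can be skipped and buffers registered and deregistered plainly.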

hkaiser modified the milestones: 1.10.0, 1.11.0 (May 3, 2024)