09 of 10 LNX series - Add the peer infrastructure to the CXI provider #10033

Open
wants to merge 12 commits into
base: main
from


amirshehataornl
Contributor

Add CXI updates to support the peer infrastructure in preparation for using it with the new LINKx provider.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>

@amirshehataornl amirshehataornl force-pushed the 08_lnx_cxi_updates branch 2 times, most recently from 239fb62 to 6d97ee5 Compare May 10, 2024 00:33
When checking fabric attributes with ofi_check_fabric_attr() make sure to
consider provider exclusion.

When checking to see if a provider name is given, only consider ones which
are not excluded using the '^' character.
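As a rough illustration of the rule above, the sketch below reports a
provider name as "given" only when it is present and not marked for
exclusion with a leading '^'. The helper name prov_name_is_given() is
hypothetical and is not the code in ofi_check_fabric_attr().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libfabric): returns true only when
 * prov_name holds a provider name that is not an exclusion entry,
 * i.e. it is non-empty and does not start with '^'.
 */
static bool prov_name_is_given(const char *prov_name)
{
	if (prov_name == NULL || prov_name[0] == '\0')
		return false;

	/* A leading '^' marks an excluded provider, so it does not count. */
	return prov_name[0] != '^';
}
```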

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
A reverse lookup on the AV table for every received message is
inefficient. Some providers do not store the fi_addr_t associated with the
peer in the header passed on the wire, and requiring providers to add it
to the wire header is impractical because it would break backwards
compatibility.

In order to handle this case, an address matching callback is added to the
peer_srx.peer_ops structure. This allows the provider receiving the
message to register an address matching callback. This callback is called
by the owner provider to match an fi_addr_t with provider specific address
in the message received.

The callback allows the receiving provider to do an O(1) index into the AV
table to lookup the address of the peer, and then compare that with the
source address in the received message.
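The matching step described above might look roughly like the following
sketch. All type and function names here (addr_handle_t, wire_addr,
av_table, match_info, prov_addr_match) are stand-ins invented for
illustration. They are not the libfabric or CXI definitions, and the
wildcard handling is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t addr_handle_t;        /* stand-in for fi_addr_t */
#define ADDR_UNSPEC UINT64_MAX         /* stand-in for FI_ADDR_UNSPEC */

/* Hypothetical provider specific address carried in a received message. */
struct wire_addr {
	uint32_t nic;
	uint32_t pid;
};

/* Hypothetical AV table: the address handle is an index into entries. */
struct av_table {
	const struct wire_addr *entries;
	size_t count;
};

/* Provider specific data the owner hands back when it asks for a match. */
struct match_info {
	const struct av_table *av;
	struct wire_addr src;          /* source address of the message */
};

/*
 * Address matching callback registered by the receiving provider.
 * It indexes the AV directly (O(1)) and compares the stored peer
 * address with the message source, with no reverse lookup.
 */
static bool prov_addr_match(addr_handle_t addr, const struct match_info *info)
{
	const struct wire_addr *peer;

	if (addr == ADDR_UNSPEC)       /* assumed: wildcard matches any source */
		return true;
	if (addr >= info->av->count)   /* handle is not in the table */
		return false;

	peer = &info->av->entries[addr];
	return peer->nic == info->src.nic && peer->pid == info->src.pid;
}
```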

As part of this change provider specific address information needs to be
passed to the owner provider, which the owner will need to give back to the
receiving provider, when it attempts to do address matching.

Update the SHM and LINKx providers to conform with the API changes.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add a new structure fi_peer_match to collect the parameters which need
to be passed to the get_msg and get_tag functions.

Update the util_get_tag() and util_get_msg() function callbacks to match.
With the old signatures, compilation only gives a warning instead of
failing, and memory corruption occurs when the callbacks are called.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add a memory registration callback to the fi_ops_srx_peer. This allows
core providers to expose a memory registration callback which the parent
or peer provider can use to register memory on the receive path.

For example the CXI provider registers memory with the NIC on the receive
path. When using the peer infrastructure this cannot happen because we
do not know which provider will perform the receive operation. But if
the source NID is specified then we can know and therefore we can
perform the receive buffer registration at the top of the receive path.
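A minimal sketch of that idea follows, assuming the owner has already
resolved the source address to a peer provider. The names rx_buf,
peer_prov, mem_reg_cb, and owner_prepare_recv are hypothetical and do not
reflect the actual fi_ops_srx_peer layout.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDR_UNSPEC UINT64_MAX         /* stand-in for FI_ADDR_UNSPEC */

/* Hypothetical receive buffer descriptor. */
struct rx_buf {
	void *buf;
	size_t len;
	int registered;                /* set once the peer registers it */
};

/* Hypothetical memory registration callback exposed by a core provider. */
typedef int (*mem_reg_cb)(struct rx_buf *rx);

struct peer_prov {
	mem_reg_cb mem_reg;
};

/* Example callback: a real provider would register with the NIC here. */
static int example_mem_reg(struct rx_buf *rx)
{
	rx->registered = 1;
	return 0;
}

/*
 * Owner side: register the buffer at the top of the receive path only
 * when the source is known, since only then is the target provider
 * known. Otherwise registration is deferred until data arrives.
 */
static int owner_prepare_recv(const struct peer_prov *peer,
			      uint64_t src_addr, struct rx_buf *rx)
{
	if (src_addr == ADDR_UNSPEC || peer == NULL || peer->mem_reg == NULL)
		return 0;

	return peer->mem_reg(rx);
}
```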

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add FI_PEER capability bit

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
The parent provider needs access to the peer provider's callbacks. Store
the srx block in fid.context so that it can be retrieved later on.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add the FI_PEER capability bit to the SHM fi_infos

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add the FI_PEER capability bit to the CXI provider fi_info

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
On cq_open, check the FI_PEER_IMPORT flag. If it is set, set all internal
CQ operations to ENOSYS, with the exception of the read callback.

The read callback is overloaded to operate as a progress callback
function. Invoking the read callback will progress the endpoints linked to
this CQ.
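The ops selection described above can be pictured with the following
self-contained sketch. The demo_* types, the DEMO_PEER_IMPORT flag, and
the return values are illustrative assumptions. The real code uses the
libfabric fi_ops_cq table and the CXI progress routines, which are not
reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>                 /* ssize_t */

#define DEMO_PEER_IMPORT (1ULL << 0)   /* stand-in for FI_PEER_IMPORT */

struct demo_cq;

/* Reduced, hypothetical ops table with two CQ operations. */
struct demo_cq_ops {
	ssize_t (*read)(struct demo_cq *cq, void *buf, size_t count);
	ssize_t (*sread)(struct demo_cq *cq, void *buf, size_t count,
			 int timeout);
};

struct demo_cq {
	struct demo_cq_ops ops;
	int progress_calls;            /* counts progress invocations */
};

/* Unsupported operation on an imported CQ. */
static ssize_t demo_no_sread(struct demo_cq *cq, void *buf, size_t count,
			     int timeout)
{
	(void)cq; (void)buf; (void)count; (void)timeout;
	return -ENOSYS;
}

/* Read overloaded as progress: drives endpoints, returns no entries. */
static ssize_t demo_progress_read(struct demo_cq *cq, void *buf, size_t count)
{
	(void)buf; (void)count;
	cq->progress_calls++;
	return 0;
}

/* Placeholders for the regular, internal CQ path. */
static ssize_t demo_read(struct demo_cq *cq, void *buf, size_t count)
{
	(void)cq; (void)buf; (void)count;
	return -EAGAIN;
}

static ssize_t demo_sread(struct demo_cq *cq, void *buf, size_t count,
			  int timeout)
{
	(void)cq; (void)buf; (void)count; (void)timeout;
	return -EAGAIN;
}

static void demo_cq_open(struct demo_cq *cq, unsigned long long flags)
{
	cq->progress_calls = 0;
	if (flags & DEMO_PEER_IMPORT) {
		cq->ops.read = demo_progress_read;
		cq->ops.sread = demo_no_sread;
	} else {
		cq->ops.read = demo_read;
		cq->ops.sread = demo_sread;
	}
}
```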

Keep track of the fid_peer_cq structure passed in.

If the FI_PEER_IMPORT flag is set, then set the callbacks in cxip_cq structure
which handle writing to the peer_cq, otherwise set them to the ones which
write to the util_cq.

A provider needs to call a different set of functions to insert
completion events into an imported CQ vs an internal CQ.

This set of callback definitions standardizes a way to assign a different
function to a CQ object, which can then be called to insert into the CQ.

For example:

	struct prov_cq {
		struct util_cq *util_cq;
		struct fid_peer_cq *peer_cq;
		ofi_peer_cq_cb cq_cb;
	};

When a provider opens a CQ it can:

	if (attr->flags & FI_PEER_IMPORT) {
		prov_cq->cq_cb.cq_comp = prov_peer_cq_comp;
	} else {
		prov_cq->cq_cb.cq_comp = prov_cq_comp;
	}

Collect the peer CQ callbacks in one structure for use in CXI.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Restructure the code to allow for posting on the owner provider's shared
receive queues.

Do not do a reverse lookup on the AV table to get the fi_addr_t, instead
register an address matching callback with the owner. The owner can then
call the address matching callback to match an fi_addr_t to the source
address in the message received.

This is more efficient as the peer lookup can be an O(1) operation;
AV[fi_addr_t]. The peer's CXI address can be compared with the CXI address
in the message received.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Upstream has a different method of registering SRX. There is a limitation
where the SRX is only returned back in the upcall in get_tag/get_msg. But
that prevents the parent provider from doing anything else with the
peer callbacks. This presents a problem because we added a callback to
register memory on the receive path.

This patch updates the CXI provider.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add a memory registration callback so that the parent provider, if one
exists, can register receive buffers up front instead of waiting until
the data arrives.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
@amirshehataornl amirshehataornl changed the title 08 lnx cxi updates 08 of 09 LNX series - Add the peer infrastructure to the CXI provider May 16, 2024
@amirshehataornl amirshehataornl changed the title 08 of 09 LNX series - Add the peer infrastructure to the CXI provider 09 of 10 LNX series - Add the peer infrastructure to the CXI provider May 16, 2024