New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/WIREUP: Increase UCP_MAX_LANES to 64 #9814
base: master
Are you sure you want to change the base?
UCP/WIREUP: Increase UCP_MAX_LANES to 64 #9814
Conversation
Would it make sense to have this parameter as a configure option? |
I think not, we should be able to find a solution to set upper limit to 64 w/o extra overheads when the actual number of lanes is small |
src/ucp/rndv/rndv_rkey_ptr.c
Outdated
|
||
ucs_assertv(rpriv->spriv.super.lane != UCP_NULL_LANE, "lane=%d", | ||
rpriv->spriv.super.lane); | ||
attr->lane_map = UCS_BIT(rpriv->spriv.super.lane); | ||
if (rpriv->ack.lane != UCP_NULL_LANE) { | ||
attr->lane_map |= UCS_BIT(rpriv->ack.lane); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is required since ack.lane
can be equal to 255 (UCP_NULL_LANE
) so after increasing lane_map
size to 64 it started to set 63rd bit in lane_map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did it set 63d bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think (1ul << 256)
is anyway undefined behavior as 256 >= 64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did it set 63d bit?
To be honest, I don't know. I just detected this behavior and when I applied this change it gone. Do you want me to check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was experimenting on similar case and got unpredictable result, i guess this should be ok
test/gtest/common/test_obj_size.cc
Outdated
@@ -57,7 +57,7 @@ UCS_TEST_F(test_obj_size, size) { | |||
EXPECTED_SIZE(ucp_rkey_t, 32); | |||
#endif | |||
/* TODO reduce request size to 240 or less after removing old protocols state */ | |||
EXPECTED_SIZE(ucp_request_t, 264); | |||
EXPECTED_SIZE(ucp_request_t, 280); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can consider increasing number of lanes to 32 instead of 64. Or allowing creation of 64 lanes only for wireup and restrict the max lanes number by 32 or 16 for RNDV protocols. What sounds better for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both sounds ok. Maybe it is better to restrict protocol;s to use up to 16 lanes? Then no need to keep large maps in the request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr #9862 should address it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
request size was rolled back.
test/gtest/ucp/test_ucp_sockaddr.cc
Outdated
ASSERT_LE(16, (int)ucp_ep_num_lanes(sender().ep())); | ||
ASSERT_LE(16, (int)ucp_ep_num_lanes(receiver().ep())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we have some platform which can create more than 16 lanes in that particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd suggest to change this test or wtite a new one which tests creation of 64 lanes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/gtest/ucp/test_ucp_tag_xfer.cc
Outdated
@@ -1247,19 +1247,19 @@ UCS_TEST_P(multi_rail_max, max_lanes, "IB_NUM_PATHS?=16", "TM_SW_RNDV=y", | |||
|
|||
ucp_lane_index_t num_lanes = ucp_ep_num_lanes(sender().ep()); | |||
ASSERT_EQ(ucp_ep_num_lanes(receiver().ep()), num_lanes); | |||
ASSERT_EQ(num_lanes, max_lanes); | |||
ASSERT_GE(num_lanes, max_lanes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we have some platform which can create more than 16 lanes in that particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, should try creating 64 lanes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to revert now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -239,7 +239,7 @@ static inline ucp_rsc_index_t | |||
ucp_ep_config_get_dst_md_cmpt(const ucp_ep_config_key_t *key, | |||
ucp_md_index_t dst_md_index) | |||
{ | |||
unsigned idx = ucs_popcount(key->reachable_md_map & UCS_MASK(dst_md_index)); | |||
unsigned idx = ucs_bitmap2idx(key->reachable_md_map, dst_md_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a little :)
I just discovered that there is a helper function for that logic, so decided to included it to the patch.
src/ucp/rndv/rndv_rkey_ptr.c
Outdated
|
||
ucs_assertv(rpriv->spriv.super.lane != UCP_NULL_LANE, "lane=%d", | ||
rpriv->spriv.super.lane); | ||
attr->lane_map = UCS_BIT(rpriv->spriv.super.lane); | ||
if (rpriv->ack.lane != UCP_NULL_LANE) { | ||
attr->lane_map |= UCS_BIT(rpriv->ack.lane); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did it set 63d bit?
@@ -129,6 +130,7 @@ class mem_buffer { | |||
size_t length, size_t offset, | |||
const void *buffer, const void *orig_ptr) | |||
{ | |||
ucs_assertv(length <= 8, "length=%zu", length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since new implementation of UCS_MASK will still do undefined behavior if input value > 64. In that particular case length is size of one element that should be compared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can define UCS_MASK the same was UCS_MASK_SAFE is defined and then no need for this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to use safe implementation of mask for all the types' sizes we need to define separate macro for each type (like `UCS_MASK8, UCS_MASK16, UCS_MASK32, etc...) and each of that macros will be capped by the full mask for particular type size.
Otherwise, in case of one common UCS_MASK
macro there always would be some places when too big value can be passed to mask as input parameter and therefore undefined behavior would happen. UCS_MASK_SAFE
old definition can work only with 64-bits output values, since it issues compilation error for smaller ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make UCS_MASK always 64-bit, and if the result is casted to smaller type it will still be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make UCS_MASK always 64-bit, and if the result is casted to smaller type it will still be ok?
Can you show example of the compilation error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's example of compilation error on Ubuntu 20.04 for macro implementation with UINT64_MAX
:
/__w/2/s/contrib/../src/uct/ib/mlx5/ib_mlx5.c:218:26: error: implicit conversion from 'unsigned long' to 'unsigned int' changes value from 18446744073709551615 to 4294967295 [-Werror,-Wconstant-conversion]
cq->cq_length_mask = UCS_MASK(cq->cq_length_log);
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/2/s/contrib/../src/ucs/sys/compiler_def.h:83:38: note: expanded from macro 'UCS_MASK'
#define UCS_MASK(_i) (((_i) >= 64) ? 0xFFFFFFFFFFFFFFFF : (UCS_BIT(_i) - 1))
^~~~~~~~~~~~~~~~~~
I will try to implement it through inline function and see whether error would happen then.
But anyway, from my point of view we are walking on a very thin ice here, since even if it would work in our CI, it can break something on some tricky platform. So I would prefer to use separate implementations for UCS_MASK
and UCS_MASK64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would using #define UCS_MASK(_i) (((_i) >= 64) ? ~0 : (UCS_BIT(_i) - 1))
fix all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tvegas1 thanks for the suggestion!
I checked C standard and that what does it say about conditional operator:
If both the second and third operands have arithmetic type, the result type that would be
determined by the usual arithmetic conversions, were they applied to those two operands,
is the type of the result. If both the operands have structure or union type, the result has
that type. If both operands have void type, the result has void type.
So if I understand that correctly, it means that ~0 will be casted to the type that is used as a left operand of assignment.
@@ -129,6 +130,7 @@ class mem_buffer { | |||
size_t length, size_t offset, | |||
const void *buffer, const void *orig_ptr) | |||
{ | |||
ucs_assertv(length <= 8, "length=%zu", length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can define UCS_MASK the same was UCS_MASK_SAFE is defined and then no need for this..
test/gtest/common/test_obj_size.cc
Outdated
@@ -57,7 +57,7 @@ UCS_TEST_F(test_obj_size, size) { | |||
EXPECTED_SIZE(ucp_rkey_t, 32); | |||
#endif | |||
/* TODO reduce request size to 240 or less after removing old protocols state */ | |||
EXPECTED_SIZE(ucp_request_t, 264); | |||
EXPECTED_SIZE(ucp_request_t, 280); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr #9862 should address it
test/gtest/ucp/test_ucp_tag_xfer.cc
Outdated
@@ -1247,19 +1247,19 @@ UCS_TEST_P(multi_rail_max, max_lanes, "IB_NUM_PATHS?=16", "TM_SW_RNDV=y", | |||
|
|||
ucp_lane_index_t num_lanes = ucp_ep_num_lanes(sender().ep()); | |||
ASSERT_EQ(ucp_ep_num_lanes(receiver().ep()), num_lanes); | |||
ASSERT_EQ(num_lanes, max_lanes); | |||
ASSERT_GE(num_lanes, max_lanes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, should try creating 64 lanes
src/ucp/rma/flush.c
Outdated
@@ -25,12 +25,12 @@ ucp_ep_flush_request_update_uct_comp(ucp_request_t *req, int diff, | |||
&req->send.state.uct_comp, req->send.state.uct_comp.count, | |||
diff); | |||
ucs_assertv(!(req->send.flush.started_lanes & new_started_lanes), | |||
"req=%p started_lanes=0x%zx new_started_lanes=0x%zx", req, | |||
req->send.flush.started_lanes, new_started_lanes); | |||
"req=%p started_lanes=0x%"PRIx64" new_started_lanes=0x%"PRIx64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls check clang format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -40,7 +40,7 @@ UCP_UINT_TYPE(UCP_MD_INDEX_BITS) ucp_md_map_t; | |||
|
|||
|
|||
/* Lanes */ | |||
#define UCP_MAX_LANES 16 | |||
#define UCP_MAX_LANES 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably change code that uses ucs_ffs32 on the lane map to use ucs_ffs64
for example in protov2 rndv/put/zcopy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed all the entries that I found, but I haven't found one in rndv/put/zcopy. Maybe they were removed in #9862
2346a5b
to
ffcca4a
Compare
test failure seems relevant (with ASAN):
|
@yosefe It seems related but I tried to reproduce it 100 times and test always ends successfully. I also reran the CI and it passed too. That's very strange, but I think it can be some configurational issue. How do you think, can we merge it now or should we try to reproduce it more? |
One weird thing i see here is we expect 64 lanes with protov1 test. Maybe it's wrong? |
We set |
i think we should limit protov1 tests to 16 lanes, since there may be places in protov1 flows we are not updating to support more lanes |
through am - Enable this test for protov2 - Set max_lanes to 16 for protov1 - Skip testing unsupported case - Relax requirement for splitting message above lanes - Enable am bw lanes creation to support multi-lane RNDV through AM
/azp run |
Azure Pipelines successfully started running 4 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 4 pipeline(s). |
What
Increase number of maximal lanes per EP to 64.
Why ?
To allow collecting information about all lanes on systems with many transports/devices.