Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2092: Runtime optimizations stage 3 #2093

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

lifflander
Copy link
Collaborator

Fixes #2092

@lifflander lifflander linked an issue Feb 14, 2023 that may be closed by this pull request
@github-actions
Copy link

github-actions bot commented Feb 14, 2023

Pipelines results

PR tests (gcc-12, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-9, ubuntu, mpich, zoltan, json schema test)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for f823b5c (2023-04-26 18:49:37 UTC)

FAILED: examples/callback/CMakeFiles/callback.dir/Unity/unity_0_cxx.cxx.o 
/usr/bin/ccache /nvcc_wrapper/build/nvcc_wrapper -DJSON_USE_IMPLICIT_CONVERSIONS=1 -DVT_NO_COLOR_ENABLED -I/vt/lib/CLI -I/vt/lib/json/include -I/vt/lib/brotli/c/include -I/build/vt/release -I/vt/src -isystem /vt/lib/fmt/include -isystem /vt/lib/EngFormat-Cpp/include -isystem /build/checkpoint/install/include -Wno-deprecated-gpu-targets -O3 -DNDEBUG -fdiagnostics-color=always -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -ftemplate-backtrace-limit=100 -Werror -std=c++17 -MD -MT examples/callback/CMakeFiles/callback.dir/Unity/unity_0_cxx.cxx.o -MF examples/callback/CMakeFiles/callback.dir/Unity/unity_0_cxx.cxx.o.d -o examples/callback/CMakeFiles/callback.dir/Unity/unity_0_cxx.cxx.o -c /build/vt/examples/callback/CMakeFiles/callback.dir/Unity/unity_0_cxx.cxx
/vt/src/vt/context/runnable_context/lb_data.impl.h(60): error: class "vt::messaging::ActiveMsg<vt::Envelope>" has no member "lbLiteInstrument"
          detected during:
            instantiation of "vt::ctx::LBData::LBData(ElmT *, MsgT *) [with ElmT=vt::vrt::collection::Indexable<vt::Index1D>, MsgT=vt::ShortMessage]" 
/vt/src/vt/runnable/runnable.impl.h(102): here
            instantiation of "void vt::runnable::RunnableNew::addContextLB(Args &&...) [with Args=<vt::vrt::collection::Indexable<vt::Index1D> *&, vt::ShortMessage *>]" 
/vt/src/vt/runnable/make_runnable.h(221): here
            instantiation of "vt::runnable::RunnableMaker<MsgT> &&vt::runnable::RunnableMaker<MsgT>::withLBData(ElmT *) [with MsgT=vt::vrt::collection::balance::CollectStatsMsg<MyCol>, ElmT=vt::vrt::collection::Indexable<vt::Index1D>]" 
/vt/src/vt/vrt/collection/manager.impl.h(229): here
            instantiation of "void vt::vrt::collection::CollectionManager::collectionAutoMsgDeliver<ColT,IndexT,MsgT>(MsgT *, vt::vrt::collection::Indexable<IndexT> *, vt::HandlerType, vt::NodeType, vt::trace::TraceEventIDType, __nv_bool) [with ColT=MyCol, IndexT=vt::Index1D, MsgT=vt::vrt::collection::balance::CollectStatsMsg<MyCol>]" 
/vt/src/vt/vrt/collection/manager.impl.h(551): here
            instantiation of "void vt::vrt::collection::CollectionManager::invokeMsgImpl(const vt::vrt::VirtualElmProxyType<ColT, ColT::IndexType> &, vt::MsgSharedPtr<MsgT>, __nv_bool) [with ColT=MyCol, MsgT=vt::vrt::collection::balance::CollectStatsMsg<MyCol>]" 
/vt/src/vt/vrt/collection/manager.impl.h(392): here
            [ 3 instantiation contexts not shown ]
            instantiation of "void vt::vrt::collection::CollectionManager::makeCollectionImpl(vt::vrt::collection::param::ConstructParams<ColT> &) [with ColT=MyCol]" 
/vt/src/vt/vrt/collection/collection_builder.impl.h(89): here
            instantiation of "void vt::vrt::collection::CollectionManager::makeCollectionHandler(vt::vrt::collection::param::ConstructParamMsg<ColT> *) [with ColT=MyCol]" 
/vt/%0D%0A%0D%0A%0D%0A ==> And there is more. Read log. <==

Build log


PR tests (intel icpc, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-11, ubuntu, mpich, trace runtime, coverage)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.2, ubuntu, mpich)

Build for fed8a98 (2023-05-08 20:01:18 UTC)



The following tests FAILED:
    5 - vt_example:callback_2 (Failed)
    6 - vt_example:callback_4 (Failed)
   17 - vt_example:jacobi2d_vt_2 (Failed)
   18 - vt_example:jacobi2d_vt_4 (Failed)
  349 - vt:*TestLocationRoute/*test_route_entity_proc_1 (Failed)
  358 - vt:*TestLocationRoute/*test_route_entity_proc_2 (Timeout)
  359 - vt:*TestLocationRoute/*test_entity_cache_hits_proc_2 (Timeout)
  367 - vt:*TestLocationRoute/*test_route_entity_proc_4 (Timeout)
  368 - vt:*TestLocationRoute/*test_entity_cache_hits_proc_4 (Timeout)
  369 - vt:*TestLocationRoute/*test_entity_cache_migrated_entity_proc_4 (Timeout)
  634 - vt:*/TestTermDepSendChain.test_term_dep_send_chain/*_proc_2 (Failed)

Build log


PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-12, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-9, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-11, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-14, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-10, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.2, gcc-9, ubuntu, mpich)

Build for 85f8124 (2023-06-13 15:16:40 UTC)

Compilation - successful

Testing - passed

Build log


@lifflander lifflander force-pushed the 2092-runtime-optimizations-stage-2 branch from dd1c401 to 222c937 Compare February 14, 2023 23:30
@lifflander lifflander changed the base branch from develop to 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb February 14, 2023 23:31
@lifflander lifflander changed the base branch from 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb to develop February 15, 2023 18:28
@lifflander lifflander changed the title #2092: location: call function pointer directly #2092: Runtime optimizations stage 3 Feb 16, 2023
@PhilMiller
Copy link
Member

Are the first two commits clearing up a first/second handler kind of situation? Why shouldn't a runnable be made and called there?

@PhilMiller
Copy link
Member

More deeply, do we need to maintain some notion of 'system mode' runnables that do all of the relevant instrumentation for occasions when we want to trace and account for low-level activity?

@lifflander
Copy link
Collaborator Author

More deeply, do we need to maintain some notion of 'system mode' runnables that do all of the relevant instrumentation for occasions when we want to trace and account for low-level activity?

This is for the particular case where the collection message send is completely local. It goes through the location manager and then needs to call back into the collection manager. Creating a runnable in this case isn't particularly useful because the context it preserved on the stack (the runnable should not be enqueued). Once the collection's runnable is created, it preserves the context.

If the send is off node, the location manager will create runnables for its handlers to forward, etc.

Do you think this case is fine to optimize out?

@@ -963,7 +963,8 @@ void ActiveMessenger::prepareActiveMsgToRun(
if (is_obj) {
objgroup::dispatchObjGroup(base, handler, from_node, cont);
} else {
runnable::makeRunnable(base, not is_term, handler, from_node)
auto m = base;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is base declared as const& here, necessitating the additional reference count bump and then move of the copy? Can ownership transfer flow further up the stack?

Copy link
Collaborator

@nlslatt nlslatt Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifflander @stmcgovern Can one of you please respond to Phil's comment about this?

#if vt_check_enabled(trace_enabled)
auto const han_type = HandlerManager::getHandlerRegistryType(handler);
if (han_type == auto_registry::RegistryTypeEnum::RegVrt or
han_type == auto_registry::RegistryTypeEnum::RegGeneral or
han_type == auto_registry::RegistryTypeEnum::RegObjGroup) {
r->addContextTrace(msg, handler, from);
r->addContextTrace(r->getMsg(), handler, from);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If r has the message pointer as a member now, then this method can just refer to it internally, can't it? I feel like we may have discussed that at some point in the past, and there was a reason not to back then.

Copy link
Collaborator

@nlslatt nlslatt Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifflander @stmcgovern Can one of you please respond to Phil's comment about this?

@PhilMiller
Copy link
Member

PhilMiller commented Feb 16, 2023

Please squash the 'fix' commits into the commit they're fixing, for atomicity of evolution and review.

@PhilMiller
Copy link
Member

PhilMiller commented Feb 16, 2023

This generally looks good to me, even if only part of the code base is transformed to avoid copies and captures, and there's more to be done later.

@PhilMiller
Copy link
Member

More deeply, do we need to maintain some notion of 'system mode' runnables that do all of the relevant instrumentation for occasions when we want to trace and account for low-level activity?

This is for the particular case where the collection message send is completely local. It goes through the location manager and then needs to call back into the collection manager. Creating a runnable in this case isn't particularly useful because the context it preserved on the stack (the runnable should not be enqueued). Once the collection's runnable is created, it preserves the context.

If the send is off node, the location manager will create runnables for its handlers to forward, etc.

Do you think this case is fine to optimize out?

That sounds good then. We don't need to separately account those two bits different bits of overhead, since we already have a clear context in which it's occurring.

@stmcgovern stmcgovern marked this pull request as ready for review April 26, 2023 18:51
@stmcgovern stmcgovern force-pushed the 2092-runtime-optimizations-stage-2 branch from db26edd to f823b5c Compare April 26, 2023 18:51
@stmcgovern
Copy link
Contributor

Not ready yet, but took draft off to get more testing.

@stmcgovern stmcgovern force-pushed the 2092-runtime-optimizations-stage-2 branch from 1ee5caa to fed8a98 Compare May 8, 2023 20:01
@stmcgovern stmcgovern force-pushed the 2092-runtime-optimizations-stage-2 branch 2 times, most recently from a3cd2b8 to ef593be Compare May 16, 2023 15:04
@stmcgovern stmcgovern force-pushed the 2092-runtime-optimizations-stage-2 branch from ef593be to 5cb0060 Compare May 30, 2023 23:45
Copy link
Collaborator

@nlslatt nlslatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good but there are some open questions.

@@ -963,7 +963,8 @@ void ActiveMessenger::prepareActiveMsgToRun(
if (is_obj) {
objgroup::dispatchObjGroup(base, handler, from_node, cont);
} else {
runnable::makeRunnable(base, not is_term, handler, from_node)
auto m = base;
Copy link
Collaborator

@nlslatt nlslatt Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifflander @stmcgovern Can one of you please respond to Phil's comment about this?

#if vt_check_enabled(trace_enabled)
auto const han_type = HandlerManager::getHandlerRegistryType(handler);
if (han_type == auto_registry::RegistryTypeEnum::RegVrt or
han_type == auto_registry::RegistryTypeEnum::RegGeneral or
han_type == auto_registry::RegistryTypeEnum::RegObjGroup) {
r->addContextTrace(msg, handler, from);
r->addContextTrace(r->getMsg(), handler, from);
Copy link
Collaborator

@nlslatt nlslatt Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifflander @stmcgovern Can one of you please respond to Phil's comment about this?

Comment on lines 430 to 441
auto base_msg = user_msg.template to<BaseMsgType>();
return messaging::PendingSend(base_msg, [=](MsgPtr<BaseMsgType> in) {
return messaging::PendingSend(std::move(base_msg), [=](MsgPtr<BaseMsgType>&& in) mutable {
bool const is_obj = HandlerManager::isHandlerObjGroup(typed_handler);
if (is_obj) {
objgroup::dispatchObjGroup(
user_msg.template to<BaseMsgType>(), typed_handler, node, nullptr
);
} else {
runnable::makeRunnable(user_msg, true, typed_handler, node)
runnable::makeRunnable(std::move(user_msg), true, typed_handler, node)
.withTDEpochFromMsg()
.enqueue();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this lambda keep using user_msg (including to cast it to BaseMsgType again) instead of using in?

@@ -727,15 +730,17 @@ void EntityLocationCoord<EntityID>::routePreparedMsg(

if (use_eager) {
theMsg()->pushEpoch(epoch);
routeMsgEager<MessageT>(msg->getEntity(), msg->getHomeNode(), msg);
routeMsgEager<MessageT>(msg->getEntity(), msg->getHomeNode(), std::move(msg));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to extract the entity and home node before this call in case the move is evaluated first?

theMsg()->pushEpoch(epoch);
routeMsgNode<MessageT>(
msg->getEntity(), msg->getHomeNode(), node, msg
m->getEntity(), m->getHomeNode(), node, std::move(m)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to extract the entity and home node before this call in case the move is evaluated first?

@stmcgovern stmcgovern force-pushed the 2092-runtime-optimizations-stage-2 branch from 82a360e to 85f8124 Compare June 13, 2023 15:17
@stmcgovern
Copy link
Contributor

This demonstrates performance improvement in the following performance test

From Develop @ 6109deb

Test results for test_collection_local_send running on 1 nodes:
[0] Results for test_collection_local_send (avg: 661.343ms stdev: 4.613ms min: 655.371ms max: 668.093ms)

----- TIMERS -----
[0] Results for colSend 1000000 (avg: 661.300ms stdev: 4.615ms min: 655.294ms max: 668.051ms)

From this branch 2092

Test results for test_collection_local_send running on 1 nodes:
[0] Results for test_collection_local_send (avg: 436.370ms stdev: 3.647ms min: 431.374ms max: 446.114ms)

----- TIMERS -----
[0] Results for colSend 1000000 (avg: 436.331ms stdev: 3.648ms min: 431.345ms max: 446.079ms)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Runtime optimizations stage 3
4 participants