Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pmix: remove the MCA framework #12309

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ggouaillardet
Copy link
Contributor

pmix is now a first class citizen and hence do not need to be part of a MCA framework anymore

Refs. #12282

@ggouaillardet
Copy link
Contributor Author

I am not fully happy with this PR.
Since there is no more pmix framework, the pmix MCA parameters do not appear any more in ompi_info.

@jsquyres @bosilca could you please advise on how to move forward?

@rhc54
Copy link
Contributor

rhc54 commented Feb 5, 2024

Which params are you concerned about? I think there were a couple of params in the pmix/base - are those the ones?

If so, what I did in PRRTE was to add a param registration function in your opal/pmix/pmix.c and then just add that to the ompi_rte function. You can also add it to ompi_info so they display.

If you mean the PMIx internal params, then that's been an ongoing discussion. I forget the OMPI issue number, but basically we had agreed on a method but nobody had time to implement it.

@bosilca
Copy link
Member

bosilca commented Feb 5, 2024

The pmix MCA param would not make sense, and you already moved pmix_base_async_modex, pmix_base_collect_data and pmix_base_exchange_timeout. While leaves a single, somewhat important MCA param out, the pmix_base_verbose. But as we already have opal_pmix_verbose_output this shall not an issue.

@ggouaillardet
Copy link
Contributor Author

After this PR, the following bits have disappeared from ompi_info --all

My concern is if we cannot fix that, that could be a deal breaker for v5.
I will investigate Ralph's suggestion.

            MCA pmix base: ---------------------------------------------------
            MCA pmix base: parameter "pmix" (current value: "", data source: default, level: 2 user/detail, type: string)
                           Default selection set of components for the pmix framework (<none> means use all components that can be found)
            MCA pmix base: ---------------------------------------------------
            MCA pmix base: parameter "pmix_base_verbose" (current value: "error", data source: default, level: 8 dev/detail, type: int)
                           Verbosity level for the pmix framework (default: 0)
                           Valid values: -1|none, 0|error, 10|component, 20|warn, 40|info, 60|trace, 80|debug, 100|max, 0-100
            MCA pmix base: parameter "pmix_base_async_modex" (current value: "false", data source: default, level: 9 dev/all, type: bool)
                           Use asynchronous modex mode
                           Valid values: 0|f|false|disabled|no|n, 1|t|true|enabled|yes|y
            MCA pmix base: parameter "pmix_base_collect_data" (current value: "true", data source: default, level: 9 dev/all, type: bool)
                           Collect all data during modex
                           Valid values: 0|f|false|disabled|no|n, 1|t|true|enabled|yes|y
            MCA pmix base: parameter "pmix_base_exchange_timeout" (current value: "-1", data source: default, level: 3 user/all, type: int)
                           Time (in seconds) to wait for a data exchange to complete

pmix is now a first class citizen and hence do not need
to be part of a MCA framework anymore

Refs. open-mpi#12282

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
@rhc54
Copy link
Contributor

rhc54 commented Feb 8, 2024

@ggouaillardet Would you like me to fix the param issue for you? Having done it before, it should only take a few minutes to port it here.

@ggouaillardet
Copy link
Contributor Author

I came up with this (plus a missing trivial fix)

diff --git a/opal/mca/common/ofi/common_ofi.c b/opal/mca/common/ofi/common_ofi.c
index d39412f..ed63c95 100644
--- a/opal/mca/common/ofi/common_ofi.c
+++ b/opal/mca/common/ofi/common_ofi.c
@@ -34,7 +34,6 @@
 #include "opal/mca/base/mca_base_var.h"
 #include "opal/mca/hwloc/base/base.h"
 #include "opal/mca/memory/base/base.h"
-#include "opal/mca/pmix/base/base.h"
 #include "opal/util/argv.h"
 #include "opal/util/show_help.h"
 
diff --git a/opal/runtime/opal_info_support.c b/opal/runtime/opal_info_support.c
index e167451..437163f 100644
--- a/opal/runtime/opal_info_support.c
+++ b/opal/runtime/opal_info_support.c
@@ -340,6 +340,7 @@ void opal_info_register_types(opal_pointer_array_t *mca_types)
     for (i = 0; NULL != opal_frameworks[i]; i++) {
         opal_pointer_array_add(mca_types, opal_frameworks[i]->framework_name);
     }
+    opal_pointer_array_add(mca_types, "pmix");
 }
 
 int opal_info_register_framework_params(opal_pointer_array_t *component_map)

If there is a better way, feel free to suggest or push in this PR directly.

I still believe we should find a way to keep pmix_base_verbose param, at least for the v5 series.

@ggouaillardet
Copy link
Contributor Author

Looks like this does the trick

diff --git a/opal/mca/common/ofi/common_ofi.c b/opal/mca/common/ofi/common_ofi.c
index d39412f..ed63c95 100644
--- a/opal/mca/common/ofi/common_ofi.c
+++ b/opal/mca/common/ofi/common_ofi.c
@@ -34,7 +34,6 @@
 #include "opal/mca/base/mca_base_var.h"
 #include "opal/mca/hwloc/base/base.h"
 #include "opal/mca/memory/base/base.h"
-#include "opal/mca/pmix/base/base.h"
 #include "opal/util/argv.h"
 #include "opal/util/show_help.h"
 
diff --git a/opal/pmix/pmix-internal.h b/opal/pmix/pmix-internal.h
index 1a8eefe..46d6d18 100644
--- a/opal/pmix/pmix-internal.h
+++ b/opal/pmix/pmix-internal.h
@@ -66,6 +66,7 @@ typedef struct {
 OPAL_DECLSPEC extern bool opal_pmix_collect_all_data;
 OPAL_DECLSPEC extern bool opal_pmix_base_async_modex;
 OPAL_DECLSPEC extern int opal_pmix_verbose_output;
+OPAL_DECLSPEC extern int opal_pmix_verbose;
 
 /* define a caddy for pointing to pmix_info_t that
  * are to be included in an answer */
diff --git a/opal/pmix/pmix.c b/opal/pmix/pmix.c
index 34900ca..c29fdc6 100644
--- a/opal/pmix/pmix.c
+++ b/opal/pmix/pmix.c
@@ -714,6 +714,7 @@ OBJ_CLASS_INSTANCE(opal_proclist_t, opal_list_item_t, NULL, NULL);
 
 bool opal_pmix_collect_all_data = true;
 int opal_pmix_verbose_output = -1;
+int opal_pmix_verbose = 0;
 bool opal_pmix_base_async_modex = false;
 opal_pmix_base_t opal_pmix_base = {.evbase = NULL,
                                    .timeout = 0,
diff --git a/opal/runtime/opal_info_support.c b/opal/runtime/opal_info_support.c
index e167451..437163f 100644
--- a/opal/runtime/opal_info_support.c
+++ b/opal/runtime/opal_info_support.c
@@ -340,6 +340,7 @@ void opal_info_register_types(opal_pointer_array_t *mca_types)
     for (i = 0; NULL != opal_frameworks[i]; i++) {
         opal_pointer_array_add(mca_types, opal_frameworks[i]->framework_name);
     }
+    opal_pointer_array_add(mca_types, "pmix");
 }
 
 int opal_info_register_framework_params(opal_pointer_array_t *component_map)
diff --git a/opal/runtime/opal_params.c b/opal/runtime/opal_params.c
index 5a9e7e2..ac05809 100644
--- a/opal/runtime/opal_params.c
+++ b/opal/runtime/opal_params.c
@@ -109,6 +109,18 @@ int opal_register_params(void)
                                  MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, OPAL_INFO_LVL_3,
                                  MCA_BASE_VAR_SCOPE_READONLY, &opal_pmix_base.timeout);
 
+    opal_pmix_verbose = 0;
+    (void) mca_base_var_register("opal", "pmix", "base", "verbose",
+                                 "Verbosity level for the pmix framework (default: 0)",
+                                 MCA_BASE_VAR_TYPE_INT,
+                                 &mca_base_var_enum_verbose, 0,
+                                 MCA_BASE_VAR_FLAG_SETTABLE, OPAL_INFO_LVL_8,
+                                 MCA_BASE_VAR_SCOPE_LOCAL,
+                                 &opal_pmix_verbose);
+    if (0 < opal_pmix_verbose) {
+        opal_pmix_verbose_output = opal_output_open(NULL);
+    }
+
     opal_finalize_register_cleanup(opal_deregister_params);
 
     return OPAL_SUCCESS;

@rhc54
Copy link
Contributor

rhc54 commented Feb 9, 2024

Yep, that's all there is to it for the params. However, I don't think this patch is entirely correct. You don't want to declare this still to be a framework by calling MCA_BASE_FRAMEWORK_DECLARE. I just added a simple pmix_base_init function and have opal_init call it to setup the "base" object and register the params.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants