Internal error using shmem_reduce in example/oshmem_max_reduction.c #12419

Open
smguzik opened this issue Mar 20, 2024 · 8 comments

Comments

@smguzik

smguzik commented Mar 20, 2024

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From source tarball using:
Configure command line: '--build=x86_64-linux-gnu'
'--prefix=/usr/local/openmpi/5.0.2_gcc-12.2.0'
'--with-ucx' '--with-pmix=internal'
'--with-libevent=external' '--with-hwloc=external'
'--enable-mpi-fortran=all'
'--with-cuda=/usr/local/cuda'
'--with-cuda-libdir=/usr/lib/x86_64-linux-gnu'

Please describe the system on which you are running

  • Operating system/version: Debian 12.4
  • Computer hardware: x86_64
  • Network type: Single node

Details of the problem

oshmem_max_reduction.c works as provided in the examples directory. However, using the more recent API, replacing

shmem_long_max_to_all(dst, src, N, 0, 0, num_pes, pWrk, pSync);

with

shmem_long_max_reduce(SHMEM_TEAM_WORLD, dst, src, N);

fails with the message

[shmem_reduce.c:473:pshmem_long_max_reduce] Internal error is appeared rc = -7
@wenduwan
Contributor

@janjust I see --with-ucx - guess you would be interested 😄

@wenduwan
Contributor

Added the main label, assuming oshmem on main is the same as in v5.0.x

@roiedanino
Contributor

roiedanino commented Apr 4, 2024

It seems that the new API is not yet implemented in the UCX spml module (or anywhere else):

From ucx/spml.c:1850

/* This routine is not implemented */
int mca_spml_ucx_team_reduce(shmem_team_t team, void *dest, const void *source,
        size_t nreduce, int operation, int datatype)
{
    return OSHMEM_ERR_NOT_IMPLEMENTED;
}

@MamziB Any chance I'm missing something, or is this a known TBD?
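
To illustrate how a stub like the one quoted above could surface as the reported message, here is a minimal, self-contained sketch in plain C. It does not reproduce Open MPI's actual call chain: `team_reduce_stub`, `checked_max_reduce`, and `ERR_NOT_IMPLEMENTED` are hypothetical names, and the value -7 is an assumption chosen only to match the `rc = -7` in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: stand-in for OSHMEM_ERR_NOT_IMPLEMENTED. The value -7 is
 * picked to match the rc in the report, not read from Open MPI headers. */
#define ERR_NOT_IMPLEMENTED (-7)

/* Hypothetical backend entry point: accepts the arguments and does no work. */
static int team_reduce_stub(void *dest, const void *source, size_t nreduce)
{
    (void)dest;
    (void)source;
    (void)nreduce;
    return ERR_NOT_IMPLEMENTED;
}

/* Hypothetical front end: forwards to the backend and reports any failure. */
static int checked_max_reduce(long *dest, const long *source, size_t nreduce)
{
    assert(dest != NULL && source != NULL);

    int rc = team_reduce_stub(dest, source, nreduce);
    if (rc != 0) {
        fprintf(stderr, "internal error, rc = %d\n", rc);
    }
    return rc;
}
```

In this sketch the destination buffer is never written, so a caller that ignores the return code would read whatever `dest` held before the call.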

@popina1994

I am having the same issue.
Should I use the old OpenSHMEM API, or is there a way to bypass this?

@MamziB
Contributor

MamziB commented Apr 5, 2024

@roiedanino Yeah, we will implement this in the future.
@popina1994 "Should I use the old OpenSHMEM API, or is there a way to bypass this?" Yes, please go ahead and use the old OpenSHMEM API for now. If I find a better workaround, I will update here.
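
For readers who want to follow this advice, here is a rough sketch of the legacy active-set call, modeled on the `shmem_long_max_to_all` line from the original report. Several things are assumptions made for illustration: the `HAVE_OPENSHMEM` macro (a hypothetical switch, to be defined when building with an OpenSHMEM compiler wrapper), the helper name `max_over_all_pes`, the input values `me + i`, and the single-PE stand-in branch, which exists only so the file compiles on a machine without OpenSHMEM.

```c
#include <assert.h>
#include <stddef.h>

#define N 3 /* elements reduced per PE */

#ifdef HAVE_OPENSHMEM /* hypothetical macro: pass -DHAVE_OPENSHMEM when building */
#include <shmem.h>

/* Work array size the legacy reduction calls require. */
#define PWRK_SIZE ((N / 2 + 1) > SHMEM_REDUCE_MIN_WRKDATA_SIZE \
                       ? (N / 2 + 1) : SHMEM_REDUCE_MIN_WRKDATA_SIZE)

/* Static objects are symmetric, as the active-set API requires. */
static long src[N], dst[N];
static long pWrk[PWRK_SIZE];
static long pSync[SHMEM_REDUCE_SYNC_SIZE];

/* Runs one max reduction over all PEs and copies the result into out[].
 * Initializes and finalizes OpenSHMEM itself, so call it only once. */
static void max_over_all_pes(long out[N])
{
    assert(out != NULL);
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();

    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++) {
        pSync[i] = SHMEM_SYNC_VALUE;
    }
    for (int i = 0; i < N; i++) {
        src[i] = me + i; /* illustrative input */
    }
    shmem_barrier_all(); /* pSync must be initialized on every PE first */

    /* Active set = all PEs: start at PE 0, log2 stride 0, npes members. */
    shmem_long_max_to_all(dst, src, N, 0, 0, npes, pWrk, pSync);

    for (int i = 0; i < N; i++) {
        out[i] = dst[i];
    }
    shmem_finalize();
}
#else
/* Stand-in without OpenSHMEM: models a single PE (PE 0), whose max is its own input. */
static void max_over_all_pes(long out[N])
{
    assert(out != NULL);
    for (int i = 0; i < N; i++) {
        out[i] = 0 + i;
    }
}
#endif
```

With the OpenSHMEM branch enabled and `npes` PEs, each `out[i]` should come out as `(npes - 1) + i`, since the highest-numbered PE contributes the largest input.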

@gleon99
Contributor

gleon99 commented Apr 7, 2024

@MamziB Can you reassign this to yourself, please?

@gleon99
Contributor

gleon99 commented Apr 21, 2024

@MamziB ?

janjust assigned MamziB and unassigned brminich Apr 22, 2024

@MamziB
Contributor

MamziB commented Apr 22, 2024

@gleon99 Sure, let me assign it to myself. Thanks for the reminder.
