Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some backprjections/FDK result in an unexpected CUDA error #492

Open
AnderBiguri opened this issue Sep 22, 2023 · 6 comments
Open

Some backprjections/FDK result in an unexpected CUDA error #492

AnderBiguri opened this issue Sep 22, 2023 · 6 comments

Comments

@AnderBiguri
Copy link
Member

Still unsure which geos cause this, but I get "main loop fail" for some geos. If it is a prooblem of the geo being wrong, this should be caugth earlier, otherwise there is a mayor bug somewhere. I wonder in the sizes/ints are being properly passed since the latest changes

@tsadakane
Copy link
Contributor

Could this have something to do with e7ad230 ?

@AnderBiguri
Copy link
Member Author

@tsadakane possibly, that is what I thought, but my preliminary test don't seem to show any mayor change. I just need some time to sit down and print the right things.

@AnderBiguri
Copy link
Member Author

Hi @tsadakane I wonder if this has to do with the particulars of the computer I am running this at, a 4 GPU station.
Playing with the GpuId() class, I realize that we do have a way to select al GPUs with the same name, but can we actually just select a GPU given its Id? I think not, or my brain is a bit too tired today to figure out how. Am I too tired, or we can't do this?

@AnderBiguri
Copy link
Member Author

AnderBiguri commented Oct 4, 2023

I have more or less found the issue, and it resides in the difference between these two enviroments. I need to check what exactly is the breaking one (for tomorrow....)

Working:

#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.12.0                   pypi_0    pypi
cython                    3.0.2                    pypi_0    pypi
fonttools                 4.43.0                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
libnsl                    2.0.0                hd590300_1    conda-forge
libsqlite                 3.43.0               h2797004_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
matplotlib                3.8.0                    pypi_0    pypi
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.26.0                   pypi_0    pypi
openssl                   3.1.3                hd590300_0    conda-forge
packaging                 23.2                     pypi_0    pypi
pillow                    10.0.1                   pypi_0    pypi
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.1                    pypi_0    pypi
python                    3.11.3          h2755cc3_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
pytigre                   2.4.0                    pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
scipy                     1.11.3                   pypi_0    pypi
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
tk                        8.6.13               h2797004_0    conda-forge
tqdm                      4.66.1                   pypi_0    pypi
tzdata                    2023c                h71feb2d_0    conda-forge
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge

Not working:

#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
alsa-lib                  1.2.8                h166bdaf_0    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
blas                      1.0                         mkl    conda-forge
brotli                    1.0.9                h166bdaf_9    conda-forge
brotli-bin                1.0.9                h166bdaf_9    conda-forge
brotli-python             1.0.9           py310hd8f1fbe_9    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
cairo                     1.16.0            ha61ee94_1014    conda-forge
certifi                   2023.7.22          pyhd8ed1ab_0    conda-forge
charset-normalizer        3.2.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.1.0                    pypi_0    pypi
cuda-cudart               11.7.99                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
cycler                    0.11.0                   pypi_0    pypi
cython                    0.29.28         py310hd8f1fbe_2    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
fftw                      3.3.10          nompi_hc118613_108    conda-forge
filelock                  3.12.4             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.42.0                   pypi_0    pypi
freetype                  2.12.1               h267a509_2    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
glib                      2.78.0               hfc55251_0    conda-forge
glib-tools                2.78.0               hfc55251_0    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gmpy2                     2.1.2           py310h3ec546c_1    conda-forge
gnutls                    3.6.13               h85f3911_1    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gst-plugins-base          1.22.0               h4243ec0_2    conda-forge
gstreamer                 1.22.0               h25f0c4b_2    conda-forge
gstreamer-orc             0.4.34               hd590300_0    conda-forge
harfbuzz                  6.0.0                h8e241bc_0    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
jack                      1.9.22               h11f4161_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4                    pypi_0    pypi
krb5                      1.20.1               h81ceb04_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.15                 hfd0df8a_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0            16_linux64_mkl    conda-forge
libbrotlicommon           1.0.9                h166bdaf_9    conda-forge
libbrotlidec              1.0.9                h166bdaf_9    conda-forge
libbrotlienc              1.0.9                h166bdaf_9    conda-forge
libcap                    2.67                 he9d0100_0    conda-forge
libcblas                  3.9.0            16_linux64_mkl    conda-forge
libclang                  15.0.7          default_h7634d5b_3    conda-forge
libclang13                15.0.7          default_h9986a30_3    conda-forge
libcublas                 11.10.3.66                    0    nvidia
libcufft                  10.7.2.124           h4fbf590_0    nvidia
libcufile                 1.7.2.10                      0    nvidia
libcups                   2.3.3                h36d4200_3    conda-forge
libcurand                 10.3.3.141                    0    nvidia
libcusolver               11.4.0.1                      0    nvidia
libcusparse               11.7.4.91                     0    nvidia
libdb                     6.2.32               h9c3ff4c_0    conda-forge
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.3                h59595ed_0    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgcrypt                 1.10.1               h166bdaf_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.78.0               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgpg-error              1.47                 h71f35ed_0    conda-forge
libhwloc                  2.9.1                hd6dc26d_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
liblapack                 3.9.0            16_linux64_mkl    conda-forge
libllvm15                 15.0.7               hadd5161_1    conda-forge
libnpp                    11.7.4.75                     0    nvidia
libnsl                    2.0.0                h7f98852_0    conda-forge
libnvjpeg                 11.8.0.2                      0    nvidia
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     15.3                 hbcd7760_1    conda-forge
libsndfile                1.2.2                hbc2eb40_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libsystemd0               253                  h8c4010b_1    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libtool                   2.4.7                h27087fc_0    conda-forge
libudev1                  253                  h0b41bf4_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.5.0                h79f4944_1    conda-forge
libxml2                   2.10.3               hca2bb57_4    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvm-openmp               16.0.6               h4dfa4b3_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markupsafe                2.1.3           py310h2372a71_0    conda-forge
matplotlib                3.7.2                    pypi_0    pypi
matplotlib-base           3.8.0           py310h62c0568_0    conda-forge
mkl                       2022.2.1         h84fe81f_16997    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.0                hb012696_0    conda-forge
mpg123                    1.31.3               hcb278e6_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.33               hf1915f5_4    conda-forge
mysql-libs                8.0.33               hca2cd23_4    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
nettle                    3.6                  he412f7d_0    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.92                 h1d7d5a4_0    conda-forge
numpy                     1.26.0          py310hb13e2d6_0    conda-forge
openh264                  2.1.1                h780b84a_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.3                hd590300_0    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pathlib                   1.0.1           py310hff52083_7    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    10.0.0                   pypi_0    pypi
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.10.0             pyhd8ed1ab_0    conda-forge
ply                       3.11                       py_1    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pulseaudio                16.1                 hcb278e6_3    conda-forge
pulseaudio-client         16.1                 h5195f5e_3    conda-forge
pulseaudio-daemon         16.1                 ha8d29e2_3    conda-forge
pyparsing                 3.0.9                    pypi_0    pypi
pyqt                      5.15.9          py310h04931ad_4    conda-forge
pyqt5-sip                 12.12.2         py310hc6cd4ac_4    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.12         hd12c33a_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python_abi                3.10                    3_cp310    conda-forge
pytigre                   2.4.0                    pypi_0    pypi
pytorch                   2.0.1           py3.10_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h778d358_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pyyaml                    6.0             py310h5764c6d_5    conda-forge
qt-main                   5.15.8               h5d23da1_6    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
scipy                     1.11.1          py310ha4c1d20_0    conda-forge
setuptools                68.0.0             pyhd8ed1ab_0    conda-forge
sip                       6.7.11          py310hc6cd4ac_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sympy                     1.12            pypyh9d50eac_103    conda-forge
tbb                       2021.9.0             hf52228f_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torchaudio                2.0.2               py310_cu117    pytorch
torchtriton               2.0.0                     py310    pytorch
torchvision               0.15.2              py310_cu117    pytorch
tornado                   6.3.3           py310h2372a71_0    conda-forge
tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
typing-extensions         4.7.1                hd8ed1ab_0    conda-forge
typing_extensions         4.7.1              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   2.0.4              pyhd8ed1ab_0    conda-forge
wheel                     0.41.0             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.0                h516909a_0    conda-forge
xcb-util-image            0.4.0                h166bdaf_0    conda-forge
xcb-util-keysyms          0.4.0                h516909a_0    conda-forge
xcb-util-renderutil       0.3.9                h166bdaf_0    conda-forge
xcb-util-wm               0.4.1                h516909a_0    conda-forge
xkeyboard-config          2.38                 h0b41bf4_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.4                h0b41bf4_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xf86vidmodeproto     2.3.1             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

@tsadakane
Copy link
Contributor

tsadakane commented Oct 5, 2023

@AnderBiguri ,

we do have a way to select all GPUs with the same name,

Yes, we do.

but can we actually just select a GPU given its Id?

Yes, we can.
If the IDs of the name "XXX" were (0,1,2,3) and we want to use only ID=1, I think it is possible by setting something like this:

gpuids.devices = (int32(1))

@AnderBiguri
Copy link
Member Author

In nay case, its still nor clear to me what causes the error. I upgraded the non-working env to cython 3 and that still causes errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants