This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Conv2d fails to run NHWC on cpu #21176

Open
Embed-Debuger opened this issue Jan 26, 2023 · 1 comment

@Embed-Debuger

Description

On CPU devices, Conv2D does not appear to support the NHWC layout: the forward pass fails with the error shown below.

Error Message

Traceback (most recent call last):
File "", line 1, in
File "/root/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/root/gejie/program/Medi-Test/Repreduce issue/mx1.9.1_conv_cb.py", line 20, in
print(result)
File "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 257, in __repr__
return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
File "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 2571, in asnumpy
ctypes.c_size_t(data.size)))
File "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: MXNetError: could not create a descriptor for a dilated convolution forward propagation primitive

To Reproduce

import mxnet as mx
from mxnet import nd            # tensor (NDArray) module
from mxnet.gluon import nn      # basic neural network building blocks
from mxnet.gluon.nn import Conv2D

import os
os.environ['DMLC_LOG_STACK_TRACE_DEPTH'] = "100"

def Model():
    net = nn.Sequential()
    net.add(Conv2D(channels=32, kernel_size=(5, 5), layout="NHWC"))
    net.initialize(ctx=mx.cpu())
    return net

shape = (10,32,32,3)
model = Model()
data = nd.random.uniform(-1, 1, shape, ctx=mx.cpu())
result = model(data)
print(result)
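
For reference, the shape the script would be expected to print can be worked out from standard convolution arithmetic. This is a minimal sketch in plain Python with no MXNet dependency; the helper name `conv2d_output_shape_nhwc` is hypothetical, and it assumes the stride of 1, zero padding, and dilation of 1 that the script above leaves at their defaults.

```python
def conv2d_output_shape_nhwc(input_shape, channels, kernel_size,
                             strides=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Hypothetical helper: output shape of a 2D convolution on NHWC input."""
    n, h, w, _ = input_shape  # the input channel count does not affect the output shape

    def out_dim(size, k, s, p, d):
        # Standard formula for one spatial dimension of a convolution.
        return (size + 2 * p - d * (k - 1) - 1) // s + 1

    out_h = out_dim(h, kernel_size[0], strides[0], padding[0], dilation[0])
    out_w = out_dim(w, kernel_size[1], strides[1], padding[1], dilation[1])
    return (n, out_h, out_w, channels)


# Same shape and layer settings as the reproduction script.
expected = conv2d_output_shape_nhwc((10, 32, 32, 3), channels=32, kernel_size=(5, 5))
print(expected)
```

With a 5x5 kernel and no padding, each 32-pixel spatial dimension shrinks to 28, so a successful run would yield a tensor of shape (10, 28, 28, 32).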

Steps to reproduce

Run the script above with MXNet 1.9.1 on a CPU context. The Conv2D operator fails when given the NHWC layout. In addition, the error message does not indicate that the NHWC layout is the cause, which is misleading and makes the problem hard to locate.

What have you tried to solve it?

  1. We want to run Conv2D with the NHWC layout on a CPU device.

Environment

We recommend using our script for collecting the diagnostic information with the following command
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3

Environment Information
----------Python Info----------
Version      : 3.7.16
Compiler     : GCC 11.2.0
Build        : ('default', 'Jan 17 2023 22:20:44')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 22.3.1
Directory    : /root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.9.1
Directory    : /root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet
Commit hash file "/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library      : ['/root/anaconda3/envs/lib_mxnet/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✖ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform     : Linux-4.15.0-202-generic-x86_64-with-debian-buster-sid
system       : Linux
node         : server-d5
release      : 4.15.0-202-generic
version      : #213-Ubuntu SMP Thu Jan 5 19:19:12 UTC 2023
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping:            2
CPU MHz:             1200.184
CPU max MHz:         3500.0000
CPU min MHz:         1200.0000
BogoMIPS:            5196.97
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0-23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/mxnet, DNS: 0.0013 sec, LOAD: 1.1103 sec.
Error open Gluon Tutorial(en): http://gluon.mxnet.io, HTTP Error 404: Not Found, DNS finished in 0.0010609626770019531 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1091)>, DNS finished in 0.0011479854583740234 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0010 sec, LOAD: 0.9683 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0011 sec, LOAD: 1.4425 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.0011096000671386719 sec.
----------Environment----------
@github-actions

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.
