@fengyuentau
Member

@fengyuentau fengyuentau commented Aug 2, 2024

Merge with opencv/opencv_extra#1198.
Merge with opencv/opencv_contrib#3787.

Perf:

M2 (16GB ram, with fp16 vector intrinsics support)

Geometric mean (ms)

                                 Name of Test                                   base  patch  patch vs base (x-factor)
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1)      0.276 0.102    2.71   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC1)     0.338 0.137    2.46   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC1)     0.293 0.116    2.54   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC3)      0.447 0.140    3.19   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC3)     0.525 0.287    1.83   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC3)     0.391 0.258    1.52   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4)      0.405 0.155    2.62   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC4)     0.613 0.348    1.76   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC4)     0.420 0.317    1.33   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1)     0.274 0.095    2.89   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC1)    0.330 0.141    2.35   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC1)    0.294 0.167    1.76   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC3)     0.439 0.135    3.26   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC3)    0.528 0.284    1.86   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC3)    0.382 0.266    1.43   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4)     0.411 0.162    2.53   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC4)    0.624 0.353    1.77   
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC4)    0.423 0.344    1.23   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC1)     0.516 0.178    2.90   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC1)    0.563 0.262    2.15   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC1)    0.516 0.219    2.36   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC3)     0.838 0.286    2.93   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC3)    0.902 0.552    1.63   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC3)    0.738 0.523    1.41   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC4)     0.825 0.305    2.70   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC4)    1.050 0.677    1.55   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC4)    0.815 0.649    1.26   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC1)    0.719 0.616    1.17   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC1)   0.750 0.700    1.07   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC1)   0.677 0.652    1.04   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC3)    1.167 0.737    1.58   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC3)   1.172 1.056    1.11   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC3)   0.907 0.989    0.92   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC4)    1.242 0.793    1.57   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC4)   1.419 1.223    1.16   
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC4)   1.013 1.136    0.89   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    1.024 0.285    3.60   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   1.020 0.365    2.79   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   1.002 0.329    3.05   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    1.641 0.384    4.28   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   1.704 0.714    2.39   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   1.450 0.717    2.02   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    1.652 0.422    3.92   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   1.867 0.870    2.15   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   1.680 1.049    1.60   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   1.734 2.364    0.73   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  1.661 2.484    0.67   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  1.567 2.461    0.64   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   2.781 2.635    1.06   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  2.545 3.059    0.83   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  2.086 2.983    0.70   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   3.061 2.799    1.09   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  3.050 3.449    0.88   
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  2.373 3.222    0.74   

Intel i7-12700K

Geometric mean (ms)

                                 Name of Test                                   base  patch  patch vs base (x-factor)
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1)      0.149 0.047    3.17
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC1)     0.181 0.046    3.96
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC1)     0.163 0.036    4.50
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC3)      0.196 0.115    1.70
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC3)     0.374 0.107    3.48
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC3)     0.228 0.108    2.11
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4)      0.180 0.146    1.23
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC4)     0.446 0.141    3.17
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC4)     0.298 0.140    2.13
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1)     0.144 0.047    3.08
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC1)    0.182 0.046    3.97
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC1)    0.162 0.036    4.50
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC3)     0.196 0.117    1.67
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC3)    0.366 0.108    3.40
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC3)    0.227 0.105    2.15
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4)     0.180 0.148    1.22
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC4)    0.446 0.141    3.16
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC4)    0.300 0.148    2.03
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC1)     0.314 0.105    2.99
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC1)    0.377 0.103    3.65
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC1)    0.353 0.085    4.15
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC3)     0.401 0.241    1.66
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC3)    0.617 0.231    2.67
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC3)    0.866 0.230    3.77
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC4)     0.373 0.301    1.24
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC4)    1.087 0.344    3.17
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC4)    0.763 0.300    2.55
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC1)    0.406 0.244    1.66
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC1)   0.435 0.245    1.77
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC1)   0.401 0.230    1.74
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC3)    0.561 0.418    1.34
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC3)   0.759 0.400    1.90
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC3)   0.572 0.395    1.45
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC4)    0.590 0.484    1.22
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC4)   0.933 0.477    1.95
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC4)   0.669 0.462    1.45
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    0.547 0.165    3.32
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   0.554 0.163    3.39
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   0.547 0.143    3.82
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    0.629 0.322    1.96
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   0.847 0.308    2.75
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   2.616 0.333    7.86
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    0.606 0.390    1.55
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   2.548 0.380    6.71
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   1.766 0.866    2.04
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   0.847 0.880    0.96
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  0.885 0.862    1.03
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  0.759 0.829    0.92
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   1.384 1.239    1.12
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  1.594 1.169    1.36
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  1.375 1.126    1.22
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   1.546 1.272    1.22
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  1.909 1.260    1.51
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  1.396 1.317    1.06

Khadas VIM3 (A311D, 4xA73+2xA53, no fp16 vector intrinsics support)

Geometric mean (ms)

                                 Name of Test                                    base  patch   patch vs base (x-factor)
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1)      2.052  0.579     3.54
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC1)     3.237  0.523     6.20
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC1)     2.603  0.592     4.40
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC3)      4.153  1.048     3.96
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC3)     5.225  1.327     3.94
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC3)     3.388  1.665     2.03
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4)      3.746  1.353     2.77
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 16UC4)     6.727  1.784     3.77
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_CONSTANT, 32FC4)     4.006  2.308     1.74
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1)     1.875  0.578     3.24
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC1)    3.046  0.587     5.19
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC1)    2.577  0.533     4.83
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC3)     4.072  1.041     3.91
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC3)    4.907  1.327     3.70
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC3)    3.512  1.656     2.12
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4)     3.689  1.361     2.71
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 16UC4)    6.403  1.795     3.57
WarpAffine::TestWarpAffine::(640x480, INTER_LINEAR, BORDER_REPLICATE, 32FC4)    4.335  2.317     1.87
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC1)     2.869  1.249     2.30
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC1)    4.047  1.274     3.18
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC1)    3.358  1.770     1.90
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC3)     5.454  2.365     2.31
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC3)    6.603  2.949     2.24
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC3)    6.177  4.540     1.36
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 8UC4)     6.166  2.988     2.06
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 16UC4)    7.685  3.993     1.92
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_CONSTANT, 32FC4)    7.637  6.011     1.27
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC1)    4.427  2.940     1.51
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC1)   6.068  2.981     2.04
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC1)   4.316  2.946     1.47
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC3)    7.169  5.075     1.41
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC3)   8.606  5.576     1.54
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC3)   7.533  5.995     1.26
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 8UC4)    8.751  5.985     1.46
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 16UC4)   15.192 6.713     2.26
WarpAffine::TestWarpAffine::(1280x720, INTER_LINEAR, BORDER_REPLICATE, 32FC4)   7.056  7.726     0.91
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    8.686  2.083     4.17
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   10.927 2.132     5.13
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   7.764  3.213     2.42
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    14.588 3.600     4.05
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   17.251 4.896     3.52
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   15.349 8.764     1.75
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    14.623 4.480     3.26
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   19.076 6.394     2.98
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   18.237 10.422    1.75
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   12.835 10.112    1.27
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  17.167 10.208    1.68
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  12.486 9.994     1.25
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   22.759 16.134    1.41
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  29.616 16.531    1.79
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  22.666 16.851    1.35
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   27.193 18.530    1.47
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  33.831 18.593    1.82
WarpAffine::TestWarpAffine::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  20.932 19.754    1.06
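
For reading the tables above: the x-factor column appears to be the base time divided by the patch time (for example, 0.276 / 0.102 ≈ 2.71), so values below 1 indicate a regression. A minimal C++ sketch of that arithmetic follows; `xFactor` is a hypothetical helper name used only for illustration and is not part of OpenCV or its perf tooling.

```cpp
#include <cassert>

// Speed-up factor as it appears in the tables: base time / patch time.
// A result above 1 means the patch is faster; below 1 means it is slower.
// xFactor is a hypothetical name, not an OpenCV function.
static double xFactor(double base_ms, double patch_ms)
{
    assert(patch_ms > 0.0);  // timings are positive geometric means in ms
    return base_ms / patch_ms;
}
```

Calling `xFactor(1.734, 2.364)` with the M2 1920x1080 BORDER_REPLICATE 8UC1 timings gives roughly 0.73, matching the regression shown in that row.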

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux OpenCL

@fengyuentau fengyuentau added this to the 5.0 milestone Aug 2, 2024
@fengyuentau fengyuentau force-pushed the imgproc/warpaffine_opt branch 3 times, most recently from a3f2942 to bb00632 Compare August 3, 2024 12:53
@fengyuentau fengyuentau force-pushed the imgproc/warpaffine_opt branch from 4a641e1 to 3fb43dd Compare August 6, 2024 06:34
@fengyuentau fengyuentau force-pushed the imgproc/warpaffine_opt branch from 0af05d4 to 7abaafe Compare August 7, 2024 03:39

@fengyuentau
Member Author

Decided to drop the warpPerspective kernels from this PR for now. Will work on adding kernels for 8UC1/C4 and 32FC1/C3/C4 next, and will consider 16U as well.

@fengyuentau
Member Author

Not sure why the OCL tests failed, since this PR should not touch anything OCL-related.

Example: https://github.com/opencv/opencv/actions/runs/10557877242/job/29246289055?pr=25984

Comment on lines 2272 to 2274
#if CV_NEON_AARCH64
return v_int32x4(vcvtmq_s32_f32(a.val));
#else
Contributor

Member Author

I struggled with armv7 support. There is no existing macro indicating that the target supports only armv7.

Contributor

There is a manual "CAROTENE_NEON_ARCH" option. Looks like we need a build check, or need to handle the march option correctly.

Member Author

need build check or handle march option correctly.

That sounds like a dedicated PR. How about guarding it (as well as the one above) with CV_NEON_AARCH64 for now?

Contributor

Please use an __ARM_ARCH > 7 check. I have an armv7 board and test on it regularly.
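
To illustrate the kind of guard being requested, here is a minimal sketch, not the code merged in this PR. It assumes the ACLE macros `__ARM_ARCH` and `__ARM_NEON` are defined by the compiler on ARM targets; `floor4` is a hypothetical helper name. On ARMv8 and newer it uses the `vcvtmq_s32_f32` intrinsic from the snippet above, and everywhere else (including armv7) it falls back to scalar `std::floor`.

```cpp
#include <cmath>
#include <cstdint>
#if defined(__ARM_NEON) && defined(__ARM_ARCH) && (__ARM_ARCH > 7)
#include <arm_neon.h>
#endif

// Floors four floats to 32-bit integers.
// floor4 is a hypothetical helper, used only to show the guard.
static void floor4(const float in[4], int32_t out[4])
{
#if defined(__ARM_NEON) && defined(__ARM_ARCH) && (__ARM_ARCH > 7)
    // ARMv8+: convert with rounding toward minus infinity in one instruction.
    vst1q_s32(out, vcvtmq_s32_f32(vld1q_f32(in)));
#else
    // armv7 and non-ARM targets: scalar fallback with the same result
    // for values that fit in int32.
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<int32_t>(std::floor(in[i]));
#endif
}
```

The point of the guard is that armv7 builds never see the ARMv8-only intrinsic, while the observable result stays the same on both paths.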

@asmorkalov
Contributor

I propose backporting the TS-related changes and the rounding/floor optimization to 4.x, to reduce merge conflicts and bring the global optimization there too.

@asmorkalov
Contributor

@fengyuentau I benchmarked your code (as of a couple of commits ago). See the details in the archive, which also includes single-thread benchmarks. The patch looks good for ARM, but there are some stable regressions on x86. Please take a look.
perf-warpAffine.zip

@fengyuentau fengyuentau force-pushed the imgproc/warpaffine_opt branch from 2e30217 to abdaeab Compare September 12, 2024 08:25
@fengyuentau
Member Author

There is strange degradation on Jetson Orin for 32FC3 and 32FC4 and 4K resolution. It's not visible on other platforms, so I suspect it's some cache effect. Could you check it with Mac?

@asmorkalov Performance degradation happens on BORDER_REPLICATE.

WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 8UC1)    3.694  0.955     3.87   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 16UC1)   3.393  1.006     3.37   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 32FC1)   3.366  0.993     3.39   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 8UC3)    5.894  1.063     5.54   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 16UC3)   5.409  1.500     3.61   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 32FC3)   5.591  2.310     2.42   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 8UC4)    6.274  1.419     4.42   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 16UC4)   5.935  2.024     2.93   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_CONSTANT, 32FC4)   5.916  3.704     1.60   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 8UC1)   7.342  12.895    0.57   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 16UC1)  6.965  12.967    0.54   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 32FC1)  6.318  12.725    0.50   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 8UC3)   11.902 14.625    0.81   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 16UC3)  10.698 15.114    0.71   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 32FC3)  8.697  14.357    0.61   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 8UC4)   13.588 15.516    0.88   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 16UC4)  12.654 16.278    0.78   
WarpAffine::TestWarpAffine::(3840x2160, INTER_LINEAR, BORDER_REPLICATE, 32FC4)  9.618  15.251    0.63   

@fengyuentau
Member Author

@vpisarev @asmorkalov I updated the code to use the algorithm hint.

@fengyuentau
Member Author

@asmorkalov @vpisarev Could you review this PR? I have already continued on top of this branch and implemented the new warpPerspective kernels, and I want to create a new PR for them.

const Size srcSize = get<0>(params);
const int type = get<1>(params), interpolation = get<2>(params);
- const double eps = CV_MAT_DEPTH(type) <= CV_32S ? 1 : interpolation == INTER_CUBIC ? 2e-3 : 1e-4;
+ const double eps = CV_MAT_DEPTH(type) <= CV_32S ? 2 : interpolation == INTER_CUBIC ? 2e-3 : 3e-2;
Contributor

Looks like the change does not make sense now that you have introduced AlgorithmHint. I reverted the change and do not see regressions with Intel OpenCL (iGPU) and NVIDIA GF 1080.

{
for (int j = 0; j < test_loop_times; j++)
{
double eps = depth < CV_32F ? 0.04 : 0.06;
Contributor

I propose converting it to conditions with explicit types. We have fp16, int64, and bool, which come after fp64. The condition is also hard to understand.
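
One way to read this proposal, sketched under assumptions rather than taken from the merged test code: name each depth explicitly instead of relying on the numeric ordering of the depth codes. The enum below is a local stand-in that mirrors the values of `CV_8U` through `CV_16F` (assumed to be 0 through 7); real test code would use the OpenCV constants directly. `warpTestEps` is a hypothetical helper name, and throwing for unlisted depths is an illustrative choice, not something the reviewers specified.

```cpp
#include <stdexcept>

// Local stand-ins mirroring OpenCV depth codes CV_8U..CV_16F (assumed 0..7).
enum DepthCode
{
    DEPTH_8U = 0, DEPTH_8S = 1, DEPTH_16U = 2, DEPTH_16S = 3,
    DEPTH_32S = 4, DEPTH_32F = 5, DEPTH_64F = 6, DEPTH_16F = 7
};

// Hypothetical helper: selects the test tolerance per explicitly named depth,
// replacing the ordering-based `depth < CV_32F ? 0.04 : 0.06`.
static double warpTestEps(int depth)
{
    switch (depth)
    {
    case DEPTH_8U: case DEPTH_8S: case DEPTH_16U: case DEPTH_16S: case DEPTH_32S:
        return 0.04;  // integer depths
    case DEPTH_32F: case DEPTH_64F:
        return 0.06;  // floating-point depths
    default:
        // fp16, int64, bool, ...: no tolerance is chosen here (assumption),
        // so fail loudly instead of silently picking one.
        throw std::invalid_argument("no tolerance defined for this depth");
    }
}
```

With this shape, a depth added after fp64 no longer inherits the floating-point tolerance by accident.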

{
for (int j = 0; j < test_loop_times; j++)
{
double eps = depth < CV_32F ? 0.04 : 0.06;
Contributor

The same proposal here.

@asmorkalov asmorkalov modified the milestones: 5.0, 5.0-alpha Sep 27, 2024
@asmorkalov asmorkalov self-assigned this Oct 1, 2024
@vpisarev vpisarev self-requested a review October 1, 2024 13:48
@asmorkalov asmorkalov merged commit 97681bd into opencv:5.x Oct 3, 2024
26 of 27 checks passed
asmorkalov pushed a commit to opencv/opencv_contrib that referenced this pull request Oct 4, 2024
slightly alter threshold for warpAffine optimization #3787

Merge with opencv/opencv#25984

New `confusionMatrixes[1]` is

```
[[45  0  0  0  0  0  0  0  0  0]
 [ 0 57  0  0  0  0  0  0  0  0]
 [ 0  0 58  2  0  0  0  0  1  0]
 [ 0  0  0 43  0  0  0  1  0  0]
 [ 0  0  0  0 39  0  0  0  0  1]
 [ 0  0  0  1  0 49  0  0  1  0]
 [ 0  0  0  0  0  0 52  0  0  0]
 [ 0  0  1  0  0  0  0 54  0  0]
 [ 0  0  0  0  0  0  0  0 47  0]
 [ 0  1  0  1  0  0  0  0  2 44]]
```
which amounts to a shift of about 1 in pixel value for each 4x or 5x pixel value.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
@fengyuentau fengyuentau deleted the imgproc/warpaffine_opt branch October 9, 2024 03:51