Shape from shading example does not work #146

Open
gerwang opened this issue Apr 15, 2019 · 4 comments

gerwang commented Apr 15, 2019

I am using Opt with LLVM 6.0.1 and CUDA 10.0 on Windows 10. Other examples work fine, but when running the shape from shading example, the OptGN method fails to converge and its cost rises. OptLM reverts every iteration, so the final solution remains identical to the initial one. However, the CUDA and Ceres methods work fine.

Can anyone please explain why Opt fails on the sfs example? Can I use Opt to solve an sfs problem, and if so, what should I be careful of?

Here is my console output:

Num Active Unknowns: 192162
Saving targetIntensity.png 640x480x1
Num Active Unknowns: 192162
Saving sfsInitDepth.ply 640x480x1
warning: Linking two modules of different data layouts: 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0/nvvm/libdevice/libdevice.10.bc' is '' whereas 'external' is 'e-m:w-i64:64-f80:128-n8:16:32:64-S128'

Using Opt v0.2.2
nUnknowns =     307200
nResiduals =    0 + 307200 * 6

nnz =   0 + 307200 * 26

compile time:   3.530093298803
problem plan complete
GPU memory usage: used = 2134.387501, free = 9129.612499 MB, total = 11264.000000 MB
20540.3203125

warning: Linking two modules of different data layouts: 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0/nvvm/libdevice/libdevice.10.bc' is '' whereas 'external' is 'e-m:w-i64:64-f80:128-n8:16:32:64-S128'

Using Opt v0.2.2
nUnknowns =     307200
nResiduals =    0 + 307200 * 6

nnz =   0 + 307200 * 26

compile time:   4.0615605919302
problem plan complete
GPU memory usage: used = 2178.387501, free = 9085.612499 MB, total = 11264.000000 MB
22543.029296875
Solving
//////////// (CUDA) ///////////////
0: cost: 257.445435
1: cost: 213.608902
2: cost: 188.103104
3: cost: 179.021088
4: cost: 166.140198
5: cost: 161.049637
6: cost: 155.435852
7: cost: 151.680801
8: cost: 148.055298
9: cost: 145.824509
10: cost: 142.622604
11: cost: 139.959824
12: cost: 138.808121
13: cost: 136.300308
14: cost: 135.800003
15: cost: 133.201431
16: cost: 132.738708
17: cost: 131.281891
18: cost: 130.731369
19: cost: 129.845230
20: cost: 129.145065
21: cost: 128.549637
22: cost: 127.977257
23: cost: 127.182693
24: cost: 126.826920
25: cost: 126.339607
26: cost: 125.163200
27: cost: 125.518059
28: cost: 124.546921
29: cost: 123.990372
30: cost: 123.104973
31: cost: 124.017250
32: cost: 122.694801
33: cost: 122.752823
34: cost: 122.644920
35: cost: 122.189789
36: cost: 121.589287
37: cost: 122.004257
38: cost: 121.600861
39: cost: 121.710930
40: cost: 120.587898
41: cost: 120.448708
42: cost: 120.590553
43: cost: 120.073143
44: cost: 120.073402
45: cost: 120.756142
46: cost: 120.196518
47: cost: 120.517303
48: cost: 119.762833
49: cost: 119.246719
50: cost: 119.303810
51: cost: 119.212128
52: cost: 119.010674
53: cost: 118.751297
54: cost: 118.531364
55: cost: 118.402618
56: cost: 118.250908
57: cost: 118.001869
58: cost: 117.713516
59: cost: 118.393784
final cost: 117.289375
------------------------------------------------------------
          Kernel          |   Count  |   Total   | Average
--------------------------+----------+-----------+----------
--------------------------+----------+-----------+----------
 overall                  |      1   |   85.435ms| 85.4354ms
--------------------------+----------+-----------+----------
 Precompute_Kernel        |     61   |    1.645ms|  0.0270ms
--------------------------+----------+-----------+----------
 EvalResidual             |     61   |    1.659ms|  0.0272ms
--------------------------+----------+-----------+----------
 PCGInit_Kernel1          |     60   |    1.769ms|  0.0295ms
--------------------------+----------+-----------+----------
 PCGInit_Kernel2          |     60   |    0.395ms|  0.0066ms
--------------------------+----------+-----------+----------
 PCGStep_Kernel1          |    600   |   19.071ms|  0.0318ms
--------------------------+----------+-----------+----------
 PCGStep_Kernel2          |    600   |   19.696ms|  0.0328ms
--------------------------+----------+-----------+----------
 PCGStep_Kernel3          |    600   |    9.030ms|  0.0150ms
--------------------------+----------+-----------+----------
 ApplyLinearUpdateDevice  |     60   |    0.670ms|  0.0112ms
------------------------------------------------------------
//////////// (Opt(GN)) ///////////////
cost: 128.722855 -> 216.032547
cost: 216.032547 -> 637.063538
cost: 637.063538 -> 1190.534546
cost: 1190.534546 -> 2862.151855
cost: 2862.151855 -> 5725.565430
cost: 5725.565430 -> 79272.570313
cost: 79272.570313 -> 7690031.000000
cost: 7690031.000000 -> 4845.605957
cost: 4845.605957 -> 6873.855469
cost: 6873.855469 -> 5626.886719
cost: 5626.886719 -> 4994.100586
cost: 4994.100586 -> 4817.243652
cost: 4817.243652 -> 4189.773438
cost: 4189.773438 -> 4356.413086
cost: 4356.413086 -> 3670.755615
cost: 3670.755615 -> 3303.120850
cost: 3303.120850 -> 3119.682861
cost: 3119.682861 -> 2929.272949
cost: 2929.272949 -> 2843.712158
cost: 2843.712158 -> 2726.231689
cost: 2726.231689 -> 2760.715820
cost: 2760.715820 -> 4117.896484
cost: 4117.896484 -> 6734.837402
cost: 6734.837402 -> 5628.507813
cost: 5628.507813 -> 5938.796387
cost: 5938.796387 -> 5122.277832
cost: 5122.277832 -> 4286.757324
cost: 4286.757324 -> 3848.667480
cost: 3848.667480 -> 3635.117432
cost: 3635.117432 -> 5794.979492
cost: 5794.979492 -> 5052.735352
cost: 5052.735352 -> 4391.011230
cost: 4391.011230 -> 4632.609375
cost: 4632.609375 -> 4280.142090
cost: 4280.142090 -> 4305.324219
cost: 4305.324219 -> 4506.224121
cost: 4506.224121 -> 3804.356201
cost: 3804.356201 -> 3227.698730
cost: 3227.698730 -> 4729.991211
cost: 4729.991211 -> 3761.899902
cost: 3761.899902 -> 4050.324463
cost: 4050.324463 -> 3593.263916
cost: 3593.263916 -> 3096.098145
cost: 3096.098145 -> 2992.417480
cost: 2992.417480 -> 2541.269775
cost: 2541.269775 -> 2483.953369
cost: 2483.953369 -> 2475.319580
cost: 2475.319580 -> 3673.941895
cost: 3673.941895 -> 3408.837158
cost: 3408.837158 -> 2924.858398
cost: 2924.858398 -> 3476.014893
cost: 3476.014893 -> 2906.582031
cost: 2906.582031 -> 2642.566895
cost: 2642.566895 -> 2538.921875
cost: 2538.921875 -> 2947.586914
cost: 2947.586914 -> 3913.861816
cost: 3913.861816 -> 4207.120117
cost: 4207.120117 -> 4262.522949
cost: 4262.522949 -> 3609.466553
cost: 3609.466553 -> 5365.645508
final cost=5365.645508
--------------------------------------------------------
        Kernel        |   Count  |   Total   | Average
----------------------+----------+-----------+----------
----------------------+----------+-----------+----------
 overall              |      1   | 2259.095ms| 2259.0955ms
----------------------+----------+-----------+----------
 precompute_W_H       |     61   |   18.208ms|  0.2985ms
----------------------+----------+-----------+----------
 computeCost_W_H      |     61   |   27.522ms|  0.4512ms
----------------------+----------+-----------+----------
 PCGInit1_W_H         |     60   |  108.960ms|  1.8160ms
----------------------+----------+-----------+----------
 PCGStep1_W_H         |    600   |  768.302ms|  1.2805ms
----------------------+----------+-----------+----------
 PCGStep2_W_H         |    600   |  632.666ms|  1.0544ms
----------------------+----------+-----------+----------
 PCGStep3_W_H         |    600   |  620.665ms|  1.0344ms
----------------------+----------+-----------+----------
 PCGLinearUpdate_W_H  |     60   |   59.117ms|  0.9853ms
--------------------------------------------------------
TIMING 2259.095459 108.960411 768.302185
Per-iter times ms (nonlinear,linear): 168.0771  2021.6331
//////////// (Opt(LM)) ///////////////
zeta=-1.0846900295291562e-005, breaking at iteration: 8
 cost=128.722763
 model_cost=22701960.000000
 model_cost_change=-22701832.000000
 trust_region_radius=5000.000000
REVERT
zeta=-4.9547274102224037e-005, breaking at iteration: 8
 cost=128.722763
 model_cost=22701956.000000
 model_cost_change=-22701828.000000
 trust_region_radius=1250.000000
REVERT
zeta=-2.4599610696895979e-005, breaking at iteration: 8
 cost=128.722763
 model_cost=22701836.000000
 model_cost_change=-22701708.000000
 trust_region_radius=156.250000
REVERT
zeta=-7.2155357884184923e-006, breaking at iteration: 8
 cost=128.722763
 model_cost=22701606.000000
 model_cost_change=-22701478.000000
 trust_region_radius=9.765625
REVERT
 cost=128.722763
 model_cost=22715304.000000
 model_cost_change=-22715176.000000
 trust_region_radius=0.305176
REVERT
zeta=-0.7424812912940979, breaking at iteration: 10
 cost=128.722763
 model_cost=22708156.000000
 model_cost_change=-22708028.000000
 trust_region_radius=0.004768
REVERT
zeta=1.5581874322379008e-005, breaking at iteration: 2
 cost=128.722763
 model_cost=22708148.000000
 model_cost_change=-22708020.000000
 trust_region_radius=0.000037
REVERT
zeta=-8.0743831176732783e-007, breaking at iteration: 2
 cost=128.722763
 model_cost=22708148.000000
 model_cost_change=-22708020.000000
 trust_region_radius=0.000000
REVERT
zeta=1.130367877522076e-006, breaking at iteration: 2
 cost=128.722763
 model_cost=22708144.000000
 model_cost_change=-22708016.000000
 trust_region_radius=0.000000
REVERT
zeta=-3.2296242125084973e-007, breaking at iteration: 2
 cost=128.722763
 model_cost=22708156.000000
 model_cost_change=-22708028.000000
 trust_region_radius=0.000000
REVERT
zeta=0, breaking at iteration: 2
 cost=128.722763
 model_cost=22708150.000000
 model_cost_change=-22708022.000000
 trust_region_radius=0.000000
REVERT
zeta=9.6888697953545488e-007, breaking at iteration: 2
 cost=128.722763
 model_cost=22708148.000000
 model_cost_change=-22708020.000000
 trust_region_radius=0.000000
REVERT
zeta=1.6148122483627958e-007, breaking at iteration: 2
 cost=128.722763
 model_cost=22708158.000000
 model_cost_change=-22708030.000000
 trust_region_radius=0.000000
REVERT
zeta=4.8444343292430858e-007, breaking at iteration: 2
 cost=128.722763
 model_cost=22708144.000000
 model_cost_change=-22708016.000000
 trust_region_radius=0.000000
REVERT
zeta=-3.2296227914230258e-007, breaking at iteration: 2
 cost=128.722763
 model_cost=22708150.000000
 model_cost_change=-22708022.000000
 trust_region_radius=0.000000

Trust_region_radius is less than the min, exiting
final cost=128.722763
--------------------------------------------------------
        Kernel        |   Count  |   Total   | Average
----------------------+----------+-----------+----------
----------------------+----------+-----------+----------
 overall              |      1   |  419.032ms| 419.0315ms
----------------------+----------+-----------+----------
 precompute_W_H       |     30   |    9.266ms|  0.3089ms
----------------------+----------+-----------+----------
 computeCost_W_H      |     16   |    7.290ms|  0.4556ms
----------------------+----------+-----------+----------
 PCGInit1_W_H         |     15   |   32.349ms|  2.1566ms
----------------------+----------+-----------+----------
 PCGSaveSSq_W_H       |      1   |    1.024ms|  1.0242ms
----------------------+----------+-----------+----------
 PCGComputeCtC_W_H    |     15   |   19.528ms|  1.3019ms
----------------------+----------+-----------+----------
 PCGFinalizeDiagonal_W_H |     15   |   17.929ms|  1.1953ms
----------------------+----------+-----------+----------
 PCGStep1_W_H         |     70   |   92.248ms|  1.3178ms
----------------------+----------+-----------+----------
 PCGStep2_W_H         |     68   |   72.828ms|  1.0710ms
----------------------+----------+-----------+----------
 PCGStep3_W_H         |     70   |   71.884ms|  1.0269ms
----------------------+----------+-----------+----------
 computeModelCost_W_H |     15   |    6.438ms|  0.4292ms
----------------------+----------+-----------+----------
 savePreviousUnknowns_W_H |     15   |   15.215ms|  1.0143ms
----------------------+----------+-----------+----------
 PCGLinearUpdate_W_H  |     15   |   15.320ms|  1.0213ms
----------------------+----------+-----------+----------
 revertUpdate_W_H     |     15   |   15.249ms|  1.0166ms
----------------------+----------+-----------+----------
 PCGStep2_1stHalf_W_H |      2   |    2.044ms|  1.0222ms
----------------------+----------+-----------+----------
 computeAdelta_W_H    |      2   |    2.636ms|  1.3178ms
----------------------+----------+-----------+----------
 PCGStep2_2ndHalf_W_H |      2   |    2.132ms|  1.0660ms
--------------------------------------------------------
TIMING 419.031525 32.349342 92.247841
Per-iter times ms (nonlinear,linear): 122.0285  164.1317
===Shape From Shading===
**Final Costs**
Opt GN,Opt LM,CERES
5.36564550781250000000e+03,1.28722763061523437500e+02,
Solved
About to save
Saving sfsOutput 640x480x1
Saving sfsOutput.ply 640x480x1
Save
GPU memory usage: used = 2159.262501, free = 9104.737499 MB, total = 11264.000000 MB
plan free complete
GPU memory usage: used = 2159.262501, free = 9104.737499 MB, total = 11264.000000 MB
GPU memory usage: used = 2115.262501, free = 9148.737499 MB, total = 11264.000000 MB
plan free complete
GPU memory usage: used = 2115.262501, free = 9148.737499 MB, total = 11264.000000 MB
gerwang changed the title from "shape from shading example do not work" to "Shape from shading example does not work" on Apr 15, 2019
Author

gerwang commented Apr 30, 2019

I found that the number of linear iterations and whether the preconditioner is used matter a lot for convergence. Reducing nLinearIterations from 10 to 3 and disabling the preconditioner makes Opt work on the sfs example.

Collaborator

Mx7f commented Apr 30, 2019

What version of terra are you using?

This is a problem; the default parameters should work fine. For the LM solver, what happens if you set residual_reset_period to 1?

@gerwang
Copy link
Author

gerwang commented May 1, 2019

Thanks for your reply. My terra is cloned from the master branch on GitHub and built from source with LLVM 6.0.1 and VS2015. I tried setting residual_reset_period to 1, but OptLM still reverts every iteration.

    virtual void combinedSolveInit() override {
        m_solverParams.set("nIterations", &m_combinedSolverParameters.nonLinearIter);
        m_solverParams.set("lIterations", &m_combinedSolverParameters.linearIter);
        m_solverParams.set("residual_reset_period", &m_combinedSolverParameters.residual_reset_period);
    }

...

struct CombinedSolverParameters {
    bool useCUDA = false;
    bool useOpt = true;
    bool useOptLM = false;
    bool useCeres = false;
    bool earlyOut = false;
    unsigned int numIter = 1;
    unsigned int nonLinearIter = 3;
    unsigned int linearIter = 200;
    unsigned int patchIter = 32;
    bool profileSolve = true;
    bool optDoublePrecision = false;
    float residual_reset_period = 1;
};

And here is the output.

Saving targetDepth 640x480x1
Num Active Unknowns: 192162
1 weight
100 fit
100 reg
Saving targetIntensity.png 640x480x1
Saving targetDepth 640x480x1
Saving maskEdgeMap.png 640x960x1
Saving maskEdgeMap 640x960x1
Num Active Unknowns: 192162
1 weight
100 fit
100 reg
Saving sfsInitDepth.ply 640x480x1
warning: Linking two modules of different data layouts: 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0/nvvm/libdevice/libdevice.10.bc' is '' whereas 'external' is 'e-m:w-i64:64-f80:128-n8:16:32:64-S128'

Using Opt v0.2.2
nUnknowns =     307200
nResiduals =    0 + 307200 * 6

nnz =   0 + 307200 * 26

compile time:   4.1668634786038
problem plan complete
GPU memory usage: used = 2142.637501, free = 9121.362499 MB, total = 11264.000000 MB
22524.102539063
Solving
//////////// (Opt(LM)) ///////////////
 cost=175.765216
 model_cost=22708265.595070
 model_cost_change=-22708089.829853
 trust_region_radius=5000.000000
REVERT
 cost=175.765216
 model_cost=22708265.612590
 model_cost_change=-22708089.847374
 trust_region_radius=1250.000000
REVERT
 cost=175.765216
 model_cost=22708265.717455
 model_cost_change=-22708089.952239
 trust_region_radius=156.250000
REVERT
 cost=175.765216
 model_cost=22708266.675104
 model_cost_change=-22708090.909887
 trust_region_radius=9.765625
REVERT
 cost=175.765216
 model_cost=22708278.340151
 model_cost_change=-22708102.574935
 trust_region_radius=0.305176
REVERT
 cost=175.765216
 model_cost=22708241.150293
 model_cost_change=-22708065.385077
 trust_region_radius=0.004768
REVERT
zeta=1.6423318817270691e-005, breaking at iteration: 2
 cost=175.765216
 model_cost=22708195.599687
 model_cost_change=-22708019.834471
 trust_region_radius=0.000037
REVERT
zeta=1.0141227687073646e-009, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.554962
 model_cost_change=-22708018.789745
 trust_region_radius=0.000000
REVERT
zeta=1.4936715456057726e-014, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546709
 model_cost_change=-22708018.781492
 trust_region_radius=0.000000
REVERT
zeta=-3.6332546053598101e-015, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546677
 model_cost_change=-22708018.781460
 trust_region_radius=0.000000
REVERT
zeta=0, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546676
 model_cost_change=-22708018.781460
 trust_region_radius=0.000000
REVERT
zeta=-4.0369495604115869e-016, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546676
 model_cost_change=-22708018.781460
 trust_region_radius=0.000000
REVERT
zeta=0, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546676
 model_cost_change=-22708018.781460
 trust_region_radius=0.000000
REVERT
zeta=4.036949560411584e-016, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546676
 model_cost_change=-22708018.781460
 trust_region_radius=0.000000
REVERT
zeta=1.2110848681234751e-015, breaking at iteration: 2
 cost=175.765216
 model_cost=22708194.546676
 model_cost_change=-22708018.781460
 trust_region_radius=0.000000

Trust_region_radius is less than the min, exiting
final cost=175.765216
--------------------------------------------------------
        Kernel        |   Count  |   Total   | Average
----------------------+----------+-----------+----------
----------------------+----------+-----------+----------
 overall              |      1   |  313.555ms| 313.5549ms
----------------------+----------+-----------+----------
 precompute_W_H       |     30   |   26.998ms|  0.8999ms
----------------------+----------+-----------+----------
 computeCost_W_H      |     16   |    6.909ms|  0.4318ms
----------------------+----------+-----------+----------
 PCGInit1_W_H         |     15   |   31.470ms|  2.0980ms
----------------------+----------+-----------+----------
 PCGSaveSSq_W_H       |      1   |    1.077ms|  1.0773ms
----------------------+----------+-----------+----------
 PCGComputeCtC_W_H    |     15   |   16.487ms|  1.0991ms
----------------------+----------+-----------+----------
 PCGFinalizeDiagonal_W_H |     15   |   16.556ms|  1.1037ms
----------------------+----------+-----------+----------
 PCGStep1_W_H         |     36   |   41.774ms|  1.1604ms
----------------------+----------+-----------+----------
 PCGStep2_W_H         |     36   |   39.746ms|  1.1041ms
----------------------+----------+-----------+----------
 PCGStep3_W_H         |     36   |   37.802ms|  1.0501ms
----------------------+----------+-----------+----------
 computeModelCost_W_H |     15   |    7.849ms|  0.5233ms
----------------------+----------+-----------+----------
 savePreviousUnknowns_W_H |     15   |   15.594ms|  1.0396ms
----------------------+----------+-----------+----------
 PCGLinearUpdate_W_H  |     15   |   15.675ms|  1.0450ms
----------------------+----------+-----------+----------
 revertUpdate_W_H     |     15   |   15.611ms|  1.0407ms
--------------------------------------------------------
TIMING 313.554932 31.469507 41.774025
Per-iter times ms (nonlinear,linear): 119.2413  119.3221
===Shape From Shading===
**Final Costs**
Opt GN,Opt LM,CERES
,1.75765216273377006928e+02,
Solved
About to save
Saving sfsOutput 640x480x1
Saving sfsOutput.ply 640x480x1
Save
GPU memory usage: used = 2117.512501, free = 9146.487499 MB, total = 11264.000000 MB
plan free complete
GPU memory usage: used = 2117.512501, free = 9146.487499 MB, total = 11264.000000 MB
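
The REVERT sequence above is consistent with a Ceres-style Levenberg-Marquardt radius update: whenever model_cost_change (in this log, cost minus model_cost) is not positive, the step is rejected and the radius is divided by a factor that starts at 2 and doubles after each rejection. The C++ sketch below replays only that rule. The function name simulateRadii, the initial radius of 1e4, and the minimum radius of 1e-32 are assumptions chosen because they reproduce the printed radii (they are Ceres defaults); they are not read from Opt's source.

```cpp
#include <vector>

// Hypothetical helper: replays a Ceres-style trust-region radius schedule
// for a sequence of model_cost_change values. Only the rejection path is
// modeled; the acceptance path of a real solver is deliberately left out.
std::vector<double> simulateRadii(const std::vector<double>& modelCostChanges,
                                  double initialRadius = 1e4,   // assumed
                                  double minRadius = 1e-32) {   // assumed
    double radius = initialRadius;
    double decreaseFactor = 2.0;
    std::vector<double> radii;
    for (double change : modelCostChanges) {
        if (change > 0.0) {
            // The model predicts a decrease. A full solver would now compare
            // this against the true cost change; here we only reset the factor.
            decreaseFactor = 2.0;
        } else {
            // The model predicts no decrease: reject (REVERT) and shrink.
            radius /= decreaseFactor;
            decreaseFactor *= 2.0;
        }
        radii.push_back(radius);
        if (radius < minRadius) break;  // "less than the min, exiting"
    }
    return radii;
}
```

Calling simulateRadii with a constant negative value such as -22701832.0 yields 5000, 1250, 156.25, 9.765625, and so on, and stops after 15 rejections, which matches the 15 rejected steps in the log. Since the linearized model predicts a cost increase at every radius, shrinking the trust region alone cannot rescue the step, which suggests looking at the linearized model (the Jacobian or the linear solve) rather than at the radius schedule.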

Author

gerwang commented May 1, 2019

Also, I don't understand why the loss increases and the solution drifts when I enable the preconditioner or increase nLinearIterations. Is there an underlying mathematical principle of the PCG algorithm that explains this?
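
For reference, on a symmetric positive-definite system with a positive-definite preconditioner, PCG never increases the error measured in the A-norm (in exact arithmetic), so extra linear iterations should not make the linear solve itself worse. The C++ sketch below checks this on a small test system with a Jacobi (diagonal) preconditioner. The names pcgErrors and makeTestMatrix and the test matrix are illustrative assumptions and are unrelated to Opt's generated kernels.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec matVec(const Mat& A, const Vec& v) {
    Vec out(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i) out[i] = dot(A[i], v);
    return out;
}

// Illustrative SPD matrix: tridiagonal, strictly diagonally dominant.
Mat makeTestMatrix(std::size_t n) {
    Mat A(n, Vec(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        A[i][i] = 2.0 + 0.5 * static_cast<double>(i);
        if (i > 0) { A[i][i - 1] = -1.0; A[i - 1][i] = -1.0; }
    }
    return A;
}

// Runs (Jacobi-)PCG on A x = A xStar from x = 0 and returns the A-norm
// error after each iteration (index 0 is the initial guess).
std::vector<double> pcgErrors(const Mat& A, const Vec& xStar, int nIters,
                              bool usePreconditioner) {
    const std::size_t n = xStar.size();
    Vec x(n, 0.0), r = matVec(A, xStar), z(n), mInv(n, 1.0);
    if (usePreconditioner)
        for (std::size_t i = 0; i < n; ++i) mInv[i] = 1.0 / A[i][i];
    for (std::size_t i = 0; i < n; ++i) z[i] = mInv[i] * r[i];
    Vec p = z;
    double rz = dot(r, z);
    auto aNormError = [&]() {
        Vec e(n);
        for (std::size_t i = 0; i < n; ++i) e[i] = x[i] - xStar[i];
        return std::sqrt(dot(e, matVec(A, e)));
    };
    std::vector<double> errors{aNormError()};
    for (int k = 0; k < nIters; ++k) {
        const Vec Ap = matVec(A, p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            z[i] = mInv[i] * r[i];
        }
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rzNew;
        errors.push_back(aNormError());
    }
    return errors;
}
```

Two caveats apply when relating this to the behaviour reported here. First, a more accurate linear solve does not by itself guarantee a lower nonlinear cost: a full Gauss-Newton step without damping or a line search can overshoot. Second, if more PCG iterations consistently make things worse, one possibility worth checking (a conjecture, not something established in this thread) is that the operator applied inside the PCG steps is not the intended symmetric positive-definite system.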
