Introduction to Deep Learning Project Repo - Final Paper

How to Run the Code

The code is organized as Python notebooks. To reproduce the results, run every cell of the Final Model IR notebook for IR data, or of the Final Model RGB notebook for RGB data. Training the final model for 100 epochs takes about 4 hours, while training the baseline model takes about 11.5 hours. The accuracies and the model are saved automatically every 10 epochs.

Model Code

PSMNet Literature Replication

We first reimplemented the PSMNet model from the literature [1]. The replicated model was used to generate disparity maps, which were evaluated using the L1 loss on the training set and the 3-pixel accuracy on the validation set. The PSMNet architecture from [1] is shown in Figure 1.

Figure 1: PSMNet Literature Architecture
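The two evaluation quantities mentioned above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names are hypothetical, disparity maps are assumed to be nested lists, pixels with non-positive ground truth are assumed to be invalid, and the loss is a plain mean absolute error (PSMNet [1] itself trains with the smooth L1 variant).

```python
"""Sketch of the L1 loss and 3-pixel accuracy used to evaluate disparity maps.

Assumptions (not taken from this repository):
  * disparity maps are H x W nested lists of floats,
  * pixels whose ground-truth disparity is <= 0 are invalid and skipped,
  * a pixel is "correct" when |pred - gt| <= 3 px (simple 3-pixel rule; the
    KITTI benchmark variant additionally tolerates errors below 5 % of the GT).
"""
from typing import List, Tuple

Disparity = List[List[float]]


def valid_pairs(pred: Disparity, gt: Disparity) -> List[Tuple[float, float]]:
    """Collect (prediction, ground truth) pairs for pixels with valid GT."""
    pairs = [
        (p, g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
        if g > 0
    ]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def l1_loss(pred: Disparity, gt: Disparity) -> float:
    """Mean absolute disparity error over valid pixels."""
    pairs = valid_pairs(pred, gt)
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


def three_pixel_accuracy(pred: Disparity, gt: Disparity, threshold: float = 3.0) -> float:
    """Fraction of valid pixels whose absolute disparity error is within `threshold` px."""
    pairs = valid_pairs(pred, gt)
    return sum(1 for p, g in pairs if abs(p - g) <= threshold) / len(pairs)


# Toy example: the bottom-right pixel has GT 0 and is therefore ignored.
pred = [[10.0, 20.0], [30.0, 5.0]]
gt = [[11.0, 25.0], [30.0, 0.0]]
print(l1_loss(pred, gt))                   # 2.0  (errors 1, 5, 0)
print(three_pixel_accuracy(pred, gt))      # 2 of 3 valid pixels within 3 px
print(1 - three_pixel_accuracy(pred, gt))  # the corresponding 3-pixel error
```

The 3-pixel error reported in Table 1 is the complement of this accuracy, expressed as a percentage.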

Comparison of Results

We use the 3-pixel disparity error to evaluate our models and compare them against the performance of the original PSMNet [1]. Table 1 compares each model's total number of parameters and its 3-pixel error on the RGB and IR datasets.

Table 1: Performance Comparison

Model	Parameters	RGB 3-px error	IR 3-px error
PSMNet	3.6 M	6.4 %	25.9 %
Our Model	3.1 M	6.9 %	31.2 %
v1 reduced param	2.77 M	6.7 %	33.3 %
v2 reduced param	2.58 M	9.7 %	36.8 %
Final model	1.77 M	8.4 %	23.7 %
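The parameter/accuracy trade-off in Table 1 can be quantified directly from the table. In the short sketch below, the dictionary simply re-types Table 1 and the helper name is illustrative.

```python
# Table 1 re-typed: (parameters in millions, RGB 3-px error %, IR 3-px error %).
TABLE_1 = {
    "PSMNet": (3.6, 6.4, 25.9),
    "Our Model": (3.1, 6.9, 31.2),
    "v1 reduced param": (2.77, 6.7, 33.3),
    "v2 reduced param": (2.58, 9.7, 36.8),
    "Final model": (1.77, 8.4, 23.7),
}


def compare_to_baseline(name: str, baseline: str = "PSMNet") -> dict:
    """Parameter reduction (%) and error change (percentage points) vs. the baseline."""
    params, rgb_err, ir_err = TABLE_1[name]
    base_params, base_rgb, base_ir = TABLE_1[baseline]
    return {
        "param_reduction_pct": 100.0 * (1.0 - params / base_params),
        "rgb_error_change_pts": rgb_err - base_rgb,
        "ir_error_change_pts": ir_err - base_ir,
    }


for model in TABLE_1:
    if model != "PSMNet":
        r = compare_to_baseline(model)
        print(f"{model:18s} params -{r['param_reduction_pct']:.1f} %  "
              f"RGB {r['rgb_error_change_pts']:+.1f} pts  "
              f"IR {r['ir_error_change_pts']:+.1f} pts")
```

For the final model this works out to roughly 51 % fewer parameters than PSMNet, with the IR error 2.2 points lower and the RGB error 2.0 points higher.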

Disparity error visualization: the top row shows the generated disparity map, the middle row shows the ground truth (GT), and the bottom row shows the error overlaid on the GT.

RGB

Figure 2: Better Disparity Map	Figure 3: Worse Disparity Map

IR

Figure 2: Better Disparity Map	Figure 3: Worse Disparity Map

Modifications to the PSMNet model in literature

The following modifications to the model architecture were also tested:

Fewer convolutional layers
More convolutional layers
2D and 3D asymmetric convolutions
New feature extraction module

All of these modifications reached final loss and accuracy values similar to those of the PSMNet model from the literature. The final model achieved higher accuracy than the PSMNet architecture on IR images (23.7 % vs. 25.9 % 3-pixel error in Table 1), which led us to propose it for use on IR datasets. The training L1 loss and validation 3-pixel accuracy curves are shown in Figures 4 and 5 for RGB images and in Figures 6 and 7 for IR images.

Training	Validation
Figure 4: L1 Loss Experiments with RGB Images	Figure 5: 3-pixel Accuracy Experiments with RGB Images

Training	Validation
Figure 6: L1 Loss Experiments with IR Images	Figure 7: 3-pixel Accuracy Experiments with IR Images

The asymmetric convolution idea is based on the paper "Rethinking the Inception Architecture for Computer Vision" [2]. [2] shows that a 3x1 convolution followed by a 1x3 convolution is equivalent to sliding a two-layer network with the same receptive field as a 3x3 convolution, as illustrated in Figure 8. The corresponding change to the basic block of the PSMNet architecture is shown in Figure 9. 3D convolutions can be approximated by asymmetric convolutions in a similar manner, as shown in Figure 10.

Asymmetric Convolutions	Change in Basic Block Model Architectures
Figure 8: Mini-network replacing the 3x3 convolutions [2]	Figure 9: Comparison between the original and the modified architecture with asymmetric convolutions

Figure 10: Approximation of 3D convolution with 3 asymmetric convolutions
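The asymmetric convolution argument above can be checked numerically. The following plain-Python sketch is not code from this repository. The helper names, the 32-channel width, and the 3x1x1 / 1x3x1 / 1x1x3 factorization assumed for Figure 10 are illustrative assumptions. The sketch shows that, with no nonlinearity in between, a 3x1 convolution followed by a 1x3 convolution equals a single 3x3 convolution whose kernel is the outer product of the two. It also counts the weights saved by the 2D and 3D factorizations. When a nonlinearity is placed between the factored layers, the block is only an approximation with the same receptive field rather than an exact rank-1 3x3 convolution.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation without padding (the 'convolution' used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def outer(col: List[float], row: List[float]) -> Matrix:
    """3x3 kernel equal to a 3x1 (col) pass followed by a 1x3 (row) pass."""
    return [[a * b for b in row] for a in col]


def conv_weights(in_ch: int, out_ch: int, *kernel_dims: int) -> int:
    """Weights in one convolution layer (bias ignored): in * out * prod(kernel)."""
    n = in_ch * out_ch
    for k in kernel_dims:
        n *= k
    return n


# 1) Linear equivalence: 3x1 then 1x3 equals one 3x3 convolution with a rank-1 kernel.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]  # toy 5x5 input
col, row = [1.0, 2.0, 1.0], [-1.0, 0.0, 1.0]                     # Sobel-style factors
two_step = conv2d_valid(conv2d_valid(image, [[v] for v in col]), [row])
one_step = conv2d_valid(image, outer(col, row))

# 2) Weight counts with C input and C output channels (C = 32 is illustrative).
C = 32
w2d_full = conv_weights(C, C, 3, 3)
w2d_asym = conv_weights(C, C, 3, 1) + conv_weights(C, C, 1, 3)
w3d_full = conv_weights(C, C, 3, 3, 3)
w3d_asym = (conv_weights(C, C, 3, 1, 1) + conv_weights(C, C, 1, 3, 1)
            + conv_weights(C, C, 1, 1, 3))

print(two_step == one_step)     # True
print(1 - w2d_asym / w2d_full)  # about 0.333: one third fewer weights in 2D
print(1 - w3d_asym / w3d_full)  # about 0.667: two thirds fewer weights in 3D
```

The 2D saving of one third matches the "33% cheaper" figure reported in [2]. The 3D factorization saves more because a full 3x3x3 kernel has 27 weights per channel pair, while the three asymmetric kernels have 9 in total.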

Final Model (SPP Module Modifications)

Using the insights gained from the IR experiments above, we redesigned the SPP module of PSMNet with residual blocks, as shown in Figure 11, to improve performance on IR images. Although these modifications were tested primarily on IR images, they may also be applicable to RGB images. In this work, however, we focus on the architecture's performance on the more challenging problem of IR disparity estimation.

Similar to PSMNet, we first perform spatial pooling at scales 4×4, 8×8, 16×16, and 32×32. The output of each spatial pooling operation is sent to a convolutional block (CB) whose architecture is shown in Figure 12a. Specifically, CB1 accepts 3 feature maps from the provided image and outputs 32 feature maps. The outputs of CB1 are passed through a series of 4 identity blocks (IB), whose design is shown in Figure 12b. Note that the identity block does not change the number of feature maps. The outputs of the identity blocks are then passed through a second convolutional block (CB2) and identity block (IB2); CB2 accepts 32 feature maps and outputs 64. The outputs of all spatial pooling branches are upsampled to a common size, concatenated, and passed through a final set of convolutional and identity blocks. In Figure 11, CB3 takes in 512 feature maps and outputs 128, while CB4 contains 64 filters. The final convolutional layer contains 32 filters and uses a 1×1 kernel with a 1×1 stride.

Figure 11: Modified SPP Module

Figure 12a: Convolutional Block (CB) Diagram: N and M are the numbers of incoming and outgoing feature maps, respectively	Figure 12b: Identity Block (IB) Diagram: N is the number of incoming feature maps
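The pool, process, upsample, and concatenate pattern of the modified SPP module can be illustrated on a toy single-channel feature map. This is a minimal plain-Python sketch, not the repository's implementation. The function names are hypothetical, the CB/IB blocks are left as a placeholder comment, and average pooling with kernel size equal to stride is assumed (as in PSMNet [1]). Nearest-neighbour upsampling stands in for the bilinear upsampling PSMNet uses.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, H x W


def avg_pool(fmap: FeatureMap, s: int) -> FeatureMap:
    """Average pooling with kernel size = stride = s (assumes H, W divisible by s)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[r * s + i][c * s + j] for i in range(s) for j in range(s)) / (s * s)
            for c in range(w // s)
        ]
        for r in range(h // s)
    ]


def upsample_nearest(fmap: FeatureMap, s: int) -> FeatureMap:
    """Nearest-neighbour upsampling by an integer factor s."""
    return [
        [fmap[r // s][c // s] for c in range(len(fmap[0]) * s)]
        for r in range(len(fmap) * s)
    ]


def pyramid_pool(fmap: FeatureMap,
                 scales: Tuple[int, ...] = (4, 8, 16, 32)) -> List[FeatureMap]:
    """Pool at each scale, upsample back to the input size, and stack the branches."""
    branches = []
    for s in scales:
        pooled = avg_pool(fmap, s)
        # Placeholder: in the modified module, CB1 -> 4 x IB -> CB2 -> IB2 runs here.
        branches.append(upsample_nearest(pooled, s))
    return branches  # "concatenation": one channel per branch, all at input size


# Toy 32x32 map whose value is its column index.
fmap = [[float(c) for c in range(32)] for _ in range(32)]
branches = pyramid_pool(fmap)
print(len(branches), len(branches[0]), len(branches[0][0]))  # 4 32 32
print(branches[3][0][0])  # 15.5: the 32x32 branch is the global mean
```

In this toy example each branch contributes one channel. In the module described above, the concatenated tensor entering CB3 has 512 feature maps.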

References

[1] Jia-Ren Chang and Yong-Sheng Chen (2018). Pyramid Stereo Matching Network. CoRR, abs/1803.08669, http://arxiv.org/abs/1803.08669

[2] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna (2015). Rethinking the Inception Architecture for Computer Vision. CoRR, abs/1512.00567, http://arxiv.org/abs/1512.00567
