Computer Vision

Computer vision, machine vision, and image processing all concern processing images on a computer, approaching different visual tasks from different perspectives. Computer vision focuses on visual representations, including images, graphics, animation, and video, rather than sound, speech, or text.

Visual recognition tasks, such as image classification, localization, and detection, are the core building blocks of many computer vision applications, and recent developments in Convolutional Neural Networks (CNNs) have led to state-of-the-art performance on these tasks. As a result, CNNs now form the crux of deep learning algorithms in computer vision. The Ancient Secret of Computer Vision covers standard techniques in image processing like filtering, edge detection, stereo, flow, etc. (old-school vision), as well as newer, machine-learning based computer vision, and is a more comprehensive reference.

In this section, we focus on the technical details of CNN architectures, their training, and their motivation.

See CV, NLP for the state-of-the-art methods.

| Image acquisition | Image processing | Image analysis |
| --- | --- | --- |
| Webcams & embedded cameras | Edge detection | 3D scene mapping |
| Digital compact cameras & DSLR | Segmentation | Object recognition |
| Consumer 3D cameras | Classification | Object tracking |
| Laser range finders | Feature detection and matching | --- |

Image Classification / Recognition

As in other classification tasks, feature engineering is the core of preprocessing. A predicted label is always assumed to be tied to some specific features. Just as a fingerprint serves as a biological identifier of its owner, the desired features should be sufficient to tell apart samples that carry different labels.

LeNet

LeNet learns its parameters using error back-propagation. In other words, the optimization of CNNs is based on gradient methods. This CNN model was successfully applied to recognizing handwritten digits.

LeNet

As shown in the figure above, it consists of convolution, subsampling, fully connected, and Gaussian connection layers. It is a classic historical architecture.
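The first two layer types can be illustrated with a minimal NumPy sketch. It is not LeNet itself: the 6x6 input, the 3x3 averaging kernel, and the helper names `conv2d_valid` and `subsample` are assumptions chosen for illustration, and the subsampling here is plain 2x2 averaging without the trainable coefficient and bias used in the original network.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(feature_map, size=2):
    """Non-overlapping average pooling over size x size windows."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).mean(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image" (assumption)
kernel = np.ones((3, 3)) / 9.0                    # hypothetical 3x3 averaging filter
features = conv2d_valid(image, kernel)            # shape (4, 4)
pooled = subsample(features)                      # shape (2, 2)
```

Each stage shrinks the spatial size: the convolution removes a border of `kernel_size - 1` pixels, and the subsampling halves each dimension.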

AlexNet

AlexNet was the winning entry in ILSVRC 2012. It solves the problem of image classification, where the input is an image belonging to one of 1000 different classes (e.g. cats, dogs, etc.) and the output is a vector of 1000 numbers, one per class.
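As a rough sketch of how such a 1000-element output can be read, the snippet below turns raw class scores into probabilities with a softmax and picks the most likely class. The random `scores` vector stands in for a real network's output and is purely an assumption for illustration.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)           # hypothetical raw scores, one per class
probs = softmax(scores)                  # 1000 probabilities
predicted_class = int(np.argmax(probs))  # index of the most likely class
```

Because softmax is monotonic, the predicted class is the same whether it is taken from the raw scores or from the probabilities.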

AlexNet consists of 5 Convolutional Layers and 3 Fully Connected Layers. Overlapping Max Pool layers are similar to standard Max Pool layers, except that the adjacent windows over which the max is computed overlap each other. An important feature of AlexNet is the use of the ReLU (Rectified Linear Unit) nonlinearity.
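The difference between overlapping and non-overlapping pooling can be sketched in NumPy as follows. The 5x5 toy input and the helper names `relu` and `max_pool2d` are assumptions for illustration; the overlapping case uses 3x3 windows with stride 2, so neighbouring windows share a row or column.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep positive values, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size, stride):
    """Max pooling; windows overlap whenever stride < size."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.arange(25, dtype=float).reshape(5, 5) - 12.0  # toy input with values -12..12
activated = relu(x)                                   # negatives become 0
overlapping = max_pool2d(activated, size=3, stride=2)      # windows share pixels
non_overlapping = max_pool2d(activated, size=2, stride=2)  # windows are disjoint
```

With a 5x5 input both settings produce a 2x2 output, but the overlapping windows cover the full input while the disjoint 2x2 windows ignore the last row and column.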

VGG

Very deep ConvNets were the basis of the VGG team's ImageNet ILSVRC-2014 submission, where the team secured the first and the second places in the localisation and classification tasks respectively.

Inception

ResNet

ResNet addresses the degradation problem of deep neural networks, in which performance gets worse as the depth of the network increases. It is conjectured that the nonlinearity of activation functions makes it hard to learn the (linear) identity transformation (Id(x)=x).
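A minimal sketch of the residual idea, assuming a toy two-layer branch F: the block outputs x + F(x), so the identity mapping is recovered simply by driving the branch weights to zero. The function name `residual_block`, the weights `w1` and `w2`, and the omission of the final activation are simplifications for illustration, not the architecture of the original network.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Return x + F(x), where F(x) = w2 @ relu(w1 @ x) is the residual branch."""
    return x + w2 @ relu(w1 @ x)

x = np.array([1.0, -2.0, 3.0])

# With all-zero weights the residual branch vanishes and the block is the identity.
zeros = np.zeros((3, 3))
identity_output = residual_block(x, zeros, zeros)

# With non-zero (here random, hypothetical) weights the block learns a correction to x.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(3, 3))
w2 = rng.normal(size=(3, 3))
corrected_output = residual_block(x, w1, w2)
```

The skip connection means extra layers only need to learn a deviation from the identity rather than the identity itself.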



DenseNet


HRNet

The high-resolution network (HRNet) maintains high-resolution representations by connecting high-to-low resolution convolutions in parallel and strengthens high-resolution representations by repeatedly performing multi-scale fusions across parallel convolutions. Its authors demonstrate its effectiveness on pixel-level classification, region-level classification, and image-level classification.

According to its authors, the HRNet turns out to be a strong replacement for classification networks (e.g., ResNets, VGGNets) in visual recognition, and they believe that the HRNet will become the new standard backbone.

Semantic Segmentation

Object Detection

RCNN

YOLO

You only look once (YOLO) is a state-of-the-art, real-time object detection system.


Object Tracking

Pose Estimation

Image Caption

Scene Understanding

Optical Character Recognition

Image Search

Style Transfer

Visualization / Interpretation of CNN

It has been shown that

ImageNet trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies.



Computer Graphics


Deep Dream