STATISTICAL LEARNING
2.1.3.1 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) [244] are a class of DNNs designed
primarily for visual and other grid-structured data such as images. They are
inspired by the visual cortex of animals, in which each neuron is sensitive to a
small subregion of the visual field, called its receptive field. The receptive
fields of different neurons partially overlap such that together they cover the
entire visual field, and they grow larger in deeper layers of the visual cortex.
Figure 2.2. Sketch of a CNN architecture. The input is a 2D image, which is iteratively
convolved with a set of learned filters detecting specific input features, e.g., edges,
corners, blobs, to produce feature maps. Feature maps are then downsampled using
a pooling operation.
As illustrated in Figure 2.2, CNNs are composed of multiple convolutional layers,
which hierarchically extract features from the input, pooling layers, which
downsample the resulting feature maps, and fully-connected layers, which
classify the input based on the downsampled features.
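To make the downsampling and classification steps concrete, the following minimal sketch applies non-overlapping 2×2 max pooling to a small feature map and feeds the flattened result to a fully-connected layer. It is written in plain Python for readability rather than speed. The function names `max_pool2d` and `fully_connected`, the pooling size, and all numeric values are illustrative assumptions and are not taken from the models used in this work.

```python
def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum over non-overlapping size x size windows."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def fully_connected(features, weights, biases):
    """Compute one score per class: dot(weights[c], features) + biases[c]."""
    return [
        sum(w * f for w, f in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]


# Hypothetical 4x4 feature map, e.g. the output of a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 3, 6],
]

pooled = max_pool2d(feature_map)            # 2x2 downsampled map
flat = [v for row in pooled for v in row]   # flatten to a feature vector

# Hypothetical weights for a two-class classifier (learned in practice).
weights = [[1, 0, 0, 1], [0, 1, 1, 0]]
biases = [0, 0]
scores = fully_connected(flat, weights, biases)
predicted_class = scores.index(max(scores))
```

Pooling halves each spatial dimension here, so the fully-connected layer operates on 4 values instead of 16.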
A filter $K \in \mathbb{R}^{d \times d}$ is a square matrix of trainable weights whose
width and height $d$ are typically much smaller than those of the input $x$. A
convolutional layer slides its filters over the input, with each filter
producing a feature map:
$$F = K \ast x, \tag{2.6}$$
where the convolution operation $\ast$ computes, at each filter position, a dot
product between the filter entries and the portion of the input covered by the
filter.
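Equation (2.6) can be illustrated with a minimal plain-Python sketch. It follows the description in the text: each output entry is the dot product between the filter and the input patch it covers, without flipping the filter, which is the convention most deep learning libraries use under the name convolution. The function name `conv2d`, the absence of padding, the stride of 1, and the example image and filter are assumptions made for illustration.

```python
def conv2d(x, k):
    """Slide the d x d filter k over the 2D input x (no padding, stride 1).

    Each output entry is the dot product between the filter entries and
    the input patch currently covered by the filter.
    """
    d = len(k)
    out_rows = len(x) - d + 1
    out_cols = len(x[0]) - d + 1
    return [
        [
            sum(k[a][b] * x[i + a][j + b] for a in range(d) for b in range(d))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
x = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
K = [
    [-1, 1],
    [-1, 1],
]

F = conv2d(x, K)  # 3x3 feature map
```

The feature map is nonzero only in the middle column, where the filter straddles the edge. This is the sense in which a filter detects a specific input feature.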
Thanks to the weight-sharing property of the convolution operation, i.e., the
same filter weights are applied at every input position, a feature is detected
wherever it appears in the input. Combined with pooling, this gives CNNs a
degree of translation invariance, i.e., the ability to recognize an object
regardless of its position in the image. This is particularly useful for object
detection, where the position of the object in the image is unknown.
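The effect of weight sharing can be checked on a toy example. In the sketch below, shifting the input shifts the feature map by the same amount, while a global max over the feature map stays unchanged. The helpers `conv2d`, `shift_right`, and `argmax2d`, the image, and the filter are hypothetical. Global max pooling is used as the simplest pooling choice, not necessarily the one used in this work.

```python
def conv2d(x, k):
    """Slide the d x d filter k over the 2D input x (no padding, stride 1)."""
    d = len(k)
    return [
        [
            sum(k[a][b] * x[i + a][j + b] for a in range(d) for b in range(d))
            for j in range(len(x[0]) - d + 1)
        ]
        for i in range(len(x) - d + 1)
    ]


def shift_right(x, s):
    """Shift all columns right by s positions, padding with zeros on the left."""
    return [[0] * s + row[: len(row) - s] for row in x]


def argmax2d(m):
    """Return the (row, column) index of the largest entry of m."""
    positions = ((i, j) for i in range(len(m)) for j in range(len(m[0])))
    return max(positions, key=lambda p: m[p[0]][p[1]])


# Hypothetical 5x6 image containing a 2x2 bright blob at rows 1-2, columns 1-2.
x = [[0] * 6 for _ in range(5)]
for i in (1, 2):
    for j in (1, 2):
        x[i][j] = 1

K = [[1, 1], [1, 1]]          # hypothetical blob detector
x_shifted = shift_right(x, 2)  # same blob, two pixels further right

F = conv2d(x, K)
F_shifted = conv2d(x_shifted, K)

# The response moves with the object: same filter, same weights everywhere.
peak, peak_shifted = argmax2d(F), argmax2d(F_shifted)

# Global max pooling discards the position and keeps only the response strength.
pooled, pooled_shifted = max(map(max, F)), max(map(max, F_shifted))
```

In this example the peak response moves from column 1 to column 3, while the pooled value is identical for both inputs.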
This architecture was used for document image classification and document
layout analysis (Section 6.3.2). A special case is the 1-D CNN, which we applied
to one-hot encoded text data in text classification benchmarking (Section 3.4.3).
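As a rough illustration of the 1-D case, the sketch below one-hot encodes a short string and slides a filter along the sequence dimension only, with the filter spanning all one-hot channels. Character-level encoding, the three-letter alphabet, the filter weights, and the names `one_hot` and `conv1d` are assumptions for illustration. The actual setup is described in Section 3.4.3.

```python
def one_hot(text, alphabet):
    """Encode each character as a one-hot vector over the alphabet."""
    return [[1 if ch == a else 0 for a in alphabet] for ch in text]


def conv1d(x, k):
    """Slide filter k (width d, spanning all channels) along the sequence x.

    x has shape (sequence length, channels); k has shape (d, channels).
    """
    d = len(k)
    channels = len(k[0])
    return [
        sum(k[a][c] * x[t + a][c] for a in range(d) for c in range(channels))
        for t in range(len(x) - d + 1)
    ]


alphabet = "abc"                 # hypothetical toy alphabet
x = one_hot("abcab", alphabet)   # 5 positions x 3 channels

# Hypothetical width-2 filter that responds to the character bigram "ab".
K = [
    [1, 0, 0],  # first position matches 'a'
    [0, 1, 0],  # second position matches 'b'
]

F = conv1d(x, K)  # one response per window of two characters
```

The feature map peaks at the two positions where "ab" occurs, so a 1-D filter over one-hot text behaves like a learned n-gram detector.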