2.1.3.1 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) [244] are a class of DNNs designed
primarily for visual and other grid-structured data such as images. They are
inspired by the visual cortex of animals, which contains neurons sensitive to
small subregions of the visual field, called receptive fields. The receptive
fields of different neurons partially overlap such that together they cover the
entire visual field, and they grow larger in deeper layers of the visual cortex.
Figure 2.2. Sketch of a CNN architecture. The input is a 2D image, which is iteratively
convolved with a set of learned filters detecting specific input features (e.g., edges,
corners, blobs) to produce feature maps. Feature maps are then downsampled using
a pooling operation.
As illustrated in Figure 2.2, CNNs are composed of multiple convolutional layers,
which hierarchically extract features from the input, followed by pooling and
fully-connected layers to classify the input based on the downsampled features.
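
As a concrete illustration, the following is a minimal sketch of such an
architecture in PyTorch; the input size (1×28×28), filter counts, and number
of classes (10) are illustrative assumptions, not choices made in this thesis.

    import torch
    import torch.nn as nn

    # Minimal CNN sketch mirroring Figure 2.2: stacked convolution and
    # pooling blocks followed by a fully-connected classification head.
    # All sizes below are illustrative assumptions.
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 16 learned 3x3 filters
        nn.ReLU(),
        nn.MaxPool2d(2),                              # downsample 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer, larger receptive field
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                    # fully-connected classifier
    )

    logits = model(torch.randn(1, 1, 28, 28))         # one dummy grayscale image
    print(logits.shape)                               # torch.Size([1, 10])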
A filter K ∈ R^{d×d} is a square matrix of trainable weights whose width and
height d are typically much smaller than those of the input x. A convolutional
layer applies filters sliding over the input, with each filter producing a
feature map:

F = K ∗ x,    (2.6)

where the convolution operation ∗ computes a dot product between the filter
entries and the covered portions of the input.
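
To make Eq. (2.6) concrete, here is a minimal NumPy sketch of a single feature
map produced by sliding one filter over a 2-D input (stride 1, no padding). As
in most deep learning libraries, the kernel is not flipped, so strictly this is
a cross-correlation; the toy input and filter values are illustrative
assumptions.

    import numpy as np

    def conv2d(x, K):
        """Feature map F = K * x (Eq. 2.6): dot product of the d x d filter K
        with every covered d x d patch of the input x (stride 1, no padding)."""
        d = K.shape[0]
        H, W = x.shape
        F = np.empty((H - d + 1, W - d + 1))
        for i in range(F.shape[0]):
            for j in range(F.shape[1]):
                F[i, j] = np.sum(K * x[i:i + d, j:j + d])  # dot product with patch
        return F

    x = np.random.rand(5, 5)        # toy 5x5 "image" (illustrative)
    K = np.array([[1., 0., -1.],    # hypothetical 3x3 vertical-edge filter
                  [1., 0., -1.],
                  [1., 0., -1.]])
    F = conv2d(x, K)
    print(F.shape)                  # (3, 3) feature map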
Thanks to the weight-sharing property of the convolution operation, CNNs are
translation equivariant: shifting the input shifts the resulting feature maps
correspondingly. Combined with pooling, this yields a degree of translation
invariance, i.e., the ability to recognize an object regardless of its position
in the image. This is particularly useful for object detection, where the
position of the object in the image is unknown.
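
This shift behaviour can be checked directly with the conv2d sketch above
(reusing its x, K, and F): a one-pixel shift of the input produces the same
feature map shifted by one pixel, which pooling then largely absorbs.

    # Equivariance check: a one-pixel shift of the input yields the same
    # feature map shifted by one pixel (ignoring the wrapped-around border).
    x_shift = np.roll(x, shift=1, axis=1)          # move the toy image one pixel right
    F_shift = conv2d(x_shift, K)
    print(np.allclose(F_shift[:, 1:], F[:, :-1]))  # True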
This architecture was used for document image classification and document
layout analysis (Section 6.3.2). A special case is the 1-D CNN, which we applied
to one-hot encoded text data in text classification benchmarking (Section 3.4.3).