2.1.3.1 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) [244] are a class of DNNs designed
primarily for visual and other grid-structured data such as images. They are
inspired by the visual cortex of animals, which contains neurons sensitive to
small subregions of the visual field, called receptive fields. The receptive
fields of different neurons partially overlap such that together they cover the
entire visual field, and they grow larger in deeper layers of the visual cortex.
Figure 2.2. Sketch of a CNN architecture. The input is a 2D image, which is iteratively
convolved with a set of learned filters detecting specific input features (e.g., edges,
corners, blobs) to produce feature maps. Feature maps are then downsampled using
a pooling operation.
As illustrated in Figure 2.2, CNNs are composed of multiple convolutional layers,
which hierarchically extract features from the input, followed by pooling and
fully-connected layers to classify the input based on the downsampled features.
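
As a concrete illustration, the following is a minimal sketch of such an
architecture in PyTorch; the input size (1×28×28), filter counts, and number
of classes (10) are illustrative assumptions, not choices made in this thesis.

    import torch
    import torch.nn as nn

    # Minimal CNN sketch mirroring Figure 2.2: stacked convolution and
    # pooling blocks followed by a fully-connected classification head.
    # All sizes below are illustrative assumptions.
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 16 learned 3x3 filters
        nn.ReLU(),
        nn.MaxPool2d(2),                              # downsample 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer, larger receptive field
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                    # fully-connected classifier
    )

    logits = model(torch.randn(1, 1, 28, 28))         # one dummy grayscale image
    print(logits.shape)                               # torch.Size([1, 10])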
A filter K ∈ R^{d×d} is a square matrix of trainable weights whose width and
height d are typically much smaller than those of the input x. A convolutional
layer applies filters sliding over the input, with each filter producing a
feature map:

F = K ∗ x,    (2.6)

where the convolution operation ∗ computes a dot product between the filter
entries and the covered portions of the input.
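
To make Eq. (2.6) concrete, here is a minimal NumPy sketch of a single feature
map produced by sliding one filter over a 2-D input (stride 1, no padding). As
in most deep learning libraries, the kernel is not flipped, so strictly this is
a cross-correlation; the toy input and filter values are illustrative
assumptions.

    import numpy as np

    def conv2d(x, K):
        """Feature map F = K * x (Eq. 2.6): dot product of the d x d filter K
        with every covered d x d patch of the input x (stride 1, no padding)."""
        d = K.shape[0]
        H, W = x.shape
        F = np.empty((H - d + 1, W - d + 1))
        for i in range(F.shape[0]):
            for j in range(F.shape[1]):
                F[i, j] = np.sum(K * x[i:i + d, j:j + d])  # dot product with patch
        return F

    x = np.random.rand(5, 5)        # toy 5x5 "image" (illustrative)
    K = np.array([[1., 0., -1.],    # hypothetical 3x3 vertical-edge filter
                  [1., 0., -1.],
                  [1., 0., -1.]])
    F = conv2d(x, K)
    print(F.shape)                  # (3, 3) feature map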
Thanks to the weight-sharing property of the convolution operation, CNNs are
translation equivariant: shifting the input shifts the resulting feature maps
correspondingly. Combined with pooling, this yields a degree of translation
invariance, i.e., the ability to recognize an object regardless of its position
in the image. This is particularly useful for object detection, where the
position of the object in the image is unknown.
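
This shift behaviour can be checked directly with the conv2d sketch above
(reusing its x, K, and F): a one-pixel shift of the input produces the same
feature map shifted by one pixel, which pooling then largely absorbs.

    # Equivariance check: a one-pixel shift of the input yields the same
    # feature map shifted by one pixel (ignoring the wrapped-around border).
    x_shift = np.roll(x, shift=1, axis=1)          # move the toy image one pixel right
    F_shift = conv2d(x_shift, K)
    print(np.allclose(F_shift[:, 1:], F[:, :-1]))  # True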
This architecture was used for document image classification and document
layout analysis (Section 6.3.2). A special case is the 1-D CNN, which we applied
to one-hot encoded text data in text classification benchmarking (Section 3.4.3).