STATISTICAL LEARNING
possible functions. The objective is to find a function f ∈ F that minimizes the
risk, or even better, the Bayes risk

f^* = \inf_{f \in \mathcal{F}} R(f), \quad (2.2)
which is the minimum achievable risk over all functions in F. The latter is only
realizable with infinite data or with access to the data-generating distribution
P(X, Y). In practice, the risk in Equation (2.2) cannot be evaluated, and the
goal is to find a function \hat{f} that minimizes the empirical risk
\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{i=1}^{N} \ell(y_i, f(x_i)), \quad (2.3)
where (x_i, y_i) are N independently and identically distributed (i.i.d.) samples
drawn from an unknown distribution P on X × Y. This approach is known as empirical
risk minimization (ERM), a popular framework for supervised learning under which
three important processes are defined.
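To make Equation (2.3) concrete, the following minimal sketch computes the empirical risk of a fixed candidate function under squared loss; the synthetic data-generating process and the particular function f are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Empirical risk (1/N) * sum of losses, as in Equation (2.3), under squared loss.
# The data below stand in for i.i.d. samples from an unknown distribution P.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)

def f(x):
    # one candidate function from the hypothesis class F (assumed for illustration)
    return 2.0 * x

def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

# Average the per-sample losses over the N observed pairs.
empirical_risk = np.mean(squared_loss(y, f(x)))
```

Because f matches the mean of the generating process here, the empirical risk is close to the noise variance rather than zero, illustrating that even the best function in F cannot drive the risk below the irreducible noise level.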
Training or model fitting is the process of estimating the parameters θ of a
model, which is done by minimizing a suitable loss function ℓ over a training
set D = \{(x_i, y_i)\}_{i=1}^{N} of N i.i.d. samples.
Inference or prediction is the process of estimating the output of a model for
a given input, which is typically done by computing the posterior probability
P (y|x) over the output space Y. Classification output is a discrete label, while
regression output is a continuous value.
Evaluation involves measuring the quality of a model’s predictions, which is
typically done by computing a suitable evaluation metric over a test set D_test
of i.i.d. samples that were not used for training.
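The three processes above can be sketched end-to-end. The following is a minimal illustration using least-squares linear regression; the synthetic data, the train/test split sizes, and the closed-form fit are illustrative assumptions rather than the method used in this thesis:

```python
import numpy as np

# Synthetic data-generating process standing in for the unknown P on X x Y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])  # assumed ground-truth parameters
y = X @ theta_true + rng.normal(scale=0.1, size=200)

# Disjoint training and test sets: the test set is never used for fitting.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Training: estimate parameters theta by minimizing the empirical
# squared-error risk (closed-form least squares).
theta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Inference: predict outputs for unseen inputs.
y_pred = X_test @ theta_hat

# Evaluation: a suitable metric (here, mean squared error) on the test set.
test_mse = np.mean((y_test - y_pred) ** 2)
```

Keeping the test set disjoint from the training set is what makes `test_mse` an honest estimate of the risk on unseen data, which is the quantity ERM ultimately cares about.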
However, ERM has caveats concerning generalization to unseen data, requiring
additional assumptions on the hypothesis class F, known as inductive biases,
and/or regularization to penalize the complexity of the function class F [445].
In neural networks (discussed in detail in Section 2.1.1), the former is
controlled by the architecture of the network, while the latter involves
constraining the parameters or adding a regularization term to the loss function:
\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}(f) + \lambda \Psi(\theta), \quad (2.4)

where \hat{R}(f) is the empirical risk of Equation (2.3), Ψ(θ) is a penalty on
the model parameters, and λ controls the regularization strength.
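One concrete instance of regularized ERM as in Equation (2.4) takes Ψ(θ) = ‖θ‖², which for a linear model yields ridge regression with a closed-form solution. The sketch below (synthetic data, the λ values, and the absence of the 1/N scaling inside λ are illustrative conventions) shows that the penalty shrinks the parameter norm relative to the unregularized fit:

```python
import numpy as np

# Regularized ERM with Psi(theta) = ||theta||^2 (ridge regression).
# Minimizing ||y - X @ theta||^2 + lam * ||theta||^2 has the closed form
# theta = (X^T X + lam * I)^{-1} X^T y  (the scaling of lam is a convention).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
theta_true = np.array([3.0, 0.0, 0.0, 0.0, 0.0])  # assumed for illustration
y = X @ theta_true + rng.normal(scale=0.5, size=100)

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

theta_unreg = ridge_fit(X, y, lam=0.0)   # plain ERM solution
theta_reg = ridge_fit(X, y, lam=10.0)    # complexity-penalized solution

# The penalty trades some empirical fit for lower complexity: the norm of the
# regularized estimate is strictly smaller than that of the unregularized one.
```

Increasing λ monotonically shrinks the solution toward zero, which is the mechanism by which the Ψ(θ) term restrains the effective complexity of the fitted function.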