Sigmoid Function:   $\sigma(z) = \frac{1}{1 + \exp(-z)}$

Softmax Function:   $\mathrm{softmax}(z) = \frac{\exp(z)}{\sum_{k=1}^{K} \exp(z_k)}$

Table 2.1. Sigmoid and softmax activation functions for binary and multi-class classification, respectively.

…already continuous-spatial signal, different DL architectures have been established, which will be discussed in Section 2.1.3.

A K-class classification function realized by an l-layer NN with d-dimensional input $x \in \mathbb{R}^d$ is denoted in shorthand as $f_\theta : \mathbb{R}^d \to \mathbb{R}^K$, with $\theta = \{\theta_j\}_{j=1}^{l}$ assumed to be optimized, either partially or fully, using backpropagation and a loss function. More specifically, this presents a non-convex optimization problem, with multiple feasible regions and multiple locally optimal points within each. Under maximum-likelihood estimation, the goal is to find the optimal parameters, or weights, that minimize the loss function, effectively interpolating the training data. This process involves traversing the high-dimensional loss landscape. Upon convergence of model training, the optimized parameters form a solution in weight space, representing a unique mode (a specific function $f_{\hat{\theta}}$). However, when regularization techniques such as weight decay, dropout, or early stopping are applied, the objective shifts towards maximum-a-posteriori (MAP) estimation, which takes the prior probability of the parameters into account (a minimal training-step sketch is given below). This difference in parameter estimation forms the basis for several uncertainty estimation methods, covered in Section 2.2.5.

A prediction is obtained by applying a standard decision rule to the model's output, e.g., taking the top-1/k prediction (Equation (2.5)), or decoding structured output with a function that maximizes total likelihood, optionally subject to additional diversity criteria.

$\hat{y} = \mathrm{argmax}\, f_{\hat{\theta}}(x)$    (2.5)

Considering standard NNs, the last layer outputs a vector of real-valued logits $z \in \mathbb{R}^K$, which in turn are normalized to a probability distribution over the K classes using a sigmoid or softmax function (Table 2.1).

2.1.2 Probabilistic Evaluation

The majority of our work involves supervised learning with NNs, formulated generically as a probabilistic predictor in Definition 1.
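To make this concrete, below is a minimal NumPy sketch of such a probabilistic predictor, combining the activations of Table 2.1 with the decision rule of Equation (2.5). The helper names and example values are illustrative assumptions, not taken from the text.

    import numpy as np

    def sigmoid(z):
        # Binary case (Table 2.1): sigma(z) = 1 / (1 + exp(-z)).
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        # Multi-class case (Table 2.1); subtracting max(z) avoids overflow
        # in exp() without changing the resulting distribution.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def top_k(logits, k=1):
        # Decision rule of Equation (2.5): indices of the k largest logits;
        # k = 1 recovers y_hat = argmax f(x).
        return np.argsort(logits)[::-1][:k]

    logits = np.array([2.0, -1.0, 0.5])  # hypothetical z in R^K, K = 3
    probs = softmax(logits)              # probability distribution over K classes
    y_hat = top_k(logits, k=1)[0]        # top-1 prediction

In the binary case, the same pipeline applies the sigmoid to a single logit and thresholds the resulting probability at 0.5.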
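The shift from MLE to MAP described above can likewise be sketched as a single regularized training step. The following assumes PyTorch; the layer sizes, learning rate, and weight-decay coefficient are arbitrary placeholders rather than values from the text.

    import torch
    import torch.nn as nn

    # Hypothetical f_theta: R^d -> R^K with d = 16 and K = 4.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    # Cross-entropy on the logits is the negative log-likelihood (the MLE
    # objective); weight_decay adds an L2 penalty on the parameters, which
    # corresponds to a Gaussian prior and thus shifts the objective to MAP.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(8, 16)         # toy input batch
    y = torch.randint(0, 4, (8,))  # toy class labels

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)    # data term of the loss
    loss.backward()                # backpropagation
    optimizer.step()               # one step towards a local optimum (one mode)

Dropout and early stopping play an analogous regularizing role, but act during training rather than through an explicit penalty term.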