STATISTICAL LEARNING | 15

Sigmoid Function:    σ(z) = 1 / (1 + exp(−z))

Softmax Function:    softmax(z) = exp(z) / Σ_{k=1}^{K} exp(z_k)

Table 2.1. Sigmoid and softmax activation functions for binary and multi-class classification, respectively.
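As a minimal sketch, the two activations in Table 2.1 can be implemented in NumPy; the stabilization tricks below (shifting by the maximum logit, exponentiating only non-positive values) are standard numerical practice and not part of the table itself:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z)), computed stably."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # exponent is never positive, so this cannot overflow
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def softmax(z):
    """Normalize a logit vector z to a probability distribution over K classes."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # shifting by max(z) leaves the result unchanged
    return e / e.sum()
```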
ready continuous-spatial signal, different DL architectures have been established, which will be discussed in Section 2.1.3.
A K-class classification function realized by an l-layer NN with d-dimensional input x ∈ R^d is written in shorthand as f_θ : R^d → R^K, with parameters θ = {θ_j}_{j=1}^l assumed to be optimized, either partially or fully, using backpropagation and a loss function. More specifically, this presents a non-convex optimization problem, with multiple feasible regions, each containing multiple locally optimal points. With maximum-likelihood estimation, the goal is to find the optimal parameters or weights that minimize the loss function, effectively interpolating the training data. This process involves traversing a high-dimensional loss landscape.
Upon convergence of model training, the optimized parameters form a solution in weight space, representing a unique mode (a specific function f_θ̂). However, when regularization techniques such as weight decay, dropout, or early stopping are applied, the objective shifts towards maximum-a-posteriori (MAP) estimation, which takes into account the prior probability of the parameters. This difference in parameter estimation forms the basis for several uncertainty estimation methods, covered in Section 2.2.5.
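The shift from maximum-likelihood to MAP estimation can be made concrete for a simple logistic-regression special case; the function and variable names below (nll_loss, map_loss, lam) are illustrative choices, not taken from the text:

```python
import numpy as np

def nll_loss(w, X, y):
    """Negative log-likelihood of a binary logistic model: the MLE objective."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # sigmoid of the linear scores
    eps = 1e-12                             # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def map_loss(w, X, y, lam=1e-2):
    """MAP objective: the NLL plus an L2 penalty (weight decay), which
    corresponds to a zero-mean Gaussian prior over the parameters w."""
    return nll_loss(w, X, y) + lam * np.sum(w ** 2)
```

Minimizing nll_loss alone recovers the MLE solution, while the lam * ||w||^2 term biases the optimum toward small weights, which is the effect of the parameter prior discussed above.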
A prediction is obtained by applying a standard decision rule to the model’s output, e.g., taking the top-1/k prediction (Equation (2.5)), or decoding structured output according to a function that maximizes total likelihood, optionally with additional diversity criteria.
ŷ = argmax_k f_θ̂(x)_k        (2.5)
Considering standard NNs, the last layer outputs a vector of real-valued logits z ∈ R^K, which in turn are normalized to a probability distribution over K classes using a sigmoid or softmax function (Table 2.1).
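As a hypothetical end-to-end sketch of this last step, logits are normalized with a softmax and the decision rule of Equation (2.5) is applied, here generalized to top-k; the helper name top_k_predictions is an illustrative choice:

```python
import numpy as np

def top_k_predictions(logits, k=1):
    """Normalize logits with a softmax, then apply the top-1/k decision rule."""
    e = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs = e / e.sum()
    order = np.argsort(probs)[::-1]       # class indices, most probable first
    return order[:k], probs[order[:k]]
```

With k=1 this returns the top-1 prediction ŷ of Equation (2.5) together with its predicted probability.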
2.1.2 Probabilistic Evaluation
The majority of our work involves supervised learning with NNs, formulated generically as a probabilistic predictor in Definition 1.