RELIABILITY AND ROBUSTNESS
For a fixed model m, the analytically intractable Bayesian posterior distribution
of the parameters θ is given by Bayes’ rule:
\[
\underbrace{P(\theta \mid D)}_{\text{posterior of } \theta \text{ given data } D}
= \frac{\overbrace{P(D \mid \theta)}^{\text{likelihood of } \theta \text{ (in model } m\text{)}}\;
\overbrace{P(\theta \mid m)}^{\text{prior probability of } \theta}}
{P(D \mid m)}
\tag{2.18}
\]
The denominator P(D | m) is intractable, since it requires integrating over all
possible parameter values weighted by their probabilities. This is known as
the inference problem, which is the main challenge in BDL, as the posterior
distribution is required to compute the predictive distribution for any new input
(Equation (3.1) further explains this).
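As a toy illustration of why this matters (all densities and data below are hypothetical stand-ins, not taken from the thesis), Equation (2.18) can be evaluated exactly when θ is a single scalar discretized on a grid, because the evidence P(D | m) then reduces to a finite sum:

```python
import numpy as np

# Discretize a single scalar parameter theta; for neural networks theta has
# millions of dimensions and no such grid is feasible.
theta_grid = np.linspace(-3.0, 3.0, 1001)

# Hypothetical standard-normal prior P(theta | m), normalized over the grid.
prior = np.exp(-0.5 * theta_grid**2)
prior /= prior.sum()

# Made-up observations with a unit-variance Gaussian likelihood P(D | theta).
# (Constant factors of the Gaussian are omitted; they cancel in Bayes' rule.)
data = np.array([0.8, 1.1, 0.9])
log_lik = -0.5 * ((data[:, None] - theta_grid[None, :]) ** 2).sum(axis=0)
likelihood = np.exp(log_lik)

# Evidence P(D | m): a finite sum here, an intractable integral in general.
evidence = (likelihood * prior).sum()

# Bayes' rule, Equation (2.18).
posterior = likelihood * prior / evidence
```

With d parameters, the same grid would need 1001^d points, which is why the evidence, and hence the exact posterior, is intractable for neural network weights.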
In practice, BNNs are often trained with Variational Inference (VI)
methods, which approximate the high-dimensional posterior distribution with a
tractable distribution family, such as a Gaussian distribution [46]. Let p(θ | D)
be the intractable posterior distribution of parameters θ given observed data D,
which will be approximated with a simpler, tractable distribution q(θ | D; φ),
parameterized by φ (e.g., mean and variance).
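A minimal sketch of such a variational family, assuming PyTorch and a mean-field Gaussian with φ = (μ, log σ), i.e., one mean and one log standard deviation per weight (the parameter count and names below are illustrative, not from the text):

```python
import torch

# Variational parameters phi = (mu, log_sigma) for q(theta | D; phi);
# n_params is an illustrative stand-in for the number of network weights.
n_params = 10
mu = torch.zeros(n_params, requires_grad=True)
log_sigma = torch.zeros(n_params, requires_grad=True)

def sample_theta() -> torch.Tensor:
    """Draw theta ~ q(theta | D; phi) via the reparameterization trick,
    so the sample stays differentiable with respect to mu and log_sigma."""
    eps = torch.randn(n_params)  # eps ~ N(0, I)
    return mu + torch.exp(log_sigma) * eps
```

The reparameterization trick is one standard way to obtain low-variance gradients through samples of q; the page itself does not commit to a particular estimator.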
The key idea consists of finding the optimal variational parameters φ∗ that
minimize the Kullback–Leibler (KL) divergence between the approximating
distribution q(θ | D; φ) and the true posterior p(θ | D) it replaces. This is achieved
by maximizing the evidence lower bound (ELBO), given by:
\begin{align}
\mathrm{ELBO}(\phi)
&= \mathbb{E}_{q(\theta \mid D;\, \phi)}\big[\log p(D \mid \theta)\big]
 - \mathrm{KL}\big[q(\theta \mid D;\, \phi) \,\|\, p(\theta)\big] \tag{2.19}\\
&= \int q(\theta \mid D;\, \phi) \log \frac{p(D \mid \theta)\, p(\theta)}{q(\theta \mid D;\, \phi)}\, d\theta \tag{2.20}\\
&= \int q(\theta \mid D;\, \phi) \log p(D \mid \theta)\, d\theta
 - \int q(\theta \mid D;\, \phi) \log \frac{q(\theta \mid D;\, \phi)}{p(\theta)}\, d\theta, \tag{2.21}
\end{align}
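Before unpacking the terms, it may help to see Equation (2.19) rendered as code. The sketch below (assuming PyTorch; the `log_likelihood` callable standing in for log p(D | θ) is hypothetical) forms a Monte Carlo estimate of the expected likelihood and uses the closed-form Gaussian KL against a standard-normal prior:

```python
import torch
import torch.distributions as dist

def elbo_estimate(mu, log_sigma, log_likelihood, n_samples=16):
    """Monte Carlo estimate of Equation (2.19):
    E_q[log p(D | theta)] - KL[q(theta | D; phi) || p(theta)].

    `log_likelihood(thetas)` is assumed to map a (n_samples, n_params)
    batch of parameter draws to per-sample values of log p(D | theta).
    """
    q = dist.Normal(mu, torch.exp(log_sigma))    # variational q(theta | D; phi)
    p = dist.Normal(torch.zeros_like(mu), 1.0)   # standard-normal prior p(theta)
    thetas = q.rsample((n_samples,))             # reparameterized draws from q
    expected_ll = log_likelihood(thetas).mean()  # first term: E_q[log p(D | theta)]
    kl = dist.kl_divergence(q, p).sum()          # second term: KL[q || p(theta)]
    return expected_ll - kl
```

Gradient ascent on this estimate with respect to (mu, log_sigma) maximizes the ELBO, which is exactly the KL minimization described above, since log p(D) is constant in φ.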
In Equation (2.21), the first term represents the expected likelihood of the
data given the parameters, and the second term quantifies the dissimilarity
between the variational distribution and the prior distribution over the
parameters. Maximizing the ELBO with respect to φ is equivalent to minimizing the KL
divergence between q(θ | D; φ) and p(θ | D), and it provides a lower bound on the
log marginal likelihood, log p(D) ≥ ELBO(φ), after the parameters θ have been
integrated out. By optimizing the variational parameters φ, we simultaneously