For a fixed model m, the analytically intractable Bayesian posterior distribution
of the parameters θ is given by Bayes' rule:

\[
\underbrace{P(\theta \mid D)}_{\text{posterior of } \theta \text{ given data } D}
= \frac{\overbrace{P(D \mid \theta)}^{\text{likelihood of } \theta \text{ (in model } m)}\;
        \overbrace{P(\theta \mid m)}^{\text{prior probability of } \theta}}
       {P(D \mid m)}
\tag{2.18}
\]
The denominator P(D | m) is intractable, since it requires integrating over all
possible parameter values weighted by their probabilities. This is known as
the inference problem, which is the main challenge in BDL, as the posterior
distribution is required to compute the predictive distribution for any new input
(Equation (3.1) further explains this).
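Spelled out, this denominator is the marginal likelihood (evidence) of model m, i.e., the likelihood averaged over the prior on θ:

\[
P(D \mid m) = \int P(D \mid \theta)\, P(\theta \mid m)\, d\theta ,
\]

which for a neural network requires integrating over a very high-dimensional parameter space.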
In practice, BNNs are often trained with Variational Inference (VI), which
approximates the high-dimensional posterior distribution with a tractable
distribution family, such as a Gaussian distribution [46]. Let p(θ | D) be the
intractable posterior distribution of the parameters θ given observed data D,
which is approximated with a simpler variational distribution q(θ | D; φ),
parameterized by φ (e.g., mean and variance).
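As a minimal illustration (not taken from this work, and assuming PyTorch and a mean-field Gaussian family), the variational parameters φ can be stored as one mean and one log-standard-deviation per weight, with samples of θ drawn via the reparameterization trick:

```python
import torch

# Sketch of a mean-field Gaussian q(theta | D; phi); sizes and initial
# values are illustrative only.
n_weights = 10

mu = torch.zeros(n_weights, requires_grad=True)                  # variational means
log_sigma = torch.full((n_weights,), -3.0, requires_grad=True)   # variational log-stds

def sample_theta() -> torch.Tensor:
    """Draw theta ~ q(theta | D; phi) with the reparameterization trick,
    so gradients can flow back into mu and log_sigma."""
    eps = torch.randn(n_weights)            # eps ~ N(0, I)
    return mu + torch.exp(log_sigma) * eps
```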
The key idea is to find the optimal variational parameters φ* that minimize
the Kullback–Leibler (KL) divergence between the approximating distribution
q(θ | D; φ) and the true posterior p(θ | D) it replaces. This is achieved by
maximizing the evidence lower bound (ELBO), given by:
\begin{align*}
\mathrm{ELBO}(\phi)
&= \mathbb{E}_{q(\theta \mid D;\, \phi)}\!\left[\log p(D \mid \theta)\right]
   - \mathrm{KL}\!\left[q(\theta \mid D;\, \phi)\,\|\,p(\theta)\right] \tag{2.19}\\
&= \int q(\theta \mid D;\, \phi)\, \log \frac{p(D \mid \theta)\, p(\theta)}{q(\theta \mid D;\, \phi)}\, d\theta \tag{2.20}\\
&= \int q(\theta \mid D;\, \phi)\, \log p(D \mid \theta)\, d\theta
   - \int q(\theta \mid D;\, \phi)\, \log \frac{q(\theta \mid D;\, \phi)}{p(\theta)}\, d\theta, \tag{2.21}
\end{align*}
where the first term in Equation (2.21) represents the expected likelihood of the
data given the parameters, and the second term quantifies the dissimilarity
between the variational distribution and the prior distribution over the
parameters. Maximizing the ELBO with respect to φ is equivalent to minimizing
the KL divergence between q(θ | D; φ) and p(θ | D), and it provides a lower bound
on the log marginal likelihood, log p(D) ≥ ELBO(φ), obtained after the parameters
θ have been integrated out (a sketch of this optimization follows below). By
optimizing the variational parameters φ, we simultaneously
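As a hedged illustration of maximizing Equation (2.19) in practice (not the implementation used here; it assumes PyTorch, a toy linear-Gaussian likelihood p(D | θ), a standard normal prior p(θ) = N(0, I), and a single-sample Monte Carlo estimate of the expectation), a gradient-based optimization of φ could look like:

```python
import torch

torch.manual_seed(0)

# Toy data for an illustrative linear model y = x @ theta + noise.
x = torch.randn(100, 10)
true_theta = torch.randn(10)
y = x @ true_theta + 0.1 * torch.randn(100)

# Variational parameters phi = (mu, log_sigma) of a mean-field Gaussian q.
mu = torch.zeros(10, requires_grad=True)
log_sigma = torch.full((10,), -2.0, requires_grad=True)
optimizer = torch.optim.Adam([mu, log_sigma], lr=1e-2)

def elbo() -> torch.Tensor:
    sigma = torch.exp(log_sigma)
    # Reparameterized sample theta ~ q(theta | D; phi).
    theta = mu + sigma * torch.randn(10)
    # First term: single-sample Monte Carlo estimate of E_q[log p(D | theta)].
    log_lik = torch.distributions.Normal(x @ theta, 1.0).log_prob(y).sum()
    # Second term: KL[q || p] in closed form for diagonal Gaussians vs. N(0, I).
    kl = 0.5 * torch.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)
    return log_lik - kl

for step in range(2000):
    optimizer.zero_grad()
    loss = -elbo()      # maximize the ELBO by minimizing its negative
    loss.backward()
    optimizer.step()
```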