For a fixed model m, the analytically intractable Bayesian posterior distribution of the parameters θ is given by Bayes' rule:

\[
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta \mid m)}{P(D \mid m)} \tag{2.18}
\]

where P(D | θ) is the likelihood of θ (in model m), P(θ | m), abbreviated P(θ), is the prior probability of θ, and P(θ | D) is the posterior of θ given data D.

The denominator P(D | m) = ∫ P(D | θ) P(θ | m) dθ is intractable, since it requires integrating over all possible parameter values weighted by their prior probabilities. This is known as the inference problem, which is the main challenge in BDL, as the posterior distribution is required to compute the predictive distribution for any new input (Equation (3.1) further explains this).

In practice, BNNs are often trained with Variational Inference (VI) methods, which approximate the high-dimensional posterior distribution with a tractable distribution family, such as a Gaussian distribution [46]. Let p(θ | D) be the intractable posterior distribution of parameters θ given observed data D, which will be approximated with a simpler, conjugate distribution q(θ|D; φ), parameterized by φ (e.g., mean and variance). The key idea consists of finding the optimal variational parameters φ∗ that minimize the Kullback–Leibler (KL) divergence between the approximating distribution q(θ|D; φ) and the true posterior p(θ | D) that it replaces. This is achieved by maximizing the evidence lower bound (ELBO), given by:

\begin{align}
\mathrm{ELBO}(\varphi) &= \mathbb{E}_{q(\theta \mid D; \varphi)}\left[\log p(D \mid \theta)\right] - \mathrm{KL}\left[q(\theta \mid D; \varphi) \,\|\, p(\theta)\right] \tag{2.19} \\
&= \int q(\theta \mid D; \varphi) \log \frac{p(D \mid \theta)\, p(\theta)}{q(\theta \mid D; \varphi)} \, d\theta \tag{2.20} \\
&= \int q(\theta \mid D; \varphi) \log p(D \mid \theta) \, d\theta - \int q(\theta \mid D; \varphi) \log \frac{q(\theta \mid D; \varphi)}{p(\theta)} \, d\theta, \tag{2.21}
\end{align}

where the first term of Equation (2.21) represents the expected log-likelihood of the data given the parameters, and the second term quantifies the dissimilarity between the variational distribution and the prior distribution over the parameters.
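As a concrete illustration of Equation (2.19), the following minimal sketch evaluates the ELBO for a toy model that is an assumption of this example and does not appear in the text: observations x_i ~ N(θ, σ²) with known σ, a Gaussian prior θ ~ N(μ0, τ0²), and a Gaussian variational distribution q(θ) = N(m, s²). In this conjugate setting the log evidence and the exact posterior are available in closed form, so the ELBO can be compared against log p(D) directly. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def expected_log_lik(data, m, s, sigma):
    """Closed-form E_q[log p(D | theta)] for x_i ~ N(theta, sigma^2), q = N(m, s^2)."""
    const = -0.5 * math.log(2.0 * math.pi * sigma**2)
    # E_q[(x - theta)^2] = (x - m)^2 + s^2
    return sum(const - ((x - m) ** 2 + s**2) / (2.0 * sigma**2) for x in data)


def kl_gauss(m, s, mu0, tau0):
    """KL[N(m, s^2) || N(mu0, tau0^2)] between two univariate Gaussians."""
    return math.log(tau0 / s) + (s**2 + (m - mu0) ** 2) / (2.0 * tau0**2) - 0.5


def elbo(data, m, s, sigma, mu0, tau0):
    """ELBO as in Equation (2.19): expected log-likelihood minus KL to the prior."""
    return expected_log_lik(data, m, s, sigma) - kl_gauss(m, s, mu0, tau0)


def elbo_mc(data, m, s, sigma, mu0, tau0, n_samples=5000, seed=0):
    """Monte Carlo ELBO: sample theta ~ q for the first term, analytic KL for the second."""
    rng = random.Random(seed)
    const = -0.5 * math.log(2.0 * math.pi * sigma**2)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(m, s)
        total += sum(const - (x - theta) ** 2 / (2.0 * sigma**2) for x in data)
    return total / n_samples - kl_gauss(m, s, mu0, tau0)


def log_evidence(data, sigma, mu0, tau0):
    """Exact log p(D): with theta integrated out, D ~ N(mu0 * 1, sigma^2 I + tau0^2 11^T)."""
    n = len(data)
    resid = [x - mu0 for x in data]
    # Determinant and inverse of the rank-one-updated covariance in closed form.
    log_det = n * math.log(sigma**2) + math.log(1.0 + n * tau0**2 / sigma**2)
    quad = (sum(r * r for r in resid)
            - tau0**2 * sum(resid) ** 2 / (sigma**2 + n * tau0**2)) / sigma**2
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def exact_posterior(data, sigma, mu0, tau0):
    """Mean and standard deviation of the exact Gaussian posterior over theta."""
    prec = 1.0 / tau0**2 + len(data) / sigma**2
    mean = (mu0 / tau0**2 + sum(data) / sigma**2) / prec
    return mean, math.sqrt(1.0 / prec)


if __name__ == "__main__":
    data = [0.8, 1.3, 0.4, 1.9, 1.1]  # made-up observations for illustration
    sigma, mu0, tau0 = 1.0, 0.0, 2.0
    m_post, s_post = exact_posterior(data, sigma, mu0, tau0)
    print("log p(D)              :", log_evidence(data, sigma, mu0, tau0))
    print("ELBO at a poor q      :", elbo(data, 0.0, 1.0, sigma, mu0, tau0))
    print("ELBO at exact post.   :", elbo(data, m_post, s_post, sigma, mu0, tau0))
    print("MC ELBO at exact post.:", elbo_mc(data, m_post, s_post, sigma, mu0, tau0))
```

In this toy setting the ELBO stays below log p(D) for a mismatched q and meets it when q equals the exact posterior. For a BNN neither the exact posterior nor log p(D) is available, which is why the first term is typically estimated by sampling, as in `elbo_mc`.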
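Maximizing the ELBO with respect to φ is equivalent to minimizing the KL divergence between q(θ|D; φ) and p(θ|D): the log marginal likelihood decomposes as log p(D) = ELBO(φ) + KL[q(θ|D; φ)||p(θ|D)], and log p(D) does not depend on φ. Because the KL divergence is non-negative, the ELBO is also a lower bound on the log marginal likelihood, log p(D) ≥ ELBO(φ), in which the parameters θ have been integrated out. By optimizing the variational parameters φ, we simultaneously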