The first RHS term is the KL divergence of the approximate posterior from the true posterior.
Since this KL divergence is non-negative, the second RHS term
\(\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})\) is a lower bound on the log marginal likelihood of datapoint \(i\).
It is called the (variational) lower bound, or evidence lower bound (ELBO), and can be written as:
\[
\,\\
\log p_{\theta}(\mathbf{x}^{(i)}) \ge \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
= \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})} \big[-\log q_{\phi}(\mathbf{z}|\mathbf{x}) + \log p_{\theta}(\mathbf{x}, \mathbf{z})\big]
\,\\
\]
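As a numerical sanity check of the bound, here is a minimal Python sketch under an assumed toy model that is not part of the text: a one-dimensional prior \(p(z) = \mathcal{N}(0, 1)\) and likelihood \(p(x|z) = \mathcal{N}(z, 1)\), for which the marginal \(p(x) = \mathcal{N}(0, 2)\) and the true posterior \(\mathcal{N}(x/2, 1/2)\) are known in closed form. The helper names (`log_normal`, `log_joint`, `elbo_mc`) are hypothetical. The ELBO is estimated by averaging \(-\log q(z|x) + \log p(x, z)\) over samples drawn from \(q\).

```python
import math
import random

def log_normal(v, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

# Hypothetical toy model (an assumption for illustration):
# prior p(z) = N(0, 1), likelihood p(x|z) = N(z, 1), hence p(x) = N(0, 2).
def log_joint(x, z):
    """log p(x, z) = log p(z) + log p(x|z) for the toy model."""
    return log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)

def elbo_mc(x, q_mean, q_var, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E_q[-log q(z|x) + log p(x, z)] with q = N(q_mean, q_var)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))  # draw z ~ q(z|x)
        total += -log_normal(z, q_mean, q_var) + log_joint(x, z)
    return total / n_samples

x = 1.5
log_px = log_normal(x, 0.0, 2.0)       # exact log marginal likelihood
elbo_exact_q = elbo_mc(x, x / 2, 0.5)  # q equals the true posterior N(x/2, 1/2)
elbo_poor_q = elbo_mc(x, 0.0, 1.0)     # q equals the prior, a poor posterior fit
print(log_px, elbo_exact_q, elbo_poor_q)
```

When \(q\) equals the true posterior, every sample of \(-\log q + \log p(x, z)\) equals \(\log p(x)\), so the bound is tight. With \(q\) set to the prior, the estimate falls below \(\log p(x)\) by roughly the KL divergence from \(q\) to the true posterior (about 0.72 nats in this toy case), which is the gap described by the first RHS term.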
Here \(\mathbf{x}\) abbreviates \(\mathbf{x}^{(i)}\). Factorizing the joint as \( p_{\theta}(\mathbf{x}, \mathbf{z}) = p_{\theta}(\mathbf{x}|\mathbf{z})p_{\theta}(\mathbf{z}) \) gives:
\[
\,\\
\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) = \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}
\big[\log p_{\theta}(\mathbf{x}|\mathbf{z}) + \log p_{\theta}(\mathbf{z}) - \log q_{\phi}(\mathbf{z}|\mathbf{x})\big]
\,\\
\]
To bring out a KL divergence, recall its definition for the approximate posterior and the prior:
\[
\,\\
D_{KL}(q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z})) =
\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})} \big[\log q_{\phi}(\mathbf{z}|\mathbf{x}) - \log p_{\theta}(\mathbf{z})\big].
\,\\
\]
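Under the common choice (an assumption here, not stated in the text) of a diagonal Gaussian \(q_{\phi}(\mathbf{z}|\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \operatorname{diag}(\boldsymbol{\sigma}^2))\) and a standard normal prior \(p_{\theta}(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})\), this expectation has the closed form \(\tfrac{1}{2}\sum_j (\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1)\). The sketch below, with hypothetical function names, compares that formula against a Monte Carlo estimate taken directly from the definition.

```python
import math
import random

def kl_diag_gauss_std_normal(mu, var):
    """Closed-form D_KL(N(mu, diag(var)) || N(0, I))."""
    return 0.5 * sum(m * m + v - math.log(v) - 1.0 for m, v in zip(mu, var))

def log_diag_gauss(z, mu, var):
    """Log density of a diagonal Gaussian N(mu, diag(var)) at the point z."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (zi - m) ** 2 / v)
               for zi, m, v in zip(z, mu, var))

def kl_mc(mu, var, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E_q[log q(z|x) - log p(z)], straight from the definition."""
    rng = random.Random(seed)
    zeros, ones = [0.0] * len(mu), [1.0] * len(mu)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]  # z ~ q(z|x)
        total += log_diag_gauss(z, mu, var) - log_diag_gauss(z, zeros, ones)
    return total / n_samples

mu, var = [0.5, -1.0], [0.8, 0.3]
print(kl_diag_gauss_std_normal(mu, var), kl_mc(mu, var))
```

The two numbers should agree up to Monte Carlo noise; the closed form avoids that noise, which is why it is usually preferred when \(q\) and the prior are both Gaussian.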
Multiplying both sides by \(-1\) and splitting the expectation by linearity gives:
\[
\,\\
\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})} \big[\log p_{\theta}(\mathbf{z})\big] -
\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})} \big[\log q_{\phi}(\mathbf{z}|\mathbf{x})\big] =
-D_{KL}(q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z})).
\,\\
\]
Substituting this into the ELBO, we have:
\[
\,\\
\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) = - D_{KL}(q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z}))
+ \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})} \big[\log p_{\theta}(\mathbf{x}|\mathbf{z})\big]
\,\\
\]
The first term, the negative KL divergence, acts as a regularizer: maximizing the ELBO pushes the approximate posterior \(q_{\phi}(\mathbf{z}|\mathbf{x})\) toward the prior \(p_{\theta}(\mathbf{z})\).
The second term is the expected reconstruction log-likelihood, which measures how well \(p_{\theta}(\mathbf{x}|\mathbf{z})\) reconstructs \(\mathbf{x}\) from latent variables \(\mathbf{z}\) drawn from \(q_{\phi}(\mathbf{z}|\mathbf{x})\).
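The two-term form suggests one way the ELBO can be computed: a closed-form KL term plus a sampled reconstruction term. A minimal sketch, assuming a toy model not taken from the text (prior \(p(z) = \mathcal{N}(0, 1)\), likelihood \(p(x|z) = \mathcal{N}(z, 1)\), Gaussian \(q(z|x) = \mathcal{N}(\mu, s^2)\)) and using hypothetical function names:

```python
import math
import random

# Hypothetical toy setup (an assumption for illustration):
# prior p(z) = N(0, 1), likelihood p(x|z) = N(z, 1), q(z|x) = N(mu, s2).

def kl_term(mu, s2):
    """Closed-form D_KL(N(mu, s2) || N(0, 1))."""
    return 0.5 * (mu * mu + s2 - math.log(s2) - 1.0)

def reconstruction_term(x, mu, s2, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E_q[log p(x|z)], sampling z = mu + sqrt(s2) * eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = mu + math.sqrt(s2) * rng.gauss(0.0, 1.0)  # reparameterized draw from q
        total += -0.5 * (math.log(2 * math.pi) + (x - z) ** 2)  # log N(x; z, 1)
    return total / n_samples

def elbo(x, mu, s2):
    """ELBO = -KL(q || prior) + expected reconstruction log-likelihood."""
    return -kl_term(mu, s2) + reconstruction_term(x, mu, s2)

x = 1.5
log_px = -0.5 * (math.log(2 * math.pi * 2.0) + x * x / 2.0)  # exact log N(x; 0, 2)
print(log_px, elbo(x, x / 2, 0.5), elbo(x, 0.0, 1.0))
```

Writing the sample as \(z = \mu + s\,\epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\) does not change the value of the estimate; it is the form that makes the estimate differentiable with respect to \(\mu\) and \(s^2\) when gradients are needed. With \(q\) set to the true posterior \(\mathcal{N}(x/2, 1/2)\), the sum matches \(\log p(x)\) up to sampling noise; with \(q\) set to the prior, it stays below.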