Wasserstein Distance (Earth Mover's Distance)

02/12/2025

Intuitive understanding of Wasserstein Distance (Earth Mover's Distance)

Unsupervised learning is about directly learning the distribution \(\mathbf{P(x)}\) of data \(\mathbf{x}\), unlike supervised learning, where the training data \(\mathbf{x}\) come with labels \(\mathbf{y}\). Learning a distribution requires a way to measure how far apart two distributions are; KL-divergence is one such measure, and the Wasserstein distance is another.

The Wasserstein distance (or the EM distance) is the cost of the cheapest transport plan: the minimum total "mass \(\times\) distance moved" needed to turn one distribution into the other. In the example below, the two plans have different costs, and the Wasserstein distance (the minimum cost) is two. In the top row, there is a distribution with mass \(1\) at each of positions \(3\) and \(4\), and mass \(2\) at each of positions \(6\) and \(7\). In the bottom row, the masses are rearranged to \((1, 2)\) at positions \(3\) and \(4\), and \((1, 2)\) at positions \(6\) and \(7\). The cheapest plan moves one unit of mass from position \(6\) to position \(4\), at a cost of \(2\).

For a discrete distribution, the Wasserstein distance is the sum of the absolute values of the accumulated differences between the two histograms:

\[
W = \sum_{i=1}^{N} |\delta_i|, \qquad \delta_{i+1} = \delta_i + P_i - Q_i, \quad \delta_0 = 0,
\]

where \(N\) is the number of positions (piles) in the discrete distribution. Intuitively, \(\delta_i\) is the surplus mass that must be carried past position \(i\), and carrying one unit one step costs one. For the example above, the surpluses carried out of positions \(3\) through \(7\) are \((0, -1, -1, 0, 0)\), so \(W = 2\). A minimal implementation of this recursion is sketched at the end of this post.

Instead of having discrete piles of mass at specific points, you can also have continuous masses (dirt). The general definition is

\[
W(\mathbf{P}_r, \mathbf{P}_g) = \inf_{\gamma \in \Pi(\mathbf{P}_r, \mathbf{P}_g)} \mathbb{E}_{(x,y) \sim \gamma} [\|x-y\|],
\]

where \(\mathbf{P}_r\) and \(\mathbf{P}_g\) are two probability distributions, \(\Pi(\mathbf{P}_r, \mathbf{P}_g)\) is the set of all joint distributions \(\gamma(x, y)\) whose marginals are \(\mathbf{P}_r\) and \(\mathbf{P}_g\) (i.e., the set of all transport plans), and \(\|x - y\|\) is the cost of moving one unit of mass from \(x \sim \mathbf{P}_r\) to \(y \sim \mathbf{P}_g\). \(\inf\) denotes the infimum, the greatest lower bound of the set of plan costs; it is used instead of \(\min\) because the cheapest plan may only be approached in the limit rather than attained.
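As a sanity check on the discrete formula, here is a minimal Python sketch of the \(\delta\) recursion, run on the pile example above. The helper name `discrete_emd` and the bin layout are my own illustration, assuming unit-spaced bins and equal total mass:

```python
def discrete_emd(p, q):
    """Earth Mover's Distance between two histograms over the same
    unit-spaced bins, via the running-surplus recursion
    delta_{i+1} = delta_i + P_i - Q_i, with W = sum_i |delta_i|."""
    assert len(p) == len(q), "histograms must cover the same bins"
    assert sum(p) == sum(q), "total mass must match"
    delta, w = 0, 0
    for p_i, q_i in zip(p, q):
        delta += p_i - q_i  # surplus mass carried on to the next bin
        w += abs(delta)     # carrying it one step costs |delta|
    return w

# Pile example from the text: positions 3..7 (position 5 is empty).
top = [1, 1, 0, 2, 2]     # mass 1 at 3 and 4, mass 2 at 6 and 7
bottom = [1, 2, 0, 1, 2]  # mass (1, 2) at 3 and 4, mass (1, 2) at 6 and 7
print(discrete_emd(top, bottom))  # -> 2
```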
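For 1-D distributions, SciPy's `scipy.stats.wasserstein_distance` computes the same quantity from positions and weights, which gives an independent cross-check. One caveat: it normalizes each weight vector into a probability distribution, so the raw box-moving cost of \(2\) comes back divided by the total mass of \(6\):

```python
import numpy as np
from scipy.stats import wasserstein_distance

positions = np.array([3, 4, 6, 7])
w = wasserstein_distance(positions, positions,
                         u_weights=[1, 1, 2, 2],   # top-row masses
                         v_weights=[1, 2, 1, 2])   # bottom-row masses

# SciPy normalizes the weights to total mass 1, so multiply by the
# actual total mass (6) to recover the raw transport cost.
print(w)      # -> 0.333...
print(6 * w)  # -> 2.0, matching the recursion above
```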
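The continuous definition can also be checked empirically by feeding raw samples to the same function. For two 1-D Gaussians with equal variance, the cheapest plan simply shifts every point by the difference in means, so \(W\) should come out close to \(|\mu_1 - \mu_2|\); the sample sizes and seed below are arbitrary:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(seed=0)
xs = rng.normal(loc=0.0, scale=1.0, size=100_000)  # samples from P_r
ys = rng.normal(loc=2.0, scale=1.0, size=100_000)  # samples from P_g

# Equal-variance Gaussians: the optimal plan shifts every point by the
# mean difference, so W should be close to |0.0 - 2.0| = 2.
print(wasserstein_distance(xs, ys))  # -> roughly 2.0
```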