This intuition, that the KL behaves like a squared distance, is indeed important for understanding the KL divergence, and it is a property that divergences quite generally have in common. Bregman divergences in particular can be thought of as generalizations of the squared Euclidean distance in which you replace the quadratic, which is in some sense the Platonic convex function, with a convex function of your choice; the KL divergence is exactly the Bregman divergence generated by negative entropy.
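To make the "replace the quadratic with a convex function of your choice" recipe concrete, here is a minimal NumPy sketch of the Bregman-divergence construction (the helper names are mine, not from any particular library): the quadratic recovers half the squared Euclidean distance, and negative entropy on the probability simplex recovers the KL divergence.

```python
import numpy as np

def bregman(F, grad_F, x, y):
    """Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - np.dot(grad_F(y), x - y)

# F(x) = 1/2 ||x||^2  ->  the Bregman divergence is half the squared Euclidean distance.
sq = lambda x: 0.5 * np.dot(x, x)
grad_sq = lambda x: x

# F(p) = sum_i p_i log p_i (negative entropy)  ->  between probability vectors,
# the Bregman divergence is the KL divergence.
negent = lambda p: np.sum(p * np.log(p))
grad_negent = lambda p: np.log(p) + 1.0

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])

print(bregman(sq, grad_sq, x, y), 0.5 * np.sum((x - y) ** 2))          # equal
print(bregman(negent, grad_negent, x, y), np.sum(x * np.log(x / y)))   # equal
```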
This intuition is also important for understanding Talagrand’s T2 inequality, which says that, under conditions such as strong log-concavity of the reference measure q, the squared Wasserstein-2 distance between p and q (the Wasserstein-2 distance lifts the Euclidean metric to a metric on the space of probability measures, so its square plays the role of a squared Euclidean distance) is upper-bounded by a constant times the KL divergence KL(p || q).
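For a feel of the constants, here is a small numerical check (my own illustration, not part of the original discussion) using 1-D Gaussians, where both quantities have closed forms. With the standard Gaussian as the reference q, which is 1-strongly log-concave, T2 reads W2(p, q)^2 <= 2 KL(p || q).

```python
import numpy as np

rng = np.random.default_rng(0)

def w2_sq_gauss(m1, s1, m2, s2):
    """Squared Wasserstein-2 distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# Reference q = N(0, 1); Talagrand's T2 inequality gives W2(p, q)^2 <= 2 KL(p || q).
for _ in range(5):
    m, s = rng.normal(), rng.uniform(0.2, 3.0)
    lhs = w2_sq_gauss(m, s, 0.0, 1.0)
    rhs = 2 * kl_gauss(m, s, 0.0, 1.0)
    print(f"W2^2 = {lhs:.4f} <= 2*KL = {rhs:.4f}: {lhs <= rhs}")
```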