Additionally, “loss function” is often used to also refer to the supervised labels in a dataset. EG I don’t imagine proponents of “find an aligned loss function” to be imagining moving away from ℓ2 loss and towards KL. They’re thinking about dataset labels y for each point x, and then given a prediction xp and a loss function ℓ, we can provide a loss signal which maps datapoints and predictions to loss: L:=(x,xp)↦ℓ(y,xp)∈R.
Additionally, “loss function” is often used to also refer to the supervised labels in a dataset. EG I don’t imagine proponents of “find an aligned loss function” to be imagining moving away from ℓ2 loss and towards KL. They’re thinking about dataset labels y for each point x, and then given a prediction xp and a loss function ℓ, we can provide a loss signal which maps datapoints and predictions to loss: L:=(x,xp)↦ℓ(y,xp)∈R.