Something like this?
import torch

def loss(learned, target):
    # Normalize both via softmax (exp, then divide by the sum), then take the cross-entropy.
    p_target = torch.exp(target)
    p_target = p_target / torch.sum(p_target)
    p_learned = torch.exp(learned)
    p_learned = p_learned / torch.sum(p_learned)
    return -torch.sum(p_target * torch.log(p_learned))
Well, I’d keep everything in log space and do the whole thing with log_sum_exp for numerical stability, but yeah. EDIT: e.g. something like:
import torch
import torch.nn.functional as F

def cross_entropy_loss(Z, C):
    # Z: arbitrary logits; C: target probabilities.
    # log_softmax keeps everything in log space for numerical stability.
    return -torch.sum(F.log_softmax(Z, dim=-1) * C)
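(For reference, F.log_softmax is the log-space / logsumexp step in one call; a quick sanity check, assuming 1-D logits:)

import torch
import torch.nn.functional as F

Z = torch.randn(5)
# log_softmax(Z) equals Z - logsumexp(Z), computed without ever
# exponentiating large logits, which is what makes it numerically stable.
assert torch.allclose(F.log_softmax(Z, dim=-1), Z - torch.logsumexp(Z, dim=-1))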
Erm, do C and Z have to be valid normalized probabilities for this to work?
C needs to be probabilities, yeah. Z can be any vector of numbers. (You can convert C into probabilities with softmax)
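E.g., a minimal usage sketch with made-up tensors, assuming the cross_entropy_loss above:

import torch
import torch.nn.functional as F

Z = torch.randn(10)                     # learned logits, any real numbers
C = F.softmax(torch.randn(10), dim=-1)  # raw target scores converted to probabilities
loss = cross_entropy_loss(Z, C)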
So indeed with cross-entropy loss I see two plateaus! Here’s rank 2:
(note that I’ve offset the loss so that equality of Z and C gives zero loss)
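(Presumably the offset subtracts the entropy of the target distribution, so the loss becomes a KL divergence that bottoms out at zero when the two distributions match; a hypothetical sketch of what that could look like:)

import torch
import torch.nn.functional as F

def offset_cross_entropy_loss(Z, C):
    # Hypothetical guess at the offset: treat C as logits too, take its softmax
    # as the target distribution, and subtract that distribution's entropy.
    # The result is KL(softmax(C) || softmax(Z)), which is zero when Z = C.
    P = F.softmax(C, dim=-1)
    cross_entropy = -torch.sum(F.log_softmax(Z, dim=-1) * P)
    entropy = -torch.sum(torch.xlogy(P, P))  # xlogy handles 0 * log(0) = 0
    return cross_entropy - entropy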
I have trouble getting rank 10 to find the zero-loss solution:
But the phenomenology at full rank is unchanged: