What happens in a cross-entropy-loss setup, rather than MSE loss? IMO cross-entropy loss is a better analogue to real networks. Though I’m confused about the right way to model an internal sub-circuit of the model. I think the exponential decay term just isn’t there?
There are lots of ways to do this, but the obvious way is to flatten C and Z and treat them as logits.
Something like this?
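A minimal sketch of the idea, assuming a PyTorch setup (`xent_loss_naive` is a hypothetical name):

```python
import torch

def xent_loss_naive(Z, C):
    # Flatten both matrices; treat Z as the logits of one big categorical
    # distribution and C as the target probabilities.
    p = torch.softmax(Z.flatten(), dim=0)
    return -(C.flatten() * torch.log(p)).sum()
```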
Well, I’d keep everything in log space and do the whole thing with log_sum_exp for numerical stability, but yeah.
EDIT: e.g. something like:
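(a sketch under the same PyTorch assumption, doing the normalization in log space with `logsumexp`:)

```python
import torch

def xent_loss(Z, C):
    z = Z.flatten()
    # log_softmax in one step: log p = z - logsumexp(z).
    # Never exponentiates the raw logits, so large entries can't overflow.
    log_p = z - torch.logsumexp(z, dim=0)
    return -(C.flatten() * log_p).sum()
```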
Erm do C and Z have to be valid normalized probabilities for this to work?
C needs to be probabilities, yeah. Z can be any vector of numbers. (You can convert C into probabilities with softmax)
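For instance, a one-line sketch (assuming an unnormalized target matrix `C_raw`, a hypothetical name):

```python
import torch

C_raw = torch.randn(4, 4)  # hypothetical unnormalized target matrix
# Flatten, normalize into a probability vector, and restore the shape.
C = torch.softmax(C_raw.flatten(), dim=0).reshape(C_raw.shape)
```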
So indeed with cross-entropy loss I see two plateaus! Here’s rank 2:
(note that I’ve offset the loss so that equality of Z and C is zero loss)
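Concretely, that offset amounts to subtracting the entropy of C, which turns the cross-entropy into a KL divergence; a sketch under the same assumptions (`kl_loss` is a hypothetical name):

```python
import torch

def kl_loss(Z, C):
    # KL(C || softmax(Z)) = cross_entropy(C, Z) - H(C), which is >= 0
    # and exactly zero when softmax(Z.flatten()) equals C.flatten().
    c = C.flatten()
    log_p = torch.log_softmax(Z.flatten(), dim=0)
    return (c * (c.log() - log_p)).sum()  # assumes C has no zero entries
```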
I have trouble getting rank 10 to find the zero-loss solution:
But the phenomenology at full rank is unchanged: