Super cool stuff. Minor question, what does “Fraction of MLP progress” mean? Are you scaling down the MLP output values that get added to the residual stream? Thanks!
Super cool stuff. Minor question, what does “Fraction of MLP progress” mean? Are you scaling down the MLP output values that get added to the residual stream? Thanks!