On one hand, maybe? Maybe training with a differentiable representation and SGD was the only missing ingredient.
But I think I’ll believe it when I see large neural models distilled into relatively tiny symbolic models with no major loss of function. If that turns out to be hard, it means the partial activations and small differences in coefficients are doing important work.
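To make the distillation claim concrete at toy scale: a minimal sketch (not anyone's actual experiment) of what "distilling a neural model into a symbolic one with no loss of function" could mean. A hand-set network with soft activations is compared against a candidate symbolic rule, and fidelity is measured by checking agreement on all inputs. The network weights and the XOR task are purely illustrative assumptions.

```python
# Toy illustration: check whether a symbolic rule reproduces a tiny
# fixed-weight neural network's behavior (a miniature "distillation" test).
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neural_xor(a, b):
    # A hand-set 2-2-1 network that approximates XOR with soft activations.
    h1 = sigmoid(20 * (a + b) - 10)      # roughly OR
    h2 = sigmoid(20 * (a + b) - 30)      # roughly AND
    return sigmoid(20 * (h1 - h2) - 10)  # OR-and-not-AND, i.e. XOR

def symbolic_xor(a, b):
    # Candidate distilled symbolic rule: exact boolean XOR.
    return a != b

# Fidelity check: does the symbolic rule match the thresholded network
# on every binary input?
agree = sum(
    (neural_xor(a, b) > 0.5) == symbolic_xor(a, b)
    for a, b in itertools.product([0, 1], repeat=2)
)
print(agree, "of 4 inputs agree")  # → 4 of 4 inputs agree
```

At this scale the distillation is lossless by construction; the open question in the comment above is whether the same holds for large models, or whether the continuous details resist any compact symbolic summary.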