One explanation for pathological errors is feature suppression/feature shrinkage (link). I'd be interested to see whether the errors are still pathological when you use the fine-tuning methodology I proposed for fixing shrinkage. Your method of fixing the norm of the input is close to that, but not quite the same.