So, it’s a linear scale, and they’re comparing the CE loss increase from inserting the SAE to the CE loss increase from just destroying the model and outputting a ≈ uniform distribution over tokens. The latter is a very large CE loss increase, so the denominator is really big. Thus, scoring over 90% is pretty easy.
The metric you mention here is probably ‘loss recovered’. For a residual stream insertion, it goes
1-(CE loss with SAE- CE loss of original model)/(CE loss if the entire residual stream is ablated-CE loss of original model)
See e.g. equation 5 here.
So, it’s a linear scale, and they’re comparing the CE loss increase from inserting the SAE to the CE loss increase from just destroying the model and outputting a ≈ uniform distribution over tokens. The latter is a very large CE loss increase, so the denominator is really big. Thus, scoring over 90% is pretty easy.