Do you want your IOI circuit to include the mechanism that decides it needs to output a name? Then use zero ablations. Or do you want to find the circuit that, given the context of outputting a name, completes the IOI task? Then use mean ablations. The ablation determines the task.
Mean ablation over webtext rather than the IOI task set should work just as well as zero ablation, right? “Mean ablation” is underspecified in the absence of a dataset distribution.
Mean ablation over webtext rather than the IOI task set should work just as well as zero ablation, right? “Mean ablation” is underspecified in the absence of a dataset distribution.
Yes that’s correct, this wording was imprecise.