If our more aligned model needs to spend T% more inference-time compute to get from performance Z’ back to performance Z on capability X, then we say there is a T% alignment tax. For example, if we always need to run best-of-2, this corresponds to a 100% alignment tax. If we need to run best-of-4 for 10% of all tasks, this corresponds to a 4*10% = 40% alignment tax.
Something seems wrong here. The examples are:
(1) 2x as much compute on 100% of tasks --> 100% alignment tax
(2) 4x as much compute on 10% of tasks (and 1x as much compute on 90% of tasks) --> 40% alignment tax
In (1), we spend 2*100% = 200% of the compute we spent before, which is 100% more. But in (2), we spend (4*10% + 1*90%) = 130% of the compute we spent before, which is 30% more. So I think the 2nd example is a 30% alignment tax, not 40%?
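To make the arithmetic easy to check, here is a minimal Python sketch. It assumes that best-of-n costs exactly n times the compute of a single sample, and that the tax is the expected extra compute relative to a baseline of one sample per task. The function name `alignment_tax` and the `(fraction, multiplier)` representation are my own illustration, not anything from the post.

```python
def alignment_tax(workload):
    """Expected extra compute relative to a 1x baseline.

    workload: list of (fraction_of_tasks, compute_multiplier) pairs,
    where the fractions must sum to 1. Assumes best-of-n costs n times
    the compute of a single sample (an assumption, not from the post).
    """
    total_fraction = sum(fraction for fraction, _ in workload)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("task fractions must sum to 1")
    # Expected compute per task, as a multiple of the baseline.
    expected_compute = sum(fraction * mult for fraction, mult in workload)
    # The tax is only the part above the 1x baseline.
    return expected_compute - 1.0


# Example (1): best-of-2 on every task.
print(f"{alignment_tax([(1.0, 2)]):.0%}")            # 100%
# Example (2): best-of-4 on 10% of tasks, single sample on the rest.
print(f"{alignment_tax([(0.1, 4), (0.9, 1)]):.0%}")  # 30%
```

Put differently, under these assumptions each group of tasks contributes fraction * (multiplier - 1) to the tax, so the best-of-4 group contributes 10% * 3 = 30%. The 4*10% = 40% figure counts the full cost of those runs without subtracting the 10% of baseline compute they would have used anyway.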