The BlueDot Impact write-up on scalable oversight seems to suggest that people have given up on iterated distillation and amplification (IDA). I haven't seen much research in this area, but is that actually the case?
I know that Ought is now pretty much inactive because their attempts at factored cognition failed, and this has made many people pessimistic about the factored cognition hypothesis. However, Ought seemed to be pushing the specific angle that tasks could be broken into sub-problems, each of which could be completed quickly, and I don't think that's a required part of IDA. So it isn't clear to me that the failure of factored cognition implies the failure of IDA.
A more general notion of IDA is "using AIs as part of your process for training AIs to make the training signal stronger".
In the case of IDA, this looks like using copies of the current model in the amplification step to (hopefully) produce a stronger target for the imitative policy.
In the case of recursive reward modeling, this looks like using AIs to compute (hopefully) more accurate rewards.
Same for debate.
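To make the pattern concrete, here is a minimal, hypothetical sketch of the amplify-then-distill loop, plus the reward-modelling flavour of the same idea. Everything in it (`human_with_assistants`, `distill`, `assisted_reward`, the toy policies) is a placeholder I've made up for illustration, not any existing implementation:

```python
from typing import Callable, List, Tuple

Policy = Callable[[str], str]  # a policy maps a question to an answer


def human_with_assistants(question: str, assistant: Policy) -> str:
    """Amplification step (placeholder): an overseer answers the question
    with help from copies of the current model, e.g. by delegating
    sub-questions to `assistant`."""
    sub_answers = [assistant(f"sub-question {i} of: {question}") for i in range(2)]
    return f"answer to '{question}' using {sub_answers}"


def distill(examples: List[Tuple[str, str]]) -> Policy:
    """Distillation step (placeholder): train a new model to imitate the
    amplified behaviour; a lookup table stands in for actual training."""
    table = dict(examples)
    return lambda question: table.get(question, "unknown")


def ida_iteration(current: Policy, questions: List[str]) -> Policy:
    # The recursive step: the current model helps produce stronger
    # imitation targets, and the next model is distilled from them.
    amplified = [(q, human_with_assistants(q, current)) for q in questions]
    return distill(amplified)


def assisted_reward(output: str, critic: Policy) -> float:
    """Recursive-reward-modelling flavour of the same pattern: the model
    assists evaluation, so the reward is (hopefully) more accurate."""
    critique = critic(f"point out flaws in: {output}")
    return 0.0 if "flaw" in critique else 1.0  # toy scoring rule


if __name__ == "__main__":
    policy: Policy = lambda question: f"baseline answer to '{question}'"
    for _ in range(3):  # a few rounds of amplify-then-distill
        policy = ida_iteration(policy, ["Q1", "Q2"])
    print(policy("Q1"))
    print(assisted_reward(policy("Q1"), policy))
```

The only point of the sketch is that the current model shows up inside the process that produces the next model's training signal.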
Using AIs recursively to get better imitation data, a better reward signal, or other help with training isn't dead: see, for instance, Constitutional AI or CriticGPT.
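As one concrete instance of this pattern, the critique-and-revise data-generation loop in Constitutional AI can be caricatured roughly as follows; `generate`, `critique`, and `revise` are hypothetical stand-ins for calls to the same underlying model, not Anthropic's actual code:

```python
from typing import List, Tuple

PRINCIPLE = "Identify ways the response could be harmful or unhelpful."


def generate(prompt: str) -> str:
    return f"draft response to: {prompt}"  # placeholder model call


def critique(response: str, principle: str) -> str:
    return f"critique of '{response}' under '{principle}'"  # placeholder model call


def revise(response: str, crit: str) -> str:
    return f"revised version of '{response}' given '{crit}'"  # placeholder model call


def self_improvement_data(prompts: List[str]) -> List[Tuple[str, str]]:
    """Produce (prompt, revised_response) pairs to fine-tune on, so the
    model's own critiques strengthen the training signal."""
    data = []
    for p in prompts:
        draft = generate(p)
        crit = critique(draft, PRINCIPLE)
        data.append((p, revise(draft, crit)))
    return data


print(self_improvement_data(["How should I respond to an angry customer?"]))
```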
It's worth noting that using AIs as part of your process for training AIs isn't a notable invention in itself; it would happen by default to at least some extent.