A more generalized notion of IDA (iterated distillation and amplification) is “using AIs as part of your process for training AIs to make the training signal stronger”.
In the case of IDA, this looks like using AIs in the recursive step to (hopefully) make the imitative policy more powerful.
In the case of recursive reward modeling, this looks like using AIs to compute (hopefully) more accurate rewards.
The same goes for debate, where AIs argue against each other to produce a (hopefully) more reliable training signal.
Using AIs recursively to get better imitation data, a better reward signal, or otherwise help with training isn’t dead. For instance, see Constitutional AI or CriticGPT.
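The amplify-then-distill loop described above can be sketched with a toy example. Everything here is hypothetical and illustrative, not a real training pipeline: the “model” just adds two numbers, `amplify` decomposes a harder query (summing a long list) into sub-queries the weak model can handle, and `distill` produces a “student” that imitates the amplified system’s answers.

```python
# Toy sketch of the generalized IDA loop (illustrative only, not a real system):
# a weak base model is amplified by recursive decomposition, then a student
# model is "distilled" by imitating the amplified system's answers.

def amplify(model, xs):
    """Answer a harder query by splitting it into sub-queries for `model`."""
    if len(xs) <= 2:
        return model(tuple(xs))
    mid = len(xs) // 2
    # Recursively solve each half, then let the base model combine the results.
    return model((amplify(model, xs[:mid]), amplify(model, xs[mid:])))

def distill(training_pairs):
    """'Train' a student by memorizing the amplified system's (query, answer) pairs."""
    table = dict(training_pairs)
    return lambda q: table[q]

# Base model: can only sum a pair (or singleton) of numbers.
base = lambda pair: sum(pair)

# Amplification lets the base model answer longer sums; distillation imitates that.
queries = [(1, 2, 3, 4), (5, 6, 7, 8, 9)]
pairs = [(q, amplify(base, list(q))) for q in queries]
student = distill(pairs)
```

The point of the sketch is only the shape of the recursion: the stronger (amplified) system generates the training signal that the next-generation policy imitates.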
It’s important to note that using AIs as part of your process for training AIs isn’t itself a notable invention; it would be used by default to at least some extent.