The addition of the distillation step is an extra confounder, but we hope that it doesn’t distort anything too much—its purpose is to improve speed without affecting anything else (though in practice it will reduce capabilities somewhat).
I think this is the crux of my confusion, so I would appreciate it if you could elaborate on this. (Everything else in your answer makes sense to me.) In Evans et al., during the distillation step, the model M learns to solve the difficult tasks directly by using example solutions from the amplification step. But if M can do that, then why can’t it also learn directly from examples provided by the human?
To use your analogy, I have no doubt that a team of 213 Rohins or a single Rohin thinking for 213 days can answer any question that I can (given a single day). But with distillation you’re saying there’s a robot that can learn to answer any question I can (given a single day) by first observing the team of 213 Rohins for long enough. If the robot can do that, why can’t the robot also learn to do the same thing by observing me for long enough?
why can’t the robot also learn to do the same thing by observing me for long enough?
You could do this, but it’s expensive. In practice, from the perspective of distillation, there’s always a tradeoff between:
Generating better ground truth data (which you can do by amplifying the agent that generates the ground truth data)
Improving the accuracy of the distilled model (which you can do by increasing the amount of data that you train on, and other ML tricks)
You could get to an Issa-level model using just the second method for long enough, but it’s going to be much more efficient to get to an Issa-level model by alternating the two methods.
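To make that tradeoff concrete, here is a toy numerical sketch (the scalar “competence”, the fixed learning rate, and the stylized amplify function are purely my own illustrative assumptions, not the actual IDA training procedure):

```python
# Toy sketch only: competence is a single number, each round of distillation
# closes a fixed fraction of the gap to whoever generated the training data,
# and Amplify(M) is modelled as "the human assisted by M". None of this is
# the real IDA setup; it just illustrates the two bullets above.

class Agent:
    def __init__(self, competence):
        self.competence = competence

    def distill_from(self, teacher, rate=0.3):
        # Bullet 2: each round closes a fraction of the gap to the teacher
        # (more data or better ML would mean a larger `rate`), but the student
        # never overshoots the teacher's own competence.
        self.competence += rate * (teacher.competence - self.competence)


def amplify(human, model):
    # Bullet 1: Amplify(M) is modelled as the human assisted by M,
    # approximating "M thinking for a longer time"; it generates better
    # training targets than the human could produce alone.
    return Agent(human.competence + 0.5 * model.competence)


human = Agent(competence=1.0)   # "Issa-level"

# Second method only: distill from the human every round.
plain = Agent(0.0)
for _ in range(20):
    plain.distill_from(human)

# Alternating the two methods: the teacher improves each round.
ida, teacher = Agent(0.0), human
for _ in range(20):
    ida.distill_from(teacher)
    teacher = amplify(human, ida)

print(round(plain.competence, 3), round(ida.competence, 3))  # prints 0.999 1.922
```

In this toy the human-only student approaches the human’s level but never passes it, while the alternating student reaches it after about five rounds and keeps improving, which is the sense in which alternating the two methods is more efficient.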
I’m confused about the tradeoff you’re describing. Why is the first bullet point “Generating better ground truth data”? It would make more sense to me if it said instead something like “Generating large amounts of non-ground-truth data”. In other words, the thing that amplification seems to be providing is access to more data (even if that data isn’t the ground truth that is provided by the original human).
Also in the second bullet point, by “increasing the amount of data that you train on” I think you mean increasing the amount of data from the original human (rather than data coming from the amplified system), but I want to confirm.
Aside from that, I think my main confusion now is pedagogical (rather than technical). I don’t understand why the IDA post and paper don’t emphasize the efficiency of training. The post even says “Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects”, which makes it sound like the efficiency of training isn’t important.
Why is the first bullet point “Generating better ground truth data”? It would make more sense to me if it said instead something like “Generating large amounts of non-ground-truth data”.
By “ground truth” I just mean “the data that the agent is trained on”; feel free to ignore that part of the phrase.
But it is important that it is better data. The point of amplification is that Amplify(M) is more competent than M, e.g. it is a better speech writer, it has a higher Elo rating at chess, etc. This is because Amplify(M) is supposed to approximate “M thinking for a longer time”.
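For concreteness, one standard picture of Amplify(M) is decomposition-style amplification: the human overseer breaks a question into subquestions, has M answer them, and combines the results. The sketch below is my own rendering of that picture (the Protocol interfaces and the depth parameter are illustrative choices, not code from the IDA paper or from Evans et al.):

```python
from typing import Protocol


class Model(Protocol):
    def answer(self, question: str) -> str: ...


class Overseer(Protocol):
    def decompose(self, question: str) -> list[str]: ...
    def combine(self, question: str, subanswers: list[str]) -> str: ...


def amplify(overseer: Overseer, model: Model, question: str, depth: int = 1) -> str:
    """Approximate 'model thinking for longer' via human-guided decomposition."""
    if depth == 0:
        return model.answer(question)              # leaf questions go straight to M
    subquestions = overseer.decompose(question)    # the human breaks the task down
    subanswers = [amplify(overseer, model, q, depth - 1) for q in subquestions]
    return overseer.combine(question, subanswers)  # and reassembles a better answer
```

Because one call to Amplify(M) fans out into many calls to M, it behaves like M given much more thinking time, which is why its answers make better training targets for the next distillation step than M’s own answers would.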
Also in the second bullet point, by “increasing the amount of data that you train on” I think you mean increasing the amount of data from the original human (rather than data coming from the amplified system), but I want to confirm.
Yes, that’s right.
Aside from that, I think my main confusion now is pedagogical (rather than technical). I don’t understand why the IDA post and paper don’t emphasize the efficiency of training.
Paul’s posts often do talk about this, e.g. An unaligned benchmark, and the competitiveness desideratum in Directions and desiderata for AI alignment. I agree though that it’s hard to realize this since the posts are quite scattered.
“Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects”
I suspect Paul would say that it is plausibly competitive with training a system using RL with a fixed reward function (because the additional human-in-the-loop effort could be a small fraction of the overall training cost, as long as we do semi-supervised RL well).
However, maybe we will end up training systems in some completely different way (e.g. GPT-2-style language models); it’s very hard to predict right now how IDA would compare to that.