I think this is a good idea, though it’s not new. I have written about this at some length (Jessica linked to a few examples, but much of the content here is relevant), and it’s what people are usually trying to do in apprenticeship learning. I agree there is probably no realistic scenario where you would use the reduced impact machinery instead of doing this the way you describe (i.e. the way people already do it).
Having the AI try to solve the problem (rather than simply trying to mimic the human) doesn’t buy you much, and has big costs. If the human solves the problem only with negligible probability, then you simply aren’t going to get a good result using this technique. And if the human can solve the problem, then you can just train on the instances where the human actually succeeds. The conditioning saves you nothing computationally.
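To make the contrast concrete, here is a minimal sketch of the "train only on successful human attempts" alternative. All names (`Trajectory`, the success flag, the commented-out `policy`) are illustrative, not from the post; the point is just that filtering demonstrations needs no conditioning machinery:

```python
# Keep only demonstrations where the human solved the task, then do
# ordinary supervised imitation on what remains.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Trajectory:
    steps: List[Tuple[object, object]]  # (observation, human_action) pairs
    success: bool                       # did the human actually solve the task?

def filtered_imitation_dataset(demos: List[Trajectory]) -> List[Tuple[object, object]]:
    """If the human succeeds with non-negligible probability, this discards
    only a modest fraction of the data."""
    return [pair for traj in demos if traj.success for pair in traj.steps]

# Training then reduces to standard imitation on the filtered pairs:
# for obs, action in filtered_imitation_dataset(demos):
#     policy.train_step(obs, action)
```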
Bootstrapping seems like the most natural way to improve performance to superhuman levels. I expect bootstrapping to work fine, if you can get the basic protocol off the ground.
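One concrete reading of the bootstrapping protocol (my gloss, not spelled out here): the current imitator assists the human in producing demonstrations that are slightly better than unaided human performance, and the policy is re-trained on those. The function names below are hypothetical:

```python
def bootstrap(initial_demos, train, generate_assisted_demos, rounds=3):
    """Iteratively amplify an imitation policy.

    train(demos) -> policy           : supervised imitation learning
    generate_assisted_demos(policy)  : human (or human + policy) produces
                                       demonstrations with the policy as an aide
    """
    demos = initial_demos
    policy = train(demos)
    for _ in range(rounds):
        # Each round, the demonstrator is the human assisted by the current
        # policy, so demonstration quality can exceed the unaided human's.
        demos = generate_assisted_demos(policy)
        policy = train(demos)
    return policy
```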
The connection to adversarial networks is not really a “parallel.” They are literally the same thing (modulo your extra requirement that the system do the task, which is equivalent to Jessica’s quantilization proposal but which I think should definitely be replaced with bootstrapping).
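To spell out the "literally the same thing" claim: a discriminator is trained to tell human behavior from imitator behavior, and the imitator is trained to fool it. These are the standard GAN objectives, written schematically (the inputs here are the discriminator's output probabilities; optimization details are omitted):

```python
import math

def discriminator_loss(d_human: float, d_imitator: float) -> float:
    """The discriminator D should output ~1 on human data and ~0 on
    imitator data; this is the usual GAN discriminator loss."""
    return -(math.log(d_human) + math.log(1.0 - d_imitator))

def imitator_loss(d_imitator: float) -> float:
    """The imitator (the "generator") is trained to make its behavior
    indistinguishable from the human's, pushing D's output toward 1."""
    return -math.log(d_imitator)
```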
I think the most important problem is that AI systems do tasks in inhuman ways, such that imitating a human entails a significant disadvantage. Put differently, it may be harder to train an AI to imitate a human than to simply do the task. So I think the main question is how to get over that problem. I think imitation is the baseline to start from, but it probably won’t work in general.
Overall I feel more optimistic about approval-direction than imitation for this reason. But approval-direction has its own (extremely diluted) versions of the usual safety concerns, and imitation is pretty great since it literally avoids them altogether. So if imitation’s performance problem could be fixed, that would be great.
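For contrast, a minimal sketch of approval-direction as I understand it: rather than reproducing the human’s action, the agent proposes candidate actions and takes the one a learned model predicts the overseer would rate most highly. All names here are illustrative:

```python
def approval_directed_act(observation, propose_actions, predicted_approval):
    """propose_actions(obs) -> iterable of candidate actions
    predicted_approval(obs, action) -> float, a model of the overseer's rating
    """
    candidates = propose_actions(observation)
    # Act to maximize predicted approval, not to match the human's action.
    return max(candidates, key=lambda a: predicted_approval(observation, a))
```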
This post covers the basic idea of collecting training data with low probability online. This post describes why it might result in very low overhead for aligned AI systems.
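A sketch of the low-probability online collection scheme those posts describe, as I read it (`epsilon`, `ask_human`, and the rest are illustrative names). The overhead argument is visible in the structure: human labor is only incurred on the rare deferred steps, so total overhead scales with `epsilon`:

```python
import random

def act_and_maybe_collect(observation, policy, ask_human, dataset, epsilon=0.01):
    if random.random() < epsilon:
        # Rarely, defer to the human online and record the result as training data.
        action = ask_human(observation)
        dataset.append((observation, action))
    else:
        # Usually, act from the learned policy at full speed and zero human cost.
        action = policy(observation)
    return action
```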