I’m sort of confused by the main point of that post. Is the idea that the robot can’t stack blocks because of a physical limitation? If so, it seems like this is addressed by the first initial objection. Is it rather that the model space might not have the capacity to correctly imitate the human? I’d be somewhat surprised if that were a big issue, and at any rate it seems like you could use the Wasserstein metric as a cost function and get a desirable outcome. I guess we’re instead imagining a problem where there’s no great metric (e.g. text answers to questions)?
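A minimal sketch of what that Wasserstein suggestion could look like for one-dimensional actions, assuming SciPy’s 1-D `wasserstein_distance` and made-up action samples (not anything from the post):

```python
# Minimal sketch (hypothetical setup): score an imitator by the 1-D
# Wasserstein distance between its action distribution and the human's,
# instead of requiring an exact match.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

human_actions = rng.normal(loc=1.0, scale=0.1, size=1000)  # demonstration actions
robot_actions = rng.normal(loc=0.8, scale=0.2, size=1000)  # imitator's actions

cost = wasserstein_distance(human_actions, robot_actions)  # lower is better
print(f"Wasserstein cost: {cost:.3f}")
```

For higher-dimensional action spaces you would need a general optimal-transport solver rather than the 1-D closed form.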
Is the idea that the robot can’t stack blocks because of a physical limitation?
There are lots of reasons that a robot might be unable to learn the correct policy despite the action space permitting it:
Not enough model capacity
Not enough training data
Training got stuck in a local optimum
Trained only on robot play data, so the model has never seen anything like the human policy before
Etc., etc.
Not all of these are compatible with “and so the robot does the thing that the human does 5% of the time”. But it seems like there can and probably will be factors that are different between the human and the robot (even if the human uses teleoperation), and in that setting imitating factored cognition provides the wrong incentives, while optimizing factored evaluation provides the right incentives.
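To illustrate that, here is a minimal sketch (hypothetical numbers, zero-capacity constant predictor) of a failure mode that produces behaviour unlike any demonstration, rather than the behaviour the human shows 5% of the time:

```python
# Minimal sketch (hypothetical setup): when a capacity-limited imitator
# can't represent a bimodal demonstration distribution, the MSE-optimal
# fit outputs the mean action -- something no demonstrator ever does.
import numpy as np

rng = np.random.default_rng(0)

# Demonstrations at the same state: swerve left (-1) 95% of the time,
# swerve right (+1) 5% of the time.
human_actions = rng.choice([-1.0, +1.0], size=1000, p=[0.95, 0.05])

# A constant (zero-capacity) imitator trained with squared error
# ends up predicting the sample mean.
imitated_action = human_actions.mean()  # ~ -0.9

print(imitated_action)  # neither -1 nor +1: an action no human took
```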
Is it rather that the model space might not have the capacity to correctly imitate the human?
I think AI will probably be good enough to pose a catastrophic risk before it can exactly imitate a human. (But as Wei Dai says elsewhere, if you do amplification then you will definitely get into the regime where you can’t imitate.)
Paul wrote in a parallel subthread, “Against mimicry is mostly motivated by the case of imitating an amplified agent.” In the case of IDA, you’re bound to run out of model capacity eventually as you keep iterating the amplification and distillation.
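A schematic sketch of that loop, with `decompose`, `recombine`, and `distill` as hypothetical stand-ins, just to show why the distillation target keeps outgrowing a fixed model class:

```python
# Schematic sketch of iterated distillation and amplification (IDA).
# `decompose`, `recombine`, and `distill` are hypothetical stand-ins,
# not a real implementation.

def amplify(model, decompose, recombine):
    """Answer a question by splitting it up and delegating the pieces to `model`."""
    def amplified(question):
        subquestions = decompose(question)
        subanswers = [model(q) for q in subquestions]
        return recombine(question, subanswers)
    return amplified

def ida(initial_model, distill, decompose, recombine, rounds):
    model = initial_model
    for _ in range(rounds):
        target = amplify(model, decompose, recombine)  # stronger than `model` alone
        model = distill(target)  # fixed-capacity imitator of an ever-stronger target
    return model
```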
I guess we’re instead imagining a problem where there’s no great metric (e.g. text answers to questions)?
I’m pretty sure that’s it.