While it has been argued that the simplest program that solves a complex task is likely to be deceptive, it hasn’t yet been argued whether the fastest program that solves a complex task will be deceptive. This post argues that fast programs will often be forced to learn a good policy (just as we need to do today), and the learned policy is likely to be deceptive (presumably due to risks from learned optimization). Thus, there are at least some tasks where the fastest program will also be deceptive.
Planned opinion:
This is an intriguing hypothesis, but I’m not yet convinced: it’s not clear why the fastest program would have to learn the best policy, rather than directly hardcoding the best policy. If there are multiple possible tasks, the program could have a nested if structure that figures out which task needs to be done and then executes the best policy for that task. More details in this comment.
Planned summary:
While it has been argued that the simplest program that solves a complex task is likely to be deceptive, it hasn’t yet been argued whether the fastest program that solves a complex task will be deceptive. This post argues that fast programs will often be forced to learn a good policy (just as we need to do today), and the learned policy is likely to be deceptive (presumably due to risks from learned optimization). Thus, there are at least some tasks where the fastest program will also be deceptive.
Planned opinion:
This is an intriguing hypothesis, but I’m not yet convinced: it’s not clear why the fastest program would have to learn the best policy, rather than directly hardcoding the best policy. If there are multiple possible tasks, the program could have a nested if structure that figures out which task needs to be done and then executes the best policy for that task. More details in this comment.