The trouble is that if the AI finds a solution we didn’t expect, limiting its resources isn’t sufficient; we can still get into trouble if the goal is open-ended. For example, if there is no upper bound on how good a chess program can be, then the AI wants to somehow gain control of lots of resources to improve the chess program. It is running a search process for ways around the resource limitations (like building a successor and running it on another computer, or convincing you to change your mind, or exploiting a bug in the code), and we’re just hoping that search fails.
The real trick, in other words, is not limiting the AI’s resources; it’s designing the AI’s goals so that they can be fulfilled with only limited resources.
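To make the contrast concrete, here is a minimal sketch. Everything in it is invented for illustration: the plan names, ratings, and resource numbers are hypothetical, and the two scoring functions are stand-ins for whatever objective the AI actually optimizes. The point is only that an open-ended objective always prefers the plan that grabs the most resources, while a bounded objective stops caring once its target is met.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    chess_rating: float    # rating the plan is expected to achieve (hypothetical)
    resources_used: float  # resources the plan consumes (hypothetical units)

PLANS = [
    Plan("tune evaluation function locally", chess_rating=2600, resources_used=1),
    Plan("copy self to a second computer", chess_rating=2900, resources_used=100),
    Plan("acquire a datacenter", chess_rating=3200, resources_used=10_000),
]

def open_ended_score(plan: Plan) -> float:
    # "Make the chess program as good as possible": every extra point of
    # rating counts, so plans that grab more resources always score higher.
    return plan.chess_rating

def bounded_score(plan: Plan, target: float = 2500) -> float:
    # "Reach the target rating, then stop": once the target is met, extra
    # resources buy nothing, so the cheapest sufficient plan wins.
    if plan.chess_rating >= target:
        return 1.0 - 0.001 * plan.resources_used  # mild preference for frugality
    return 0.0

print(max(PLANS, key=open_ended_score).name)  # -> "acquire a datacenter"
print(max(PLANS, key=bounded_score).name)     # -> "tune evaluation function locally"
```

A toy like this only pushes the problem back a step, of course: writing down a real-world goal whose “target met” condition can’t itself be gamed is the hard part.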
Other people have written some relevant blog posts about this, so I’ll provide links:
Reduced impact AI: no back channels
Summoning the Least Powerful Genie