My understanding is that we are eschewing Problem 2, with one caveat: we still expect to solve the problem when the means by which the diamond was stolen or disappeared is beyond a human’s ability to comprehend, as long as the outcome (that the diamond is no longer in the room) is still comprehensible. For example, if the robber used some complicated novel technology to steal the diamond and hack the camera, there would be many things about the state that the human couldn’t understand even if the AI tried to explain them (at least without going over our compute budget for training). Nevertheless, it would still be an instance of Problem 1, because the human could understand the basic notion of “because of some actions involving complicated technology, the diamond is no longer in the room, even though it may look like it is.”