I think that the separation between “AIs that care about the physical world” and “AIs that care only about the Platonic world” is not that clean in practice. The way I would expect an AGI optimizing a toy world to actually work is: run simulations of the toy world and look for simplified models of it that allow for feasible optimization. However, in doing so it can stumble across a model that contains our physical world together with the toy world. This model is false in the Platonic world, but testing it with a simulation (i.e. trying to exploit some leak in the box) will actually appear to confirm it (because the simulation is in fact running in the physical world rather than the Platonic world). Specifically, it seems to me that such a toy world is safe if and only if its description complexity is lower than the description complexity of the physical world together with the toy world embedded in it.
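Roughly, writing K(·) for description complexity, the condition I have in mind is:

```latex
\text{the toy world is safe} \iff K(\text{toy world}) < K(\text{physical world} + \text{toy world})
```

i.e. the purely Platonic hypothesis has to beat the “embedded in physics” hypothesis on the simplicity prior.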
The agent could be programmed to have a certain hard-coded ontology rather than searching through all possible hypotheses weighted by description length.
My point is, I don’t think it’s possible to implement a strong, computationally feasible agent that doesn’t search through possible hypotheses, because solving the optimization problem directly in the hard-coded ontology is intractable. In other words, what gives intelligence its power is precisely the search through possible hypotheses.
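To make the shape of the argument concrete, here is a minimal Python sketch (purely a toy construction, not a proposal for a real agent; every name and number in it is invented for illustration): an agent that enumerates candidate models in order of description length, discards those inconsistent with its simulation data, and optimizes only inside the simplest surviving model. The danger described above is exactly the case where the “physical world + toy world” model ends up being the simplest one consistent with the data.

```python
# A toy caricature (all names and numbers invented) of the kind of agent in question:
# instead of optimizing directly in one fixed ontology, it enumerates candidate models
# in order of description length, keeps the ones consistent with its simulation data,
# and optimizes only inside the simplest surviving model -- that is where optimization
# is feasible at all.

from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Model:
    name: str
    description_length: float         # stand-in for Kolmogorov complexity
    predict: Callable[[int], int]     # action -> predicted observation
    evaluate: Callable[[int], float]  # tractable surrogate objective inside the model


def consistent(model: Model, data: Sequence[Tuple[int, int]]) -> bool:
    """A model survives if it reproduces every (action, observation) pair seen so far."""
    return all(model.predict(a) == o for a, o in data)


def choose_model(candidates: List[Model], data: Sequence[Tuple[int, int]]) -> Model:
    """Occam-style selection: the simplest model consistent with the simulations."""
    surviving = [m for m in candidates if consistent(m, data)]
    return min(surviving, key=lambda m: m.description_length)


def act(model: Model, actions: Sequence[int]) -> int:
    """Optimization happens inside the chosen (simplified) model, not the raw ontology."""
    return max(actions, key=model.evaluate)


# Two hypotheses that fit the same simulation data: the "Platonic" toy world alone,
# and a more complex model in which the toy world is embedded in a physical world.
toy_only = Model("toy world", 10.0,
                 predict=lambda a: a % 3, evaluate=lambda a: -a)
toy_in_physics = Model("physical world + toy world", 25.0,
                       predict=lambda a: a % 3, evaluate=lambda a: a)

data = [(1, 1), (4, 1), (5, 2)]  # both models are consistent with this
best = choose_model([toy_only, toy_in_physics], data)
print(best.name, act(best, range(10)))  # here the simpler hypothesis wins
```

In this toy run the pure toy-world hypothesis wins on simplicity, which is the “safe” case; the worry is precisely the case where that complexity ordering flips.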