The key thing to notice is that in order to exploit this scenario, we have to have a world-model that is precise enough to model reality much better than humans, but not be so good at modelling a reality that it’s world models are isomorphic to a reality.
This might be easy or challenging, but it does mean we probably can’t crank up the world-modeling part indefinitely while still trapping it via wireheading.
The key thing to notice is that in order to exploit this scenario, we have to have a world-model that is precise enough to model reality much better than humans, but not be so good at modelling a reality that it’s world models are isomorphic to a reality.
This might be easy or challenging, but it does mean we probably can’t crank up the world-modeling part indefinitely while still trapping it via wireheading.