Yeah, this should work correctly, assuming that the AI’s prior specifies just one mathematical world, rather than e.g. a set of possible mathematical worlds weighted by simplicity. I posted about something similar five years ago.
The application to “fake cancer” is something that hadn’t occurred to me, and it seems like a really good idea at first glance.
Thanks, that’s useful. I’ll think about how to formalise this correctly. Ideally I want a design where we’re still safe if a) the AI knows, correctly, that pressing a button will give it extra resources, but b) still doesn’t press it because it’s not part of its description.