I can think of a bunch of standard modes of display (top candidate: video and audio of what the simulated user sees and hears, plus subtitles of their internal model). For the discrepancies, you could run the simulation many times with random variations roughly along the same scope and dimensions as the differences between the simulation and reality, either just rejecting plans that have too much divergence, or simply showing the display of all of them (which would also help against frivolous use if you have to watch the action 1000 times before doing it). I'd also say make the simulated user a total drone with seriously rewired neurology, so that it tries to always and only do what the AI tells it to.
Not that this solves the problem (I've countered the really dangerous things I noticed instantly, but five minutes of thinking and I'll notice 20 more), but I thought someone should actually try to answer the question in spirit, in letter, and under its most charitable interpretation.
Also, it'd make a nice movie.
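The run-it-many-times idea above can be sketched in a few lines. This is a toy illustration, not an implementation: `simulate`, the noise model, and the divergence threshold are all hypothetical stand-ins for whatever the real simulator and its uncertainty about the world would be.

```python
import random

def simulate(plan, noise):
    """Hypothetical stand-in for the oracle's simulator: return an
    outcome score for the plan under a randomly perturbed world model."""
    return plan["expected_outcome"] + random.gauss(0, noise)

def vet_plan(plan, runs=1000, noise=0.1, max_divergence=1.0):
    """Run the simulation many times with random variations and reject
    the plan if its outcomes diverge too much across runs."""
    outcomes = [simulate(plan, noise) for _ in range(runs)]
    spread = max(outcomes) - min(outcomes)
    return spread <= max_divergence, outcomes

# A plan whose outcome is robust to the perturbations passes the check.
plan = {"expected_outcome": 1.0}
accepted, outcomes = vet_plan(plan)
```

The frivolous-use deterrent would then just be a UI rule: display every one of the 1000 `outcomes` to the operator before the plan can be approved.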
I don't see why the 'oracle' has to work toward some real-world goal in the first place. The oracle may have as its terminal goal the output of the relevant information on the screen, with a level of clutter compatible with the human visual cortex, and that's it. It's up to you to ask it to represent the information in a particular way.
Or not even that: the terminal goal of the mathematical system is to make some variables represent such output; an implementation of that system has those variables computed and copied to the screen as pixels. The resulting system does not even self-preserve; the abstract computation that makes abstract variables take certain abstract values is attained, in the relevant sense, even if the implementation is physically destroyed. (This is how software currently works.)
The screen is a part of the real world.
Well, in software you simply don't implement the correspondence between the mathematical abstractions you rely on to build the software (the equations, real-valued numbers) and the implementation (electrons in the memory, pixels on the display, etc.). There's no point in doing that. If you do, you run into other issues, like wireheading.
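The separation being described can be made concrete with a minimal sketch (names are illustrative, not from the original): the "abstract" layer only makes a variable take the right value, and a separate, dumb layer copies that value to the display. Nothing in the abstract layer refers to screens, hardware, or its own continued execution.

```python
def compute_answer(question: str) -> str:
    """Abstract computation: its 'goal' is only that the return value
    equal the relevant information. It has no model of the screen,
    the hardware, or its own preservation."""
    answers = {"2+2": "4"}  # toy knowledge base
    return answers.get(question, "unknown")

def render(text: str) -> None:
    """Implementation detail: copy the variable's value to the display.
    The computation above neither knows nor cares that this happens."""
    print(text)

render(compute_answer("2+2"))
```

Because the correspondence between the abstract values and the pixels lives only in `render` (and in the programmer's head), destroying the display or the process doesn't make the abstract computation "fail" in any sense the computation itself could optimize against.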