Holden didn’t actually suggest that. And while this suggestion is in a certain sense ingenious—it’s not too far off from the sort of suggestions I flip through when considering how/if to implement CEV or similar processes—how do you “report the actions”? And do you report the reasons for them? And do you check to see if there are systematic discrepancies between consequences in the true model and consequences in the manipulated one? (This last point, btw, is sufficient that I would never try to literally implement this suggestion, but try to just structure preferences around some true model instead.)
I can think of a bunch of standard modes of display (top candidate: video and audio of what the simulated user sees and hears, plus subtitles of their internal model). For the discrepancies, you could run the simulation many times with random variations roughly along the same scope and dimensions as the differences between the simulations and reality, either just rejecting plans that have too much divergence, or simply showing the display of all of them (which would also help against frivolous use, if you have to watch the action 1000 times before doing it). I’d also say make the simulated user a total drone with seriously rewired neurology, so that it tries to always and only do what the AI tells it to.
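The rejection rule described above can be sketched directly. A minimal Monte Carlo version, assuming a scalar outcome and a single Gaussian perturbation knob (the function names and threshold are hypothetical, purely for illustration):

```python
import random

def too_divergent(plan, simulate, n_runs=1000, threshold=0.1):
    """Run the plan under many randomly perturbed simulations; flag it if
    the outcomes spread out too much. `simulate(plan, noise)` stands in
    for the oracle's world model with a random variation applied."""
    outcomes = []
    for _ in range(n_runs):
        # Perturbation roughly along the dimensions where the simulation
        # is expected to differ from reality (here: one Gaussian knob).
        noise = random.gauss(0.0, 1.0)
        outcomes.append(simulate(plan, noise))
    return max(outcomes) - min(outcomes) > threshold
```

A robust plan yields roughly the same outcome under every perturbation and passes; a plan whose consequences hinge on simulation details gets rejected.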
Not that this solves the problem—I’ve countered the really dangerous things I noticed instantly, but give me 5 minutes to think and I’ll notice 20 more—but I thought someone should actually try to answer the question in spirit, letter, and most charitable interpretation.
also, it’d make a nice movie.
I don’t see why the ‘oracle’ has to work toward some real-world goal in the first place. The oracle may have as its terminal goal the output of the relevant information on the screen, with a level of clutter compatible with the human visual cortex, and that’s it. It’s up to you to ask it to represent the information in a particular way.
Or not even that: the terminal goal of the mathematical system is to make some variables represent such output; an implementation of such a system has those variables computed and copied to the screen as pixels. The resulting system does not even self-preserve: the abstract computation making abstract variables have certain abstract values is attained, in the relevant sense, even if the implementation is physically destroyed. (This is how software currently works.)
The screen is a part of the real world.
Well, in software you simply don’t implement the correspondence between the mathematical abstractions you rely on to build the software (the equations, real-valued numbers) and the implementation (electrons in the memory, pixels on the display, etc.). There’s no point in doing that. If you do, you encounter other issues, like wireheading.
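That separation can be made concrete. In the sketch below (all names hypothetical), the “abstract” computation only makes a return value stand in the right relation to its input; copying the result to the screen is a separate implementation step the computation knows nothing about:

```python
def compute_answer(question):
    # The abstract computation: its only goal is that the returned
    # variable bear the right relation to the input. No concept of
    # screens, processes, or self-preservation appears here.
    return len(question)  # stand-in for real reasoning

def render(value):
    # The implementation-side correspondence: copy the computed value
    # to the display. The computation above is 'attained' whether or
    # not this step ever runs.
    print(value)

render(compute_answer("how many characters?"))  # prints 20
```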
How do you report the path the car should take? On the map. How do you report a better transistor design? In the blueprint. How do we report a software design? With a UML diagram. (How do you report why that transistor works? Show the simulator.) It’s only the most irreparable clinical psychopaths who generate all outputs via extensive (and computationally expensive) modelling of the cognition (and decision process) of the listener. Edit: i.e. modelling so as to attain an outcome favourable to them; failing to empathise with the listener, that is, failing to treat the listener as an instance of self, and instead treating the listener as a difficult-to-control servomechanism.
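The contrast can be illustrated with a planner that reports its answer as plain data on the map, with no model of the listener at all. A toy straight-line route planner (hypothetical, for illustration only):

```python
def plan_route(start, goal, steps=4):
    # Report the path as data: a list of (x, y) waypoints on the map,
    # not prose tailored to the listener's cognition.
    (x0, y0), (x1, y1) = start, goal
    return [(x0 + (x1 - x0) * i / steps,
             y0 + (y1 - y0) * i / steps)
            for i in range(steps + 1)]

route = plan_route((0, 0), (4, 8))
# route[0] is the start, route[-1] is the goal; any map UI can draw it.
```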
Isn’t the relevant quality of a “clinical psychopath,” here, something like “explicitly models cognition of the listener, instead of using empathy,” where “empathy”==something like “has an implicit model of the cognition of the listener”?
An implicit model that is rather incomplete and not wired for exploitation. That’s how psychopaths are successful at exploiting other people and talking people into stuff, even though their model is substandard when it comes to actual communication; it actually sucks and is inferior to the normal one.
Human friendliness works via not modelling the decision processes of other people when communicating; we do that modelling when we deceive, lie, and bullshit, while when we are honest we sort of share our thoughts. This idea of an oracle is outright disturbing. It is clear nothing good comes out of a full model of the listener: firstly, it wastes computing time, and secondly, it generates bullshit, so you get something that is inferior at solving technical problems and more dangerous at the same time.
Meanwhile, much of the highly complex information that we would want to obtain from an oracle is hopelessly impossible to convey in English anyway—hardware designs, cures, etc.