If there is no good reason for an AI to be friendly (a belief which is plausible, but that I’ve never seen proven, and which is implied by the assumption that unfriendly AI is vastly more likely), then what’s left but hand-coded goals?
It’s still an idea I’m working on, but it’s plausible that any AI which is trying to accomplish something complicated in the material world will pursue knowledge of math, physics, and engineering. Even an AI which doesn’t have an explicitly physical goal (maybe it’s a chess program) still might get into physics and engineering in order to improve its own functioning.
What I’m wondering is whether Friendliness might shake out from more general goals.
It’s interesting that Friendliness to the ecosphere has been shaking out of other goals for a good many people in recent decades.
It’s still an idea I’m working on, but it’s plausible that any AI which is trying to accomplish something complicated in the material world will pursue knowledge of math, physics, and engineering. Even an AI which doesn’t have an explicitly physical goal (maybe it’s a chess program) still might get into physics and engineering in order to improve its own functioning.
(One exception: It seems conceivable that game theory, plus the possibility of being in a simulation, might give rise to a general rule like “treat your inferiors as you would be treated by your superiors” that would restrain arbitrary AIs.)
Whether boredom is a universally pro-survival trait for any entity which is capable of feeling it (I’ve heard that turtles will starve if they aren’t given enough variety in their food) is a topic worth investigating. I bet that having some outward focus rather than just wanting internal states reliably increases the chances of survival.
On the other hand, “treat your inferiors as you would be treated by your superiors” is assuredly not a reliable method of doing well in a simulation, just considering the range of human art and the popularity of humor based on humiliation.
Are you more entertaining if you torture Sims or if you build the largest possible sustainable Sim city? It depends on the audience.
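To make the “it depends on the audience” point concrete, here is a toy expected-utility comparison. Every probability and payoff below is invented; the only takeaway is that the conclusion flips depending on what you assume the simulators reward.

```python
# Toy expected-utility comparison for the "treat your inferiors well" rule.
# All numbers are invented for illustration.

def expected_utilities(p_sim, p_kindness_rewarded, reward, penalty, exploit_gain):
    """Return (EU of exploiting inferiors, EU of treating them well)."""
    # If we are in a simulation, the simulators either reward kindness
    # or find cruelty entertaining.
    eu_exploit = exploit_gain + p_sim * (
        p_kindness_rewarded * penalty + (1 - p_kindness_rewarded) * reward
    )
    eu_kind = p_sim * (
        p_kindness_rewarded * reward + (1 - p_kindness_rewarded) * penalty
    )
    return eu_exploit, eu_kind

# An audience assumed to reward kindness makes the rule look binding...
print(expected_utilities(0.5, 0.9, reward=10, penalty=-10, exploit_gain=2))
# ...an audience assumed to enjoy humiliation-based entertainment reverses it.
print(expected_utilities(0.5, 0.2, reward=10, penalty=-10, exploit_gain=2))
```

With the first set of assumptions kindness wins; with the second, exploitation does, which is all the audience-dependence claim needs.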
(One exception: It seems conceivable that game theory, plus the possibility of being in a simulation, might give rise to a general rule like “treat your inferiors as you would be treated by your superiors” that would restrain arbitrary AIs.)
This seems like postulating minds (magic) as basic ontological entities. Where’s the line between “inferiors” and other patterns of atoms?
“Generally intelligent optimization processes” is a natural category, don’t you think? (Though there might be no non-magical-thinking reason for it to be game-theoretically relevant in this way. Or the most game-theoretically natural (Schelling point) border might exclude humans.)
“Generally intelligent optimization processes” is a natural category, don’t you think?
Categories are kludges used to get around our inability to make the analysis more precise. The “laws” expressed in terms of natural categories are only binding as long as you remain unable to see the world at a deeper level. The question is not whether “minds” constitute a natural category (a hope that is forlorn once you face smarter AIs), but whether “minds” deductively implies “things to treat as you would be treated by your superiors” (whatever that means).
The difference in rules comes from the ability of a more powerful AI to look at a situation, see it in detail, and make moves in favor of its own goals via the most unexpected exceptions to the most reliable heuristic rules. You can’t rely on natural categories when they are fought with magical intent. You can fight magic only with magic, and in this case that means postulating, in the fabric of the world, the particular magic that helps justify your argument. Your complex wish can’t be granted by natural laws.
If there is no good reason for an AI to be friendly (a belief which is plausible, but that I’ve never seen proven, and which is implied by the assumption that unfriendly AI is vastly more likely), then what’s left but hand-coded goals?
Unfriendly AI is only vastly more plausible if you’re not doing it right. Out of the space of all possible preferences, human-friendly preferences are a tiny sliver. If you picked at random, you would almost surely get something as bad as a paperclipper.
As optimizers, we can try to aim at the space of human-friendly preferences ourselves, but we’re stupid optimizers in this domain, compared to the complexity of the problem. A program could target that space better, and we are much, much more likely to be smart enough to write that program than to survive the success of an AI based on hand-coded goals and kill switches.
This is like going to the moon: Let the computer steer.
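As a deliberately crude illustration of the “tiny sliver” point above (the outcome list and the random sampling are my own stand-ins, not anything from the thread): even over a handful of coarse outcomes, a randomly drawn preference rarely optimizes for the one humans would pick, and the real space of preferences is unimaginably larger.

```python
import random

# Coarse, made-up stand-ins for futures an optimizer could aim at.
OUTCOMES = [
    "humans flourish",
    "humans preserved as museum exhibits",
    "universe tiled with paperclips",
    "universe tiled with molecular smiley faces",
    "matter converted to computronium computing nothing in particular",
]

def sample_preference():
    # A "preference" here is just a random utility assigned to each outcome.
    return {outcome: random.random() for outcome in OUTCOMES}

trials = 100_000
hits = 0
for _ in range(trials):
    pref = sample_preference()
    if max(pref, key=pref.get) == "humans flourish":
        hits += 1

# Even with only five outcomes this hovers around 1/5; the real space of
# outcomes and of preferences over them is astronomically larger, so a
# randomly picked optimizer essentially never lands anywhere near us.
print(f"{hits / trials:.3f} of random preferences favor human flourishing")
```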
There is no hand-coded goal in my proposal. I propose to craft the prior, i.e., restrict the worlds the AI can consider possible.
This is why the procedure is comparatively simple (compared with friendly AI) and also why the resulting AIs are less powerful.
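For what it’s worth, here is a minimal sketch of what “crafting the prior” might look like mechanically, under my own guesses; the hypothesis names and the whitelist predicate are illustrative placeholders, not the actual proposal.

```python
from typing import Callable, Dict

def restrict_prior(prior: Dict[str, float],
                   allowed: Callable[[str], bool]) -> Dict[str, float]:
    """Zero out disallowed worlds and renormalize over what remains."""
    kept = {world: p for world, p in prior.items() if allowed(world)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the restriction excludes every world")
    return {world: p / total for world, p in kept.items()}

# Illustrative hypotheses an agent might assign probability to.
full_prior = {
    "world matches the sandboxed model exactly": 0.5,
    "world contains an exploitable physics loophole": 0.2,
    "world lets the agent's actions reach outside the sandbox": 0.3,
}

# The crafted prior forbids the hypotheses under which clever, far-reaching
# plans would pay off; that is also why the resulting agent is weaker.
restricted = restrict_prior(
    full_prior,
    allowed=lambda world: world == "world matches the sandboxed model exactly",
)
print(restricted)  # {'world matches the sandboxed model exactly': 1.0}
```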
Hand-coded goals are what you’re trying to patch over.
Don’t think about it this way. This is not a path to a solution.
If there is no good reason for an AI to be friendly (a belief which is plausible, but that I’ve never seen proven, and which is implied by the assumption that unfriendly AI is vastly more likely), then what’s left but hand-coded goals?
What would a “good reason” constitute? (Have you read the metaethics sequence?)
I expect the intended contrast is to extrapolated volition, which is ‘hand-coded’ on the meta level.
It’s still an idea I’m working on, but it’s plausible that any AI which is trying to accomplish something complicated in the material world will pursue knowledge of math, physics, and engineering. Even an AI which doesn’t have an explicitly physical goal (maybe it’s a chess program) still might get into physics and engineering in order to improve its own functioning.
This does seem very likely. See Steve Omohundro’s “The Basic AI Drives” for one discussion.
What I’m wondering is whether Friendliness might shake out from more general goals.
It’s interesting that Friendliness to the ecosphere has been shaking out of other goals for a good many people in recent decades.
This seems very unlikely; see Value is Fragile.