I don’t think the lack of an earth-shattering ka-FOOM changes much of the logic of FAI. Smart enough to take over the world is enough to make human existence way better, or end it entirely.
It’s quite tricky to ensure that your superintelligent AI does anything like what you wanted it to. I don’t share the intuition that creating a “homeostasis” AI is any easier than an FAI. I think one move Eliezer is making in his “Creating Friendly AI” strategy is to minimize the goals you’re trying to give the machine; just CEV.
I think this makes apparent what a good CEV seeker needs anyway; some sense of restraint when CEV can’t be reliably extrapolated in one giant step. It’s less than certain that even a full FOOM AI could reliably extrapolate to some final most-preferred world state.
I’d like to see a program where humanity actually chooses its own future; we skip the extrapolation and just use CV repeatedly; let people live out their own extrapolation.
Does just CV work all right? I don’t know, but it might. Sure, Palestinians want to kill Israelis and vice versa; but they both want to NOT be killed way more than they want to kill, and most other folks don’t want to see either of them killed.
Or perhaps we need a much more cautious, “OK, let’s vote on improvements, but they can’t kill anybody and benefits have to be available to everyone...” policy for the central guide of AI.
CEV is a well-thought-out proposal (perhaps the only one; counterexamples?), but we need more ideas in the realm of AI motivation/ethics systems. In particular, we need ways to take a practical AI with goals like “design neat products for GiantCo” or “obey orders from my commanding officer” and ensure that it doesn’t ruin everything if it starts to self-improve. Not everyone is going to want to give their AI CEV as its central goal, at least not until it’s clear it can/will self-improve, at which point it’s probably too late.
While CEV is an admirably limited goal compared to the goal of immediately bringing about paradise, it still allows the AI to potentially intervene in billions of people’s lives. Even if the CEV is muddled enough that the AI wouldn’t actually change much for the typical person, the AI is still being asked to ‘check’ to see what it’s supposed to do to everyone. The AI has to have some instructions that give it the power to redistribute most of the Earth’s natural resources, because it’s possible that the CEV would clearly and immediately call for some major reforms. With that power comes the chance that the power could be used unwisely, which requires tremendously intricate, well-tested, and redundant safeguards.
By contrast, a homeostasis or shield AI would never contemplate affecting billions of people; it would only be ‘checking’ to see whether a few thousand AI researchers are getting too close. It would only need enough resources to, say, shut off the electricity to a lab now and then, or launch an EMP or thermite weapon. It would be given invariant instructions not to seize control of most of Earth’s natural resources. That means, at least for some levels of risk-tolerance, that it doesn’t need quite as many safeguards, and so it should be easier and faster to design.
Actually, a Shield AI can’t be so much less intrusive than an FAI that it would require significantly simpler safeguards.
It will need a worldwide spying network to gather the required intel.
It will need a global enforcement network to establish its presence in uncooperative nations and terminate AI research there.
Given restrictive rules for dealing with humans, its planning/problem-solving algorithms will tend to circumvent them, since those rules are a direct obstacle to its main goal.
When it finds that humans are the weak links in any automated system, we can expect it to try to fully eliminate its dependence on them.
Etc.
So, I’ve read The Hidden Complexity of Wishes, and I think the dangers can be avoided. I don’t want to design a shield AI that minimizes the probability of unfriendly AIs launching—I want to design a shield AI that reduces the probability until either (a) the probability is some low number, like 0.1% per year, or (b) the shield AI has gained control of its quota of one of the thousands of specified resources. Then the shield AI stops.
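Very roughly, the decision rule I have in mind looks something like the toy sketch below; the risk estimate, the quota bookkeeping, and the numbers are all placeholders rather than a real design.

```python
# Toy version of the satisficing stop rule described above. The risk estimate,
# quota accounting, and thresholds are placeholders, not an actual design.

ANNUAL_RISK_TARGET = 0.001  # (a) acceptable probability of a UFAI launch per year

def choose_action(estimated_annual_risk, resources_used, resource_quota, candidate_actions):
    """Act to reduce launch risk only while neither stopping condition holds."""
    quota_reached = any(resources_used[r] >= resource_quota[r] for r in resource_quota)
    if estimated_annual_risk <= ANNUAL_RISK_TARGET or quota_reached:
        return None  # condition (a) or (b) holds: the shield AI stands down
    # Otherwise take the cheapest candidate action expected to reduce the risk.
    return min(candidate_actions, key=lambda action: action.expected_resource_cost)
```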
A “worldwide spying network” could consist of three really good satellites and a decent hacking routine.
A “global enforcement network” need not be in constant effect; its components could be acquired and dismissed as and when needed.
If the AI “circumvents us,” then that’s great. That means we won’t much notice its actions. Likewise if it “eliminates its dependence on us.” Although if you mean that the AI might escape its box, then I would argue that certain instructions can be left unmodifiable; this AI would not have full ability to modify its source code. Hence the idea of a mini-FOOM rather than a full FOOM; with human vetoes over certain modifications, it wouldn’t full-FOOM even if such things were possible.
You’ll find it surprisingly difficult to express what “AI stops” means in terms of the AI’s preference. An AI always exerts some influence on the world, just by existing. By “AI stops”, you mean a particular kind of influence, but it’s very difficult to formalize what kind of influence, exactly, constitutes “stopping”.
I imagine that an AI would periodically evaluate the state of the world, generate values for many variables that describe that state, identify a set of actions that its programming recommends for those values of the variables, and then take those actions.
For the AI to stop doing something would mean that once the state of the world corresponds to the AI having reached its maximum acceptable resource usage, the variables generated from that state lead to a set of actions that do not include any additional use of those resources by the AI.
For example, if the AI is controlling an amount of water that we would think is “too much water,” then the AI would not take actions that involve moving significant amounts of water or significantly affecting its quality. The AI would know that it is “controlling” the water because it would model the world for a few cycles after taking no action, model the world after taking some action, and notice that the water was in different places in the two models. It would do this a few seconds out, a few minutes out, a few hours out, a few days out, a few months out, and a few years out, using correspondingly blunter and cruder models as the period increased, both to economize on processing power and to screen out movements of water that were essentially due to chaos rather than to the AI’s own proximate causation.
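To make that check concrete, here is a rough sketch; the world-model interface, the resource measure, the horizons, and the threshold are all illustrative assumptions, not a real design.

```python
# Rough sketch of the "am I controlling this resource?" check described above.
# The model interface, resource measure, horizons, and threshold are illustrative.

HORIZONS_SECONDS = [60, 3_600, 86_400, 2_592_000, 31_536_000]  # minutes to ~a year

def controls_resource(model, state, action, measure, threshold):
    """Return True if taking `action` changes the resource (as scored by
    `measure`) noticeably more than doing nothing, at any planning horizon.

    `model.predict(state, action, horizon, coarse)` is an assumed interface
    returning a predicted world state; `measure(world)` scores the resource.
    """
    for horizon in HORIZONS_SECONDS:
        coarse = horizon > 86_400  # blunter, cheaper models at longer horizons,
                                   # which also filters out chaotic drift
        passive = model.predict(state, None, horizon, coarse)
        active = model.predict(state, action, horizon, coarse)
        if abs(measure(active) - measure(passive)) > threshold:
            return True
    return False
```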
Am I missing something? I realize you probably have a lot more experience specifying AI behavior than I do, but it’s difficult for me to understand the logic behind your insight that specifying “AI stops” is hard. Please explain it to me when you get a chance.
Vladimir, I understand that you’re a respected researcher, but if you keep voting down my comments without explaining why you disagree with me, I’m going to stop talking to you. It doesn’t matter how right you are if you never teach me anything.
If you would like me to stop talking to you, feel free to say so outright, and I will do so, without any hard feelings.
(I didn’t downvote the grandparent, and didn’t even see it downvoted. Your comment is still on my “to reply” list, so when it doesn’t feel like work I might eventually reply (the basic argument is related to “hidden complexity of wishes” and preference is thorough). Also note that I don’t receive a notification when you reply to your own comment, I only saw this because I’m subscribed to wedrifid’s comments.)
All right, I apologize for jumping to conclusions.
We can subscribe to comments? Is this via RSS? And can we do it in bulk? (I suppose that would mean creating many subscriptions and grouping them in the reader.)
What I would like is to be able to subscribe to a feed of “comments that have been upvoted by people who usually upvote the comments that I like and downvote the comments that I dislike”.
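In pseudocode terms, the scoring I am imagining looks something like the sketch below; the vote data structures and the cutoff are made up for illustration.

```python
# Hypothetical sketch of that feed: weight each rater's vote on a new comment by
# how often that rater has agreed with my own past votes. Data structures are
# made up for illustration (votes map comment ids to +1 or -1).

def agreement(my_votes, their_votes):
    """Fraction of commonly voted comments on which we voted the same way."""
    shared = set(my_votes) & set(their_votes)
    if not shared:
        return 0.0
    return sum(my_votes[c] == their_votes[c] for c in shared) / len(shared)

def feed_score(comment_id, my_votes, all_votes):
    """Sum of raters' votes on this comment, weighted by their agreement with me."""
    return sum(agreement(my_votes, votes) * votes.get(comment_id, 0)
               for votes in all_votes.values())

# Include a comment in the personalized feed when feed_score exceeds some cutoff.
```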
Via RSS, I’m subscribed to comments by about 20 people (grouped in a Google Reader folder), which is usually enough to point me to whatever interesting discussions are going on, but doesn’t require looking through tons of relatively low-signal comments in the global comments stream. It’s a good solution; you won’t be able to arrange a similar quality of comment selection automatically.
(I upvoted the grandparent. It seems to ask relevant questions and give clear reasons for asking them.)