I understand your point, and agree that your conclusion is one that many smart, rational people with good general knowledge would share. Once again I concur that engaging with those X’s is important, including that ‘X’ we’re discussing here.
Sounds like we mostly agree. However, I don’t think it’s a question of general knowledge. I’m talking about smart, rational people who have studied AI enough to have strongly-held opinions about it. Those are the people who need to be convinced; their opinions propagate to smart, rational people who haven’t personally investigated AI in depth.
I’d love to hear your take on X here. What are your reasons for believing that friendliness can be formalized practically, and an AGI based on that formalization built before any other sort of AGI?
If I were SIAI, my reasoning would be the following. First, stop with the believe/don't-believe dichotomy and move to probabilities.
So what is the probability of a good outcome if you can’t formalize friendliness before AGI? Some of them would argue infinitesimal. This is based on fast take-off, winner-take-all scenarios (I have a problem with this stage, but I would like it to be properly argued, and that is hard).
So, looking at the decision tree (under these assumptions), the only chance of a good outcome is to try to formalise FAI before AGI becomes well known. All the other options lead to extinction.
So to attack the “formalise Friendliness before AGI” position you would need to argue that the first AGIs are very unlikely to kill us all. That is the major battleground as far as I am concerned.
Agreed about what the “battleground” is, modulo one important nit: not the first AGI, but the first AGI that recursively self-improves at a high speed. (I’m pretty sure that’s what you meant, but it’s important to keep in mind that, e.g., a roughly human-level AGI as such is not what we need to worry about—the point is not that intelligent computers are magically superpowerful, but that it seems dangerously likely that quickly self-improving intelligences, if they arrive, will be non-magically superpowerful.)
I don’t think formalize-don’t formalize should be a simple dichotomy either; friendliness can be formalized in various levels of detail, and the more details are formalized, the fewer unconstrained details there are which could be wrong in a way that kills us all.
I’d look at it the other way: I’d take it as practically certain that any superintelligence built without explicit regard to Friendliness will be unFriendly, and ask what the probability is that through sufficiently slow growth in intelligence and other mere safeguards, we manage to survive building it.
My best hope currently rests on the AGI problem being hard enough that we get uploads first.
(This is essentially the Open Thread about everything Eliezer or SIAI have ever said now, right?)
Uploading would have quite a few benefits, but I get the impression it would make us more vulnerable to whatever tools a hostile AI may possess, not less.
Re: “My best hope currently rests on the AGI problem being hard enough that we get uploads first.”
Surely a minuscule chance. It would be like Boeing booting up a scanned bird.
“So what is the probability of a good outcome if you can’t formalize friendliness before AGI? Some of them would argue infinitesimal.”
One problem here is the use of a circular definition of “friendliness”—one that defines the concept in terms of whether it leads to a favourable outcome. If you think “friendly” is defined in terms of whether or not the machine destroys humanity, then clearly you will think that an “unfriendly” machine would destroy the world. However, this is just a word game—it doesn’t tell us anything about the actual chances of such destruction happening.
Let’s say “we” are the good guys in the race for AI. Define
W = we win the race to create an AI powerful enough to protect humanity from any subsequent AIs
G = our AI can be used to achieve a good outcome
F = we go the “formalize friendliness” route
O = we go a promising route other than formalizing friendliness
At issue is which of the following is higher:
P(G|WF)P(W|F) or P(G|WO)P(W|O)
From what I know of SIAI’s approach to F, I estimate P(W|F) to be many orders of magnitude smaller than P(W|O). I estimate P(G|WO) to be more than 1% for a good choice of O (this is a lower bound; my actual estimate of P(G|WO) is much higher, but you needn’t agree with that to agree with my conclusion). Therefore the right side wins.
There are two points here that one could conceivably dispute, but it sounds like the “SIAI logic” is to dispute my estimate of P(G|WO) and say that P(G|WO) is in fact tiny. I haven’t seen SIAI give a convincing argument for that.
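To make the comparison concrete, here is a minimal sketch of the arithmetic; every probability below is an illustrative placeholder of my own, not an estimate anyone in this thread has endorsed.

```python
# Toy comparison of the two decision branches discussed above:
#   P(G|W,F) * P(W|F)   vs   P(G|W,O) * P(W|O)
# All numbers are illustrative placeholders, not anyone's actual estimates.

p_G_given_WF = 0.99    # if we win via formalized Friendliness, outcome is very likely good
p_W_given_F  = 1e-6    # but winning the race by this route is taken to be very unlikely
p_G_given_WO = 0.01    # the >1% lower bound used above for a good choice of O
p_W_given_O  = 1e-3    # winning the race by some other promising route

formalize_branch = p_G_given_WF * p_W_given_F
other_branch     = p_G_given_WO * p_W_given_O

print(f"formalize branch: {formalize_branch:.2e}")
print(f"other branch:     {other_branch:.2e}")
print("larger branch:", "formalize" if formalize_branch > other_branch else "other")
```

However the placeholders are filled in, the disagreement reduces to which product is larger, which is why the estimate of P(G|WO) is where the argument concentrates.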
I’d start here to get an overview.
My summary would be: there are huge numbers of types of minds and motivations, so if we pick one at random from the space of minds then it is likely to be contrary to our values, because it will have a different sense of what is good or worthwhile. This moderately relies on the speed/singleton issue, because evolutionary pressure between AIs might force them in the same direction as us. We would likely be out-competed before this happens, though, if we rely on competition between AIs.
I think various people associated with SIAI mean different things by formalizing friendliness. As I recall, Vladimir Nesov means getting better than a 50% probability of a good outcome.
Edited to add my own overview.
It doesn’t matter what happens when we sample a mind at random. We only care about the sorts of minds we might build, whether by designing them or evolving them. Either way, they’ll be far from random.
Consider my “at random” shorthand for “at random from the space of possible minds built by humans”.
The Eliezer-approved example of humans not getting a simple system to do what they want is the classic machine learning example in which a neural net was trained on photographs of two different sorts of tanks. It happened that the photographs of the different types of tank had been taken at different times of day, so the classifier keyed on that rather than actually looking at the types of tank. So we didn’t build a tank classifier but a day/night classifier. More here.
While I may not agree with Eliezer on everything, I do agree with him that it is damn hard to get a computer to do what you want once you stop programming it explicitly.
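A minimal sketch of that failure mode on synthetic data (the dataset, features and numbers are invented for illustration; this is not the original tank experiment): a classifier trained where an incidental “brightness” feature happens to track the label learns the proxy, and falls apart once the proxy stops correlating.

```python
# Toy illustration of the tank/daylight failure mode: a classifier trained on data
# where an incidental feature (brightness) correlates almost perfectly with the
# label will learn that feature instead of the one we intended.
# Purely synthetic data; not the original experiment.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

def make_data(brightness_correlates):
    label = rng.integers(0, 2, n)                    # 0 = tank type A, 1 = tank type B
    tank_feature = label + rng.normal(0.0, 2.0, n)   # weak, noisy signal for the real task
    if brightness_correlates:
        brightness = label + rng.normal(0.0, 0.1, n) # near-perfect proxy (day vs night)
    else:
        brightness = rng.normal(0.5, 0.1, n)         # proxy broken at deployment time
    X = np.column_stack([tank_feature, brightness])
    return X, label

X_train, y_train = make_data(brightness_correlates=True)
X_test, y_test = make_data(brightness_correlates=False)

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # looks excellent
print("test accuracy: ", clf.score(X_test, y_test))    # drops sharply once the proxy breaks
```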
Obviously AI is hard, and obviously software has bugs.
To counter my argument, you need to make a case that the bugs will be so fundamental and severe, and go undetected for so long, that despite any safeguards we take, they will lead to catastrophic results with probability greater than 99%.
How do you consider “formalizing friendliness” to be different from “building safeguards”?
Things like AI boxing or “emergency stop buttons” would be instances of safeguards. Basically any form of human supervision that can keep the AI in check even if it’s not safe to let it roam free.
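As a toy sketch of what “human supervision that can keep the AI in check” could look like mechanically (the agent, the loop and the stop flag here are all hypothetical placeholders, not anyone’s proposed design):

```python
# Toy sketch of the "emergency stop button" idea: an outer supervision loop runs an
# untrusted agent one step at a time and halts it when an operator flags a problem.
# The agent and the stop condition are stand-ins for illustration only.
import threading

stop_requested = threading.Event()   # the "big red button"

def untrusted_agent_step(state):
    # placeholder for one step of some boxed AI system
    return state + 1

def supervised_run(max_steps=1000):
    state = 0
    for _ in range(max_steps):
        if stop_requested.is_set():
            print("emergency stop pressed; halting agent")
            break
        state = untrusted_agent_step(state)
    return state

# An operator (or automated monitor) can call stop_requested.set() at any time.
if __name__ == "__main__":
    supervised_run(max_steps=10)
```

The objection raised below is precisely that a sufficiently capable goal-directed system would treat this outer loop as just another obstacle to route around.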
Are you really suggesting a trial-and-error approach where we stick evolved and human-created AIs in boxes and then eyeball them to see what they are like? Then pick the nicest-looking one, on a hunch, to have control over our light cone?
I’ve never seen the appeal of AI boxing.
This is why we need to create friendliness before AGI → A lot of people who are loosely familiar with the subject think those options will work!
A goal directed intelligence will work around any obstacles in front of it. It’ll make damn sure that it prevents anyone from pressing emergency stop buttons.
Better than chance? What chance?
Sorry, “better than chance” is an English phrase that tends to mean more than 50%.
It assumes an even chance of each outcome. I.e. do better than selecting randomly.
Not appropriate in this context; my brain didn’t think of the wider implications as it wrote it.
It’s easy to do better than random. *Pours himself a cup of tea.*
Programmers do not operate by “picking programs at random”, though.
The idea that “picking programs at random” has anything to do with the issue seems just confused to me.
The first AI will be determined by the first programmer, sure. But I wasn’t talking about that level; the biases and concern for the ethics of the AI of that programmer will be random from the space of humans. Or at least I can’t see any reason why I should expect people who care about ethics to be more likely to make AI than those who think economics will constrain AI to be nice.
That is now a completely different argument to the original “there are huge numbers of types of minds and motivations, so if we pick one at random from the space of minds”.
Re: “the biases and concern for the ethics of the AI of that programmer will be random from the space of humans”
Those concerned probably have to be expert programmers, able to build a company or research group and attract talented assistance, as well as, probably, customers. They will probably be far from what you would get if you chose at “random”.
Do we pick a side of a coin “at random” from the two possibilities when we flip it?
Epistemically, yes: we don’t have sufficient information to predict it*. However, if we do the same thing twice it has the same outcome, so it is not physically random.
So while the process that decides what the first AI is like is not physically random, it is epistemically random until we have a good idea of what AIs produce good outcomes and get humans to follow those theories. For this we need something that looks like a theory of friendliness, to some degree.
Considering we might use evolutionary methods for part of the AI creation process, randomness doesn’t look like too bad a model.
*With a few caveats. I think it is biased to land the same way up as it was when flipped, due to the chance of making it spin and not flip.
Edit: Oh and no open source AI then?
We do have an extensive body of knowledge about how to write computer programs that do useful things. The word “random” seems like a terrible mis-summary of that body of information to me.
As for “evolution” being equated with “randomness”—isn’t that one of the points that creationists make all the time? Evolution has two motors—variation and selection. The first of these may have some random elements, but it is only one part of the overall process.
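A minimal sketch of that variation-plus-selection point (the toy objective and parameters are my own choices): even though the mutations are random, the selection step drives the population towards whatever the fitness function rewards, so the end result is far from a uniform draw over the search space.

```python
# Toy evolutionary loop: random mutation (variation) plus a fitness-based filter
# (selection). The outcome is pushed towards whatever the fitness function rewards,
# not towards a random point in the search space.
import random

random.seed(0)
TARGET = 42.0                                   # arbitrary toy objective

def fitness(x):
    return -abs(x - TARGET)                     # closer to TARGET is better

population = [random.uniform(-100, 100) for _ in range(50)]

for generation in range(100):
    # variation: each survivor produces a slightly mutated offspring
    offspring = [x + random.gauss(0, 1.0) for x in population]
    # selection: keep the best half of parents + offspring
    pool = sorted(population + offspring, key=fitness, reverse=True)
    population = pool[: len(population)]

print("best individual:", population[0])        # ends up near TARGET, not random
```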
I think we have a disconnect on how much we believe proper scary AIs will be like previous computer programs.
My conception of current computer programs is that they are crystallised thoughts plucked from our own minds, easily controllable and unchanging. When we get interesting AI, the programs will be morphing and far less controllable without a good theory of how to control the change.
I shudder every time people say the “AI’s source code” as if it is some unchangeable and informative thing about the AI’s behaviour after the first few days of the AI’s existence.
I’m not sure how to resolve that difference.
You have correctly identified the area in which we do not agree.
The most relevant knowledge needed in this case is knowledge of game theory and human behaviour. They also need to know ‘friendliness is a very hard problem’. They then need to ask themselves the following question:
What is likely to happen if people have the ability to create an AGI but do not have a proven mechanism for implementing friendliness? Is it:

1. Shelve the AGI, don’t share the research, and set to work on creating a framework for friendliness. Don’t rush the research—act as if the groundbreaking AGI work that you just created was a mere toy problem and the only real challenge is the friendliness. Spend an even longer period of time verifying the friendliness design and never let on that you have AGI capabilities.

2. Something else.
Re: “What are your reasons for believing that friendliness can be formalized practically, and an AGI based on that formalization built before any other sort of AGI?”
I don’t (with that phrasing). I actually suspect that the problem is too difficult to get right and far too easy to get wrong. We’re probably all going to die. However, I think we’re even more likely to die if some fool goes and invents an AGI before they have a proven theory of friendliness.
Those are the people, indeed. But where do the donations come from? EY seems to be using this argument against me as well: I’m just not educated, well-read or intelligent enough for any criticism. Maybe so; I acknowledged that in my post. But have I seen any pointers to how people arrive at their estimates yet? No, just the demand to read all of LW, which according to EY doesn’t even deal with what I’m trying to figure out, but rather with the dissolving of biases. A contradiction?
I’m inquiring about the strong claims made by the SIAI, which includes EY and LW. Why? Because they ask for my money and resources. Because they gather fanatic followers who believe in the possibility of literally going to hell. If you follow the discussion surrounding Roko’s posts you’ll see what I mean. And because I’m simply curious and like to discuss, besides becoming less wrong.
But if EY or someone else is going to tell me that I’m just too dumb and it doesn’t matter what I do, think or donate, I can accept that. I don’t expect Richard Dawkins to enlighten me about evolution either. But don’t expect me to stay quiet about my insignificant personal opinion and epistemic state (as you like to call it) either! Although, since I’m conveniently not neurotypical (I guess), you won’t have to worry about me turning into an antagonist simply because EY is being impolite.