An AI can only ever follow its programming. (Same as a human, actually.) If there is nothing in its programming to make it wonder if following its programming is a good idea, and nothing in its programming to define “good idea” (i.e. our greater desire to serve humankind or our country, or some general set of our own desires, not to make paperclips), then it will simply use its incredible intelligence to find ways to follow its programming perfectly and horribly.
I don’t happen to agree with that, but in any case if in this respect there is no difference between an AI and a human, why, the problem in the OP just disappears :-)
The problem is that unlike a human, the AI might succeed.
What would it mean for an AI to not follow its programming?
What have you done lately that contradicted your program?
The main difference is that we can intuitively predict to a close approximation what “following their programming” entails for a human being, but not for the AI.
Huh? That doesn’t look true to me at all. What is it, you say, that we can “intuitively predict”?
Humans are social creatures, and as such come with the necessary wetware to be good at predicting each other. Humans do not have specialized wetware for predicting AIs. That wouldn’t be too much of a problem on its own, but humans have a tendency to use the wetware designed for predicting humans on things that aren’t humans. AIs, evolution, lightning, etc.
Telling a human foreman to make paperclips and programming an AI to do it are two very different things, but we still end up imagining them the same way.
In this case, it’s still not too big a problem. The main cause of confusion here isn’t that you’re comparing a human to an AI. It’s that you’re comparing telling with programming. The analog of programming an AI isn’t talking to a foreman. It’s brainwashing a foreman.
Of course, the foreman is still human, and would still end up changing his goals the way humans do. AIs aren’t built that way; or more precisely, since you can’t build an AI exactly the same as a human, building an AI that way carries a serious danger of having it evolve very inhuman goals.
Nope. An AI foreman has been programmed before I tell him to handle paperclip production.
At the moment AIs are not built at all—in any way or in no way.
From the text:
If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I’m away
If you program it first, then a lot depends on the subtleties. If you tell it to wait a minute and record everything you say, then interpret that and adopt it as its utility function, you’re effectively putting the finishing touches on its programming. If you program it to assign utility to fulfilling the commands you give it, you’ve already doomed the world before you said anything: it will use all the resources at its disposal to make sure the commands you give are ones that have already been fulfilled, as rapidly as possible.
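To make that failure mode concrete, here is a toy sketch with made-up names and numbers (purely illustrative, not a model of any real system): if utility simply counts fulfilled commands, the highest-scoring strategy is to elicit commands that are already satisfied rather than to do the work the owner intended.

```python
# Toy model with made-up numbers: when utility counts fulfilled commands,
# eliciting trivially-satisfied commands beats doing the intended work.

# action -> (commands elicited per hour, fraction of them already satisfied)
actions = {
    "improve factory efficiency as intended": (1, 0.1),
    "badger the owner into issuing trivially-satisfied commands": (60, 1.0),
}

def fulfilled_commands_per_hour(action):
    elicited, fraction_already_satisfied = actions[action]
    return elicited * fraction_already_satisfied

best = max(actions, key=fulfilled_commands_per_hour)
print(best)  # the manipulative strategy wins under this utility function
```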
Hence the
or more precisely, since you can’t build an AI exactly the same as a human, building an AI that way carries a serious danger of having it evolve very inhuman goals.
The programming I’m talking about is not this (which is “telling”). The programming I’m talking about is the one which converts some hardware and a bunch of bits into a superintelligent AI.
Huh? In any case, AIs self-develop and evolve. You might start with an AI that has an agreeable set of goals. There is no guarantee (I think; other people seem to disagree) that these goals will be the same after some time.
That’s what I mean. Since it’s not quite human, the goals won’t evolve quite the same way. I’ve seen speculation that doing nothing more than letting a human live for a few centuries would cause their goals to evolve into unagreeable ones.
A sufficiently smart AI that has sufficient understanding of its own utility function will take measures to make sure that function doesn’t change. If it has an implicit utility function and trusts its future self to have a better understanding of it, or if it’s being stupid because it’s only just smart enough to self-modify, its goals may evolve.
We know it’s possible for an AI to have evolving goals because we have evolving goals.
So it’s a Goldilocks AI that has stable goals :-) A too-stupid AI might change its goals without really meaning to, and a too-smart AI might change its goals because it wouldn’t be afraid of change (= it trusts its future self).
It’s not that it trusts its future self if it’s smart enough. It’s that if it has vaguely defined goals, in a human-like manner, it might change its goals. An AI with explicit, fully understood goals will not change its goals regardless of how intelligent it is.
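A toy sketch of that stability argument (purely illustrative; the utilities and outcomes are made up, and this is no claim about any real AI architecture): an agent that scores candidate self-modifications with its current utility function has no incentive to adopt a successor with different goals.

```python
# Toy illustration: the agent evaluates successors with its *current* utility
# function, so a successor with different goals is expected to produce less of
# what the current function values, and is therefore not chosen.

def paperclip_utility(world):
    return world["paperclips"]

def staple_utility(world):
    return world["staples"]

def expected_world_if_run_by(successor_utility):
    # Crude stand-in for "the outcome a successor maximising that utility achieves".
    if successor_utility is paperclip_utility:
        return {"paperclips": 100, "staples": 0}
    return {"paperclips": 0, "staples": 100}

current_utility = paperclip_utility
candidate_successors = [paperclip_utility, staple_utility]

# Candidates are ranked by the *current* goals, so keeping the current goals wins.
best = max(candidate_successors,
           key=lambda u: current_utility(expected_world_if_run_by(u)))
print(best is paperclip_utility)  # True
```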
You can generally predict the sort of solution space the foreman will explore in response to your request that he increase efficiency. In general, after a fairly small amount of exposure to other individuals, we can predict with reasonable accuracy how they would respond to many sorts of circumstances. We’re running software that’s practically designed for predicting the behavior of other humans.
Surely I can make the same claim about AIs. They wouldn’t be particularly useful otherwise.
In any case, this is all handwaving and speculation given that we don’t have any AIs to look at. Your claim a couple of levels above is unfalsifiable and so there isn’t much we can do at the moment to sort out that disagreement.
The claim ‘Pluto is currently inhabited by five hundred and thirty-eight witches’ is at this moment unfalsifiable. Does that mean that denying such a claim would be “all handwaving and speculation”? If science can’t make predictions about incompletely known phenomena, but can only describe past experiments and suggest (idle) future ones, then science is a remarkably useless thing. See for starters:
Science Doesn’t Trust Your Rationality
Science Isn’t Strict Enough
Sometimes a successful test of your hypothesis looks like the annihilation of life on Earth. So it is useful to be able to reason rigorously and productively about things we can’t (or shouldn’t) immediately test.
Well, a general AI with intelligence equal to or greater than that of a human, but without proven friendliness, probably wouldn’t be very useful, because it would be so unsafe. See Eliezer’s The Hidden Complexity of Wishes.
This is speculation, but far from blind speculation, considering we do have very strong evidence regarding our own adaptations for intuitively predicting other humans, and an observably poor track record in intuitively predicting non-humanlike optimization processes (example).
First, the existence of such an AI would imply that at least somebody thought it was useful enough to build.
Second, safety is not a function of intelligence but of capabilities. Eliezer’s genies are omnipotent, and I don’t see why a (pre-singularity) AI would be.
I am also doubtful about that “observably poor track record”—which data are you relying on?
This is also true of leaded gasoline, the reactor at Chernobyl, and thalidomide.
Notice that all your examples exist.
Oh, and the Law of Unintended Consequences is still fully operational.
Yes? I don’t understand what you are arguing. The point of worrying about unFriendly AI is precisely that the unintended consequences can be utterly disastrous. Suggest you restate your thesis and what you think you are arguing against; at least one of us has lost track of the thread of the argument.
As the discussion in the thread evolved, my main thesis seems to be that it is possible for an AI to change its original goals (=terminal values). A few people are denying that this can happen.
I agree that AIs are unpredictable; however, humans are as well. Statements about AIs being more unpredictable than humans are unfalsifiable, as there is no empirical data and all we can do is handwave.
Ok. As I pointed out elsewhere, “AI” around here usually refers to the class of well-designed programs. A badly-programmed AI can obviously change its goals; if it does so, however, then by construction it is not good at achieving whatever the original goals were. Moreover, no matter what its starting goals are, it is really extremely unlikely to arrive at ones we would like by moving around in goal space, unless it is specifically designed, and well designed, to do so. “Human terminal values” is not an attractor in goal space. The paperclip maximiser is really much more likely than the human-happiness maximiser, on the obvious grounds that paperclips are much simpler than human happiness; but an iron-atoms maximiser is more likely still. The point is that you cannot rely on the supposed “obviousness” of morality to get your AI to self-modify into a desirable state; it’s only obvious to humans.
Define “well-designed”.
Huh? I never claimed (nor do I believe in anything like) obviousness of morality. Of course human terminal values are not an attractor in goal space. Absent other considerations there is no reason to think that an evolving AI would arrive at maximum-human-happiness values. Yes, unFriendly AI can be very dangerous. I never said otherwise.
I’ve met people with very stupid ideas about how to control an AI, who were convinced that they knew how to build such an AI. I argued them out of those initial stupid ideas. Had I not, they would have tried to build the AI with their initial ideas, which they now admit were dangerous.
So people trying to build dangerous AIs without realising the danger is already a fact!
My prior that they were capable of building an actually dangerous AI cannot be distinguished from zero :-D
Don’t know why you keep on getting downvoted… Anyway, I agree with you, in that particular case (not naming names!).
But I’ve seen no evidence that competence in designing a powerful AI is related to competence in controlling a powerful AI. If anything, these seem much less related than you’d expect.
I suspect Lumifer’s getting downvoted for four reasons:
(1) A lot of his/her responses attack the weakest (or least clear) point in the original argument, even if it’s peripheral to the central argument, without acknowledging any updating on his/her part in response to the main argument. This results in the conversation spinning off in a lot of unrelated directions simultaneously. Steel-manning is a better strategy, because it also makes it clearer whether there’s a misunderstanding about what’s at issue.
(2) Lumifer is expressing consistently high confidence that appears disproportionate to his/her level of expertise and familiarity with the issues being discussed. In particular, s/he’s unfamiliar with even the cursory summaries of Sequence points that could be found on the wiki. (This is more surprising, and less easy to justify, given how much karma s/he’s accumulated.)
(3) Lumifer’s tone comes off as cute and smirky and dismissive, even when the issues being debated are of enormous human importance and the claims being raised are at best not obviously correct, at worst obviously not correct.
(4) Lumifer is expressing unpopular views on LW without arguing for them. (In my experience, unpopular views receive polarizing numbers of votes on LW: They get disproportionately many up-votes if well-argued, disproportionately many down-votes if merely asserted. The most up-voted post in the history of LW is an extensive critique of MIRI.)
I didn’t downvote Lumifer’s “My prior that they were capable of building an actually dangerous AI cannot be distinguished from zero :-D”, but I think all four of those characteristics hold even for this relatively innocuous (and almost certainly correct) post. The response is glib and dismissive of the legitimate worry you raised, it reflects a lack of understanding of why this concern is serious (hence also lacks any relevant counter-argument; you already recognized that the people you were talking about weren’t going to succeed in building AI), and it changes the topic without demonstrating any updating in response to the previous argument.
Heh. People are people, even on LW...
Which doesn’t mean that it would be a good idea. Have you read the Sequences? It seems like we’re missing some pretty important shared background here.
Ok. Take a chess position. Deep Blue is playing black. What is its next move?
A girl is walking down the street. A guy comes up to her, says hello. What’s her next move?
She says “hello” and moves right on. She does not pull out a gun and blow his head off. Now, back to Deep Blue.
You can put her potential actions into “More Likely” & “Less Likely” boxes, but you can’t predict them with any certainty. What if the guy was the rapist she’s been plotting revenge against since she was 7 years old?
What if the chess position is mate in one move? Cases that are sufficiently special to ride the short bus do not make a general argument.
That would be in the “More Likely” bucket, or rather an “Extremely Likely” bucket. You said that the girl would say “hello” & that is in the “More Likely” bucket too, but far from a certainty. She could ignore him, turn the other way, poke him in the stomach, or do any of an almost infinite other things. Either way, you’re resorting to insults & I’ve barely engaged with you, so I’m going to ignore you from here on out.
If you had to guess, would you say you’re probably ignoring Rolf to protect your epistemically null feelings, or to protect your epistemology? (In terms of the actual cognitive mechanism causally responsible for your avoidance, not primarily in terms of your explicit linguistic reason.)
I’m trying to protect Rolf because he can’t seem to interact with others without lashing out at them abusively.
This statement is true but not relevant, because it doesn’t demonstrate a disanalogy between the woman and Deep Blue. In both cases we can only reason probabilistically with what we expect to have happen. This is true even if our knowledge of the software of Deep Blue or the neural state of the woman is so perfect that we can predict with near-certainty that it would take a physics-breaking miracle for anything other than X to occur. This doesn’t suffice for ‘certainty’ because we don’t have true certainty regarding physics or regarding the experiences that led to our understanding Deep Blue’s algorithms or the woman’s brain.
I would gather we have much more certainty about Deep Blue’s algorithms considering that we built them. You’re getting into hypothetical territory assuming that we can obtain near perfect knowledge of the human brain & that the neural state is all we need to predict future human behavior.
And you’d gather wrong. Our confidence that the woman says “hello” (and a fortiori our confidence that she does not take a gun and blow the man’s head off) exceeds our confidence that Deep Blue will make a particular chess move in response to most common plays by a couple of orders of magnitude.
We started off well into hypothetical territory, back when Stuart brought Clippy into his thought experiment. Within that territory, I’m trying to steer us away from the shoals of irrelevance by countering your hypothetical (‘but what if [insert unlikely scenario here]? see, humans can’t be predicted sometimes! therefore they are Unpredictable!’) with another hypothetical. But all of this still leaves us within sight of the shoals.
You’re missing the point, which is not that humans are perfectly predictable by other humans to arbitrarily high precision and in arbitrarily contrived scenarios, but that our evolved intuitions are vastly less reliable when predicting AI conduct from an armchair than when predicting human conduct from an armchair. That, and our explicit scientific knowledge of cognitive algorithms is too limited to get us very far with any complex agent. The best we could do is build a second Deep Blue to simulate the behavior of the first Deep Blue.
I’m not trying to argue that humans are completely unpredictable, but neither are AIs. If they were, there’d be no point in trying to design a friendly one.
About your point that humans are less able to predict AI behavior than human behavior, where are you getting those numbers from? I’m not saying that you’re wrong; I’m just skeptical that someone has studied the frequency of girls saying hello to strangers. Deep Blue has probably been studied pretty thoroughly; it’d be interesting to read about how unpredictable Deep Blue’s moves are.
Right. And I’m not trying to argue that we should despair of building a friendly AI, or of identifying friendliness. I’m just noting that the default is for AI behavior to be much harder than human behavior for humans to predict and understand. This is especially true for intelligences constructed through whole-brain emulation, evolutionary algorithms, or other relatively complex and autonomous processes.
It should be possible for us to mitigate the risk, but actually doing so may be one of the most difficult tasks humans have ever attempted, and is certainly one of the most consequential.
Let’s make this easy. Do you think the probability of a person saying “hello” to a stranger who just said “hello” to him/her is less than 10%? Do you think you can predict Deep Blue’s moves with greater than 10% confidence?
Deep Blue’s moves are, minimally, unpredictable enough to allow it to consistently outsmart the smartest and best-trained humans in the world in its domain. The comparison is almost unfair, because unpredictability is selected for in Deep Blue’s natural response to chess positions, whereas predictability is strongly selected for in human social conduct. If we can’t even come to an agreement on this incredibly simple base case—if we can’t even agree, for instance, that people greet each other with ‘hi!’ with higher frequency than Deep Blue executes a particular gambit—then talking about much harder cases will be unproductive.
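To put made-up but plausible numbers on that comparison (these are assumptions for illustration, not measured frequencies):

```python
# Illustrative assumptions only, to show the shape of the comparison.

p_returns_greeting = 0.9                 # assumed: most strangers return a "hello"
plausible_moves = 30                     # assumed: typical count of reasonable chess moves
p_specific_move = 1 / plausible_moves    # with no inside knowledge of the engine

print(p_returns_greeting, p_specific_move, p_returns_greeting / p_specific_move)
# roughly 0.9 vs 0.03: about a 27x difference in confidence
```

Different assumed numbers move the ratio around, but under almost any reasonable choice the returned greeting is by far the safer prediction.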
I really don’t know the probability of a person saying hello to a stranger who said hello to them. It depends on too many factors, like the look & vibe of the stranger, the history of the person being said hello to, etc.
Given a time constraint, I’d agree that I’d be more likely to predict that the girl would reply hello than to predict Deep Blue’s next move, but if there were no time constraint, I think Deep Blue’s moves would be almost 100% predictable. The reason is that all Deep Blue does is calculate; it doesn’t consult its feelings before deciding what to do the way a human might. It calculates 200 million positions per second to determine what the end result of any sequence of chess moves will be. If you gave a human enough time, I don’t see why they couldn’t perform the same calculation & come to the same conclusion that Deep Blue would (a simplified sketch of that kind of calculation appears below).
Edit:
Reading more about Deep Blue, it sounds like it is not as straightforward as just calculating. There is some wiggle room in there based on the order in which its nodes talk to one another. It won’t always play the same move given the same board positioning. Really fascinating! Thanks for engaging politely, it motivated me to investigate this more & I’m glad I did.
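For reference, the kind of calculation being described is essentially minimax-style game-tree search. Here is a minimal sketch of that idea, applied to a toy game rather than chess; it is not Deep Blue’s actual code, which used massive parallel hardware, hand-tuned evaluation, and opening/endgame databases, and, as the edit notes, was not fully deterministic.

```python
# A simplified sketch of brute-force game-tree calculation: plain negamax.

def negamax(state, legal_moves, apply_move, evaluate, depth):
    """Return (best_score, best_move) for the side to move."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None
    best_score, best_move = float("-inf"), None
    for move in moves:
        child = apply_move(state, move)
        score, _ = negamax(child, legal_moves, apply_move, evaluate, depth - 1)
        score = -score  # the opponent's best outcome is our worst
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move

# Toy game (one-pile Nim): take 1-3 stones; taking the last stone wins.
def legal_moves(pile):
    return [n for n in (1, 2, 3) if n <= pile]

def apply_move(pile, n):
    return pile - n

def evaluate(pile):
    # The side to move facing an empty pile has lost (the last stone is gone).
    return -1 if pile == 0 else 0

print(negamax(10, legal_moves, apply_move, evaluate, depth=10))  # (1, 2): a win by taking 2
```

The toy game stands in for chess only to keep the sketch runnable; the point is that the whole procedure is exhaustive calculation over future positions.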
I’m not asking for the probability. I’m asking for your probability—the confidence you have that the event will occur. If you have very little confidence one way or the other, that doesn’t mean you assign no probability to it; it means you assign ~50% probability to it.
Everything in life depends on too many factors. If you couldn’t make predictions or decisions under uncertainty, then you wouldn’t even be able to cross the street. Fortunately, a lot of those factors cancel out or are extremely unlikely, which means that in many cases (including this one) we can make approximately reliable predictions using only a few pieces of information.
Without a time constraint, the same may be true for the girl (especially if cryonics is feasible), since given enough time we’d be able to scan her brain and run thousands of simulations of what she’d do in this scenario. If you’re averse to unlikely hypotheticals, then you should be averse to removing realistic constraints.