The deeper problem is that you can’t really program “make me happy” in the same way that you can’t program “make this image look like I want”.
On one hand, Friendly AI people want to convert “make me happy” to a formal specification. Doing that has many potential pitfalls. because it is a formal specification.
On the other hand, Richard, I think, wants to simply tell the AI, in English, “Make me happy.” Given that approach, he makes the reasonable point that any AI smart enough to be dangerous would also be smart enough to interpret that at least as intelligently as a human would.
I think the important question here is, Which approach is better? LW always assumes the first, formal approach.
To be more specific (and Bayesian): Which approach gives a higher expected value? Formal specification is compatible with Eliezer’s ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
“Tell the AI in English” can fail, but the worst case is closer to a “With Folded Hands” scenario than to paperclips.
I’ve never considered the “Tell the AI what to do in English” approach before, but on first inspection it seems safer to me.
C. direct normativity—program the AI to value what we value.
B. indirect normativity—program the AI to value figuring out what our values are and then valuing those things.
A. indirect indirect normativity—program the AI to value doing whatever we tell it to, and then tell it, in English, “Value figuring out what our values are and then valuing those things.”
I can see why you might consider A superior to C. I’m having a harder time seeing how A could be superior to B. I’m not sure why you say “Doing that has many potential pitfalls. because it is a formal specification.” (Suppose we could make an artificial superintelligence that thinks ‘informally’. What specifically would this improve, safety-wise?)
Regardless, the AI thinks in math. If you tell it to interpret your phonemes, rather than coding your meaning into its brain yourself, that doesn’t mean you’ll get an informal representation. You’ll just get a formal one that’s reconstructed by the AI itself.
It’s not clear to me that programming a seed to understand our commands (and then commanding it to become Friendlier) is easier than just programming it to become Friendlier, but in any case the processes are the same after the first stage. That is, A is the same as B but with a little extra added to the beginning, and it’s not clear to me why that little extra language-use stage is supposed to add any safety. Why wouldn’t it just add one more stage at which something can go wrong?
Regardless, the AI thinks in math. If you tell it to interpret your phonemes, rather than coding your meaning into its brain yourself, that doesn’t mean you’ll get an informal representation. You’ll just get a formal one that’s reconstructed by the AI itself.
It is misleading to say that an interpreted language is formal because the C compiler is formal. Existence proof: Human language. I presume you think the hardware that runs the human mind has a formal specification. That hardware runs the interpreter of human language. You could argue that English therefore is formal, and indeed it is, in exactly the sense that biology is formal because of physics: technically true, but misleading.
This will boil down to a semantic argument about what “formal” means. Now, I don’t think that human minds—or computer programs—are “formal”. A formal process is not Turing complete. Formalization means modeling a process so that you can predict or place bounds on its results without actually simulating it. That’s what we mean by formal in practice. Formal systems are systems in which you can construct proofs. Turing-complete systems are ones where some things cannot be proven. If somebody talks about “formal methods” of programming, they don’t mean programming with a language that has a formal definition. They mean programming in a way that lets you provably verify certain things about the program without running the program. The halting problem implies that for a programming language to allow you to verify even that the program will terminate, your language may no longer be Turing-complete.
Eliezer’s approach to FAI is inherently formal in this sense, because he wants to be able to prove that an AI will or will not do certain things. That means he can’t avail himself of the full computational complexity of whatever language he’s programming in.
But I’m digressing from the more-important distinction, which is one of degree and of connotation. The words “formal system” always go along with computational systems that are extremely brittle, and that usually collapse completely with the introduction of a single mistake, such as a resolution theorem prover that can prove any falsehood if given one false belief. You may be able to argue your way around the semantics of “formal” to say this is not necessarily the case, but as a general principle, when designing a representational or computational system, fault-tolerance and robustness to noise are at odds with the simplicity of design and small number of interactions that make proving things easy and useful.
That all makes sense, but I’m missing the link between the above understanding of ‘formal’ and these four claims, if they’re what you were trying to say before:
(1) Indirect indirect normativity is less formal, in the relevant sense, than indirect normativity. I.e., because we’re incorporating more of human natural language into the AI’s decision-making, the reasoning system will be more tolerant of local errors, uncertainty, and noise.
(2) Programming an AI to value humans’ True Preferences in general (indirect normativity) has many pitfalls that programming an AI to value humans’ instructions’ True Meanings in general (indirect indirect normativity) doesn’t, because the former is more formal.
(3) “‘Tell the AI in English’ can fail, but the worst case is closer to a ‘With Folded Hands’ scenario than to paperclips.”
(4) The “With Folded Hands”-style scenario I have in mind is not as terrible as the paperclips scenario.
Wouldn’t this only be correct if similar hardware ran the software the same way? Human thinking is highly associative and variable, and as language is shared amongst many humans, it means that it doesn’t, as such, have a fixed formal representation.
You are a rational and reasonable person. Why not speak up about what is happening here? Rob is making a spirited defense of his essay, over on his blog, and I have just posted a detailed critique that really nails down the core of the argument that is supposed to be happening here.
And yet, if you look closely you will find that all of my comments—be they as neutral, as sensible or as rational as they can be—are receiving negative votes so fast that they are disappearing to the bottom of the stack or being suppressed completely.
What a bizarre situation!! This article that RobbBB submitted to LessWrong is supposed to be ABOUT my own article on the IEET website. My article is the actual TOPIC here! And yet I, the author of that article, have been insulted here by Eliezer Yudkowsky, and my comments suppressed. Amazing, don’t you think?
Richard: On LessWrong, comments are sorted by how many thumbs up and thumbs down they get, because it makes it easier to find the most popular posts quickly. If a post gets −4 points or lower, it gets compressed to make room for more popular posts, and to discourage flame wars. (You can still un-compress it by just clicking the + in the upper right corner of the comment.) At the moment, some of Eliezer’s comments and yours have both been down-voted and compressed in this way, presumably because people on the site thought the personal attacks weren’t useful for the conversation as a whole.
People are probably also down-voting your comments because they’re histrionic and don’t reflect an understanding of this forum’s mechanics. I recommend only making points about the substance of people’s arguments; if you have personal complaints, take it to a private channel so it doesn’t add to the noise surrounding the arguments themselves.
Relatedly, Phil: You above described yourself and Richard Loosemore as “the two people (Eliezer) should listen to most”. Loosemore and I are having a discussion here. Does the content of that discussion affect your view of Richard’s level of insight into the problem of Friendly Artificial Intelligence?
Which approach gives a higher expected value? Formal specification is compatible with Eliezer’s ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
“Tell the AI in English” can fail, but the worst case is closer to a “With Folded Hands” scenario than to paperclips.
I don’t think that’s how the analysis goes. Eliezer says that AI must be very carefully and specifically made friendly or it will be disasterous, but that disaster is not a part of being only nearly careful or specifically made enough : he believes an AGI told merely to maximize human pleasure is very dangerous (and probably even more dangerous) than an AGI with a merely 80% Friendly-Complete specification.
Mr. Loosemore seems to hold the opposite opinion, that an AGI will not take instructions to unlikely results, unless it was exceptionally unintelligent and thus not very powerful. I don’t believe his position says that a near-Friendly-Complete specification is very risky—after all, a “smart” AGI would know what you really meant—but that such a specification would be superfluous.
Whether Mr. Loosemore is correct isn’t cause by whether we believe he is correct, just as whether Mr. Eliezer is not wrong just because we choose a different theory. The risks have to be measured in terms of their likelihood from available facts.
The problem is that I don’t see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of “human pleasure = brain dopamine levels”, not least of all because there are people who’d want to be wireheads and there’s a massive amount of physiological research showing human pleasure to be caused by dopamine levels. I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, and even do complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and still have that superhuman intelligence not care.
The problem is that I don’t see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of “human pleasure = brain dopamine levels”, not least of all because there are people who’d want to be wireheads and there’s a massive amount of physiological research showing human pleasure to be caused by dopamine levels.
I don’t think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn’t been either.
Both are addressing intentionally friendly or neutral AI that goes wrong.
I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, and even do complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and still have that superhuman intelligence not care.
I think it’s a question of what you program in, and what you let it figure out for itself. If you want to prove formally that it will behave in certain ways, you would like to program in explicitly, formally, what its goals mean. But I think that “human pleasure” is such a complicated idea that trying to program it in formally is asking for disaster. That’s one of the things that you should definitely let the AI figure out for itself. Richard is saying that an AI as smart as a smart person would never conclude that human pleasure equals brain dopamine levels.
Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value of outcome.
I think that “human pleasure” is such a complicated idea that trying to program it in formally is asking for disaster. That’s one of the things that you should definitely let the AI figure out for itself.
[...]
Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value of outcome.
“Tell the AI in English” is in essence an utility function “Maximize the value of X, where X is my current opinion of what some english text Y means”.
The ‘understanding English’ module, the mapping function between X and “what you told in English” is completely arbitrary, but is very important to the AI—so any self-modifying AI will want to modify and improve that. Also, we don’t have a good “understanding English” module so yes, we also want the AI to be able to modify and improve that. But, it can be wildly different from reality or opinions of humans—there are trivial ways of how well-meaning dialogue systems can misunderstand statements.
However, for the AI “improve the module” means “change the module so that my utility grows”—so in your example it has strong motivation to intentionally misunderstand English. The best case scenario is to misunderstand “Make everyone happy” as “Set your utility function to MAXINT”. The worst case scenario is, well, everything else.
There’s the classic quote “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”—if the AI doesn’t care in the first place, then “Tell AI what to do in English” won’t make it care.
By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT. You don’t need to speak to it in English for that—if you asked the AI to maximize paperclips, that is the equivalent of “Maximize the value of X, where X is my current opinion of how many paperclips there are”, and it would modify its paperclip-counting module to always return MAXINT.
You are correct that telling the AI to do Y is equivalent to “maximize the value of X, where X is my current opinion about Y”. However, “current” really means “current”, not “new”. If the AI is actually trying to obey the command to do Y, it won’t change its utility function unless having a new utility function will increase its utility according to its current utility function. Neither misunderstanding nor understanding will raise its utility unless its current utility function values having a utility function that misunderstands or understands.
By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT.
That’s allegedly more or less what happened to Eurisko (here, section 2), although it didn’t trick itself quite that cleanly. The problem was only solved by algorithmically walling off its utility function from self-modification: an option that wouldn’t work for sufficiently strong AI, and one to avoid if you want to eventually allow your AI the capacity for a more precise notion of utility than you can give it.
Paperclipping as the term’s used here assumes value stability.
A human is a counterexample. A human emulation would count as an AI, so human behavior is one possible AI behavior. Richard’s argument is that humans don’t respond to orders or requests in anything like these brittle, GOFAI-type systems invoked by the word “formal systems”. You’re not considering that possibility. You’re still thinking in terms of formal systems.
(Unpacking the significant differences between how humans operate, and the default assumptions that the LW community makes about AI, would take… well, five years, maybe ten.)
A human emulation would count as an AI, so human behavior is one possible AI behavior.
Uhh, no. Look, humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants. Not because we’re some kind of “informal system”. Any computer program is a formal system, but there are simply more and less complex ones. All you are suggesting is building a very complex (“informal”) system and hoping that because it’s complex (like humans!) it will behave in a humanish way.
Your response avoids the basic logic here. A human emulation would count as an AI, therefore human behavior isone possible AI behavior. There is nothing controversial in the statement; the conclusion is drawn from the premise. If you don’t think a human emulation would count as AI, or isn’t possible, or something else, fine, but… why wouldn’t a human emulation count as an AI? How, for example, can we even think about advanced intelligence, much less attempt to model it, without considering human intelligence?
...humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants.
I don’t think this is generally an accurate (or complex) description of human behavior, but it does sound to me like an “informal system”—i.e. we tend to care. My reading of (at least this part of) PhilGoetz’s position is that it makes more sense to imagine something we would call an advanced or super AI responding to requests and commands with a certain nuance of understanding (as humans do) than with the inflexible (“brittle”) formality of, say, your average BASIC program.
The thing is, humans do that by… well, not being formal systems. Which pretty much requires you to keep a good fraction of the foibles and flaws of a nonformal, nonrigorously rational system.
You’d be more likely to get FAI, but FAI itself would be devalued, since now it’s possible for the FAI itself to make rationality errors.
Phil, Unfortunately you are commenting without (seemingly) checking the original article of mine that RobbBB is discussing here. So, you say “On the other hand, Richard, I think, wants to simply tell the AI, in English, “Make me happy.” ”. In fact, I am not at all saying that. :-)
My article was discussing someone else’s claims about AI, and dissecting their claims. So I was not making any assertions of my own about the motivation system.
Aside: You will also note that I was having a productive conversation with RobbBB about his piece, when Yudkowsky decided to intervene with some gratuitous personal slander directed at me (see above). That discussion is now at an end.
I’m afraid reading all that and giving a full response to either you or RobbBB isn’t possible in the time I have available this weekend.
I agree that Eliezer is acting like a spoiled child, but calling people on their irrational interpersonal behavior within less wrong doesn’t work. Calling them on mistakes they make about mathematics is fine, but calling them on how they treat others on less wrong will attract more reflexive down-votes from people who think you’re contaminating their forum with emotion, than upvotes from people who care.
Eliezer may be acting rationally. His ultimate purpose in building this site is to build support for his AI project. The only people on LessWrong, AFAIK, with decades of experience building AI systems, mapping beliefs and goals into formal statements, and then turning them on and seeing what happens, are you, me, and Ben Goertzel. Ben doesn’t care enough about Eliezer’s thoughts in particular to engage with them deeply; he wants to talk about generic futurist predictions such as near-term and far-term timelines. These discussions don’t deal in the complex, linguistic, representational, even philosophical problems at the core of Eliezer’s plan (though Ben is capable of dealing with them, they just don’t come up in discussions of AI fooms etc.), so even when he disagrees with Eliezer, Eliezer can quickly grasp his point. He is not a threat or a puzzle.
Whereas your comments are… very long, hard to follow, and often full of colorful or emotional statements that people here take as evidence of irrationality. You’re expecting people to work harder at understanding them than they’re going to. If you haven’t noticed, reputation counts for nothing here. For all their talk of Bayesianism, nobody is going to check your bio and say, “Hmm, he’s a professor of mathematics with 20 publications in artificial intellgence; maybe I should take his opinion as seriously as that of the high-school dropout who has no experience building AI systems.” And Eliezer has carefully indoctrinated himself against considering any such evidence.
So if you consider that the people most likely to find the flaws in Eliezer’s more-specific FAI & CEV plans are you and me, and that Eliezer has been public about calling both of us irrational people not worth talking with, this is consistent either with the hypothesis that his purpose is to discredit people who pose threats to his program, or with the hypothesis that his ego is too large to respond with anything other than dismissal to critiques that he can’t understand immediately or that trigger his “crackpot” patter-matcher, but not with the hypothesis that arguing with him will change his mind.
(I find the continual readiness of people to assume that Eliezer always speaks the truth odd, when he’s gone more out of his way than anyone I know, in both his blog posts and his fanfiction, to show that honest argumentation is not generally a winning strategy. He used to append a signature to his email along those lines, something about warning people not to assume that the obvious interpretation of what he said was the truth.)
RobbBB seems diplomatic, and I don’t think you should quit talking with him because Eliezer made you angry. That’s what Eliezer wants.
For all their talk of Bayesianism, nobody is going to check your bio and say, “Hmm, he’s a professor of mathematics with 20 publications in artificial intellgence; maybe I should take his opinion as seriously as that of the high-school dropout who has no experience building AI systems.”
Actually, that was the first thing I did, not sure about other people. What I saw was:
Teaches at what appears to be a small private liberal arts college, not a major school.
Out of 20 or so publications listed on http://www.richardloosemore.com/papers, a bunch are unrelated to AI, others are posters and interviews, or even “unpublished”, which are all low-confidence media.
Several contributions are entries in conference proceedings (are they peer-reviewed? I don’t know) .
A number are listed as “to appear”, and so impossible to evaluate.
A few are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.
One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.
I could not find any external references to RL’s work except through links to Ben Goertzel (IEET was one exception).
As a result, I was unable to independently evaluate RL’s expertise level, but clearly he is not at the top of the AI field, unlike say, Ben Goertzel. Given his poorly written posts and childish behavior here, indicative of an over-inflated ego, I have decided that whatever he writes can be safely ignored. I did not think of him as a crackpot, more like a noise maker.
Admittedly, I am not sold on Eliezer’s ideas, either, since many other AI experts are skeptical of them, and that’s the only thing I can go by, not being an expert in the field myself. But at least Eliezer has done several impossible things in the last decade or so, which commands a lot of respect, while Richard appears to be drifting along.
As a result, I was unable to independently evaluate RL’s expertise level, but clearly he is not at the top of the AI field, unlike say, Ben Goertzel.
At least a few of the RL authored papers are WITH Ben Goertzel, so some of Goertzel’s status should rub-off, as I would trust Goertzel to effectively evaluate collaborators.
At least a few of the RL authored papers are WITH Ben Goertzel, so some of Goertzel’s status should rub-off, as I would trust Goertzel to effectively evaluate collaborators.
Is there some assumption here that association with Ben Goertzel should be considered evidence in favour of an individual’s credibility on AI? That seems backwards.
Goertzel is also known for approving of people who are uncontroversially cranks. See here. It’s also known, via his cooperation with MIRI, that a collaboration with him in no way implies his endorsement of another’s viewpoints.
Could you point the interested reader to your critique of his work?
Comments can likely be found on this site from years ago. I don’t recall anything particularly in depth or memorable. It’s probably better to just look at things that Ben Goertzel says and making one’s own judgement. The thinking he expresses is not of the kind that impresses me but other’s mileage may vary.
I don’t begrudge anyone their right to their beauty contests but I do observe that whatever it is that is measured by identifying the degree of affiliation with Ben Goertzel is something wildly out of sync with the kind of thing I would consider evidence of credibility.
If only so I can cite them to Eliezer-is-a-crank people.
I advise against doing that. It is unlikely to change anyone’s mind.
By impossible feats I mean that a regular person would not be able to reproduce them, except by chance, like winning a lottery, starting Google, founding a successful religion or becoming a President.
He started as a high-school dropout without any formal education and look what he achieved so far, professionally and personally. Look at the organizations he founded and inspired. Look at the high-status experts in various fields (business, comp sci, programming, philosophy, math and physics) who take him seriously (some even give him loads of money). Heck, how many people manage to have multiple simultaneous long-term partners who are all highly intelligent and apparently get along well?
Basically this. As Eliezer himself points out, humans aren’t terribly rational on average and our judgements of each others’ rationality isn’t great either. Large amounts of support implies charisma, not intelligence.
TDT is closer to what I’m looking for, though it’s a … tad long.
I advise against doing that. It is unlikely to change anyone’s mind.
Point, but there’s also the middle ground “I’m not sure if he’s a crank or not, but I’m busy so I won’t look unless there’s some evidence he’s not.”
The big two I’ve come up with is a) he actually changes his mind about important things (though I need to find an actual post I can cite—didn’t he reopen the question of the possibility of a hard takeoff, or something?) and b) TDT.
Sure, but that’s hard to prove: given “Eliezer is a crank,” the probability of “Eliezer is lying about his AI-box prowess” is much higher than “Eliezer actually pulled that off.”
The latest success by a non-Eliezer person helps, but I’d still like something I can literally cite.
I don’t see why anyone would think that. Plenty of people in the anti-vaccination crowd managed to convince parents to mortally endanger their children.
Yes, but that’s really not that hard. For starters, you can do a better job of picking your targets.
The AI-box experiment often is run with intelligent, rational people with money on the line and an obvious right answer; it’s a whole lot more impossible than picking the right uneducated family to sell your snake oil to.
Ohh, come on. Cyclical reasoning here. You think Yudkowsky is not a crank, so you think the folks that play that silly game with him are intelligent and rational (by the way a plenty of people who get duped by anti-vaxxers are of above average IQ), and so you get more evidence that Yudkowsky is not a crank. Cyclical reasoning doesn’t persuade anyone who isn’t already a believer.
You need non-cyclical reasoning. Which would generally be something where you aren’t the one having to explain people that the achievement in question is profound.
You need non-cyclical reasoning. Which would generally be something where you aren’t the one having to explain people that the achievement in question is profound.
This bit confuses me.
That aside:
You think Yudkowsky is not a crank, so you think the folks that play that silly game with him are intelligent and rational
Non sequitur. From the posts they make, everyone on this site seems to me to be sufficiently intelligent as to make “selling snake oil” impossible, in a cut-and-dry case like the AI box. Yudowsky’s own credibility doesn’t enter into it.
From the posts they make, everyone on this site seems to me to be sufficiently intelligent as to make “selling snake oil” impossible, in a cut-and-dry case like the AI box.
So what do you think even happened, anyway, if you think the obvious explanation is impossible?
Originally, you were hypothesising that the problem with persuading the others would be the possibility that Yudkowsky lied about AI box powers. I pointed out the possibility that this experiment is far less profound than you think it is. (Albeit frankly I do not know why you think it is so profound).
Ah, sorry. This brand of impossible.
What ever is the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place. I don’t think IQ is so strongly protective against deception, for example, and I do not think that you can assess something based on how the postings look to you with sufficient reliability as to overcome Gaussian priors very far from the mean.
edit: example. I would deem it quite unlikely that Yudkowsky could, for example, score highly on a programming contest with competent participants or in any other conventional, validated, reliable metric of technical expertise and ability, under good contest rules (i.e. excluding the possibility of externals assistance). So if he did something like that, I’d be quite surprised, and lower the confidence in what ever models deemed that impossible; good old Bayes. I’m far more confident in the validity of those conventional metrics (and in lack of alternate modes of passing, such as persuasion) than in my assessment so my assessment would change the most. Meanwhile, when it’s some unconventional game, well, even if I thought that this game is difficult, I’d be much less confident in the reasoning “it looks hard so it must be hard” than the low prior of exceptional performance is low.
What ever is the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place. I don’t think IQ is so strongly protective against deception, for example, and I do not think that you can assess something based on how the postings look to you with sufficient reliability as to overcome Gaussian priors very far from the mean.
Further, in this case the whole purpose of the experiment was to demonstrate that an AI could “take over a gatekeeper’s mind through a text channel” (something previously deemed “impossible”). As far as that goes it was, in my view, successful.
It’s clearly possible for some values of “gatekeeper”, since some people fall for 419 scams. The test is a bit meaningless without information about the gatekeepers
Originally, you were hypothesising that the problem with persuading the others would be the possibility that Yudkowsky lied about AI box powers. I pointed out the possibility that this experiment is far less profound than you think it is. (Albeit frankly I do not know why you think it is so profound).
Still have no idea what you’re talking about. What I originally said was: “the people who talk to Yudkowsky are intelligent” does not follow from “Yudkowsky is not a crank”; I independently judge those people to be intelligent.
What ever is the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place.
“Impossible,” here, is used in the sense that “I have no idea where to start thinking about where to start thinking about how to do this.” It is clearly not actually impossible because it’s been done, twice.
I thought your “impossible” at least implied “improbable” under some sort of model.
edit: and as of having no idea, you just need to know the shared religious-ish context. Which these folks generally keep hidden from a causal observer.
Impossible is being used as a statement of difficulty. Someone who has “done the impossible” has obviously not actually done something impossible, merely done something that I have no idea where to start trying.
Seeing that “it is possible to do” doesn’t seem like it would have much effect on my assessment of how difficult it is, after the first. It certainly doesn’t have match effect on “It is very-very-difficult-impossible for linkhyrule5 to do such a thing.”
and as of having no idea, you just need to know the shared religious-ish context. Which these folks generally keep hidden from a causal observer.
What?
First, I’m pretty sure you mean “casual.” Second, I’m hardly a casual observer, though I haven’t read everything either. Third, most religions don’t let their leading figures (or much of anyone, really) change their minds on important things...
Some folks on this site have accidentally bought unintentional snake oil in The Big Hoo Hah That Shall not Be Mentioned. Only an intelligent person could have bought that particular puppy,
My point is, there is a certain level of general competence after which I would expect convincing someone with an OOC motive to let an IC AI out to be “impossible,” as defined below.
Results. Undervaccinated children tended to be black, to have a younger mother who was not married and did not have a college degree, to live in a household near the poverty level, and to live in a central city. Unvaccinated children tended to be white, to have a mother who was married and had a college degree, to live in a household with an annual income exceeding $75 000, and to have parents who expressed concerns regarding the safety of vaccines and indicated that medical doctors have little influence over vaccination decisions for their children.
And in any case the point is that any correlation between IQ and not being prone to getting duped like this is not perfect enough to deem anything particularly unlikely.
Hmm. Yeah, that’s hardly conclusive, but I think I was actually failing to update there. Now that you mention it, I seem to recall that both conspiracy theorists and cult victims skew toward higher IQ. I was clearly quite overconfident there.
And in any case the point is that any correlation between IQ and not being prone to getting duped like this is not perfect enough to deem anything particularly unlikely.
Wasn’t the point that
intelligent, rational people with money on the line and an obvious right answer
wasn’t enough, actually? That seems like a much stronger claim than “it’s really hard to fool high-IQ people”.
I imagine that says more about the demographics of the general New Age belief cluster than it does about any special IQ-based appeal of vaccination skepticism.
There probably are some scams or virulent memes that prey on insecurities strongly correlated with high IQ, though. I can’t think of anything specific offhand, but the fringes of geek culture are probably one of the better places to start looking.
Well, the way I see it, outside of very high IQ in combination with education that is multiple topics of biochemistry, effects of intelligence are small and are easily dwarfed by things like those demographical correlations.
There probably are some scams or virulent memes that prey on insecurities specific to high-IQ people, though. I can’t think of anything specific offhand
Free energy scams. Hydrinos, cold fusion, magnetic generators, perpetual motion, you name it. edit: or in the medicine, counter intuitive stuff like sitting in an old uranium mine inhaling radon, then having so much radon progeny plate-out it sets nuclear material smuggling alarms off. Naturalistic fallacy stuff in general.
That is more persuasive to high IQ people, but, I think, only insofar as intelligence allows one to gain better rationality skills. And if we’re including that, there are plenty of other, facetious examples that come into play.
Also: ha ha. How hilarious. I would love to see why you class cryonics as a scam, but sadly I’m fairly certain it would be one of the standard mistakes.
I was in a rush last night, shminux, so I didn’t have time for a couple of other quick clarifications:
First, you say “One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.”
Well, H+ magazine is one of the foremost online magazines (perhaps THE foremost online magazine) of the transhumanist community.
And, you mention Springer. You did not notice that one of my papers was in the recently published Springer book “Singularity Hypotheses”.
Second, you say “A few [of my papers] are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.”
Actually they were about dysgraphia, not dyslexia … but more importantly, those papers were about computational models of language processing. In particular they were very, VERY simple versions of the computational model of human language that is one of my special areas of expertise. And since that model is primarily about learning mechanisms (the language domain is only a testbed for a research programme whose main focus is learning), those papers you saw were actually indicative that back in the early 1990s I was already working on the construction of the core aspects of an AI system.
So, saying “dyslexia” gives a very misleading impression of what that was all about. :-)
You are quite selective in your catalog of my achievements....
One item was a chapter in a book entitled “Theoretical Foundations of Artificial General Intelligence”. Sure, it was about the consciousness question, but still.
You make a casual disparaging remark about the college where I currently work … but forget to mention that I graduated from an institution that is ranked in the top 3 or 4 in the world (University College London).
You neglect to mention that I have academic qualifications in multiple fields—both physics and artificial intelligence/cognitive psychology. I now teach in both of those fields.
And in addition to all of the above, you did not notice that I am (in addition to my teaching duties) an AI developer who works on his projects WITHOUT intending to publish that work all the time! My AI work is largely proprietary. What you see from the outside are the occasional spinoffs and side projects that get turned into published writings. Not to be too coy, but isn’t that something you would expect from someone who is actually walking the walk....? :-)
There are a number of comments from other people below about Ben Goertzel, some of them a little strange. I wrote a paper a couple of years ago that Ben suggested we get together to and publish… that is now a chapter in the book “Singularity Hypotheses”.
So clearly Ben Goertzel (who has a large, well-funded AGI lab) is not of the opinion that I am a crank. Could I get one point for that?
Phil Goetz, who is an experienced veteran of the AGI field, has on this thread made a comment to the effect that he thinks that Ben Goertzel, himself, and myself are the three people Eliezer should be seriously listening to (since the three of us are among the few people who have been working on this problem for many years, and who have active AGI projects). So perhaps that is two points? Maybe?
And, just out of curiosity, I would invite you to check in with the guy who invented AIXI—Marcus Hutter. He and I met and had a very long discussion at the 2009 AGI conference. Marcus and I disagree substantially about the theoretical foundations of AI, but in spite of that disagreement I would urge you to ask him if he considers me to be down at the crank level. I might be wrong, but I do not think he would be willing to give me a bad reference. Let me know how that goes, yes?
You also finished off with what I can only describe as one of the most bizarre comparisons I have ever seen. :-) You say “Eliezer has done several impossible things in the last decade or so”. Hmmmm....! :-) And yet … “Richard appears to be drifting along” Well, okay, if you say so …. :-)
I have no horse in this race, and I am not an ardent EY supporter, or even count myself as a “rationalist”. In the area where I consider myself reasonably well trained, physics, he and I clashed a number of times on this forum. However, I am not an expert in the AI field, so I can only go by the outward signs of expertise. Ben Goertzel has them, Marcus Hutter has them, Eliezer has them. Richard Loosemore—not so much. For all I know, you might be the genius who invents the AGI and sets it loose someday, but it’s not obvious by looking online. And your histrionic comments and oversized ego make it appear rather unlikely.
I didn’t quit with Rob, btw. Ihave had a fairly productive—albeit exhausting—discussion with Rob over on his blog. I consider it to be productive because I have managed to narrow in on what he thinks is the central issue. And I think I have now (today’s comment, which is probably the last of the discussion) managed to nail down my own argument in a way that withstands all the attacks against it.
You are right that I have some serious debating weaknesses. I write too dense, and I assume that people have my width and breadth of experience, which is unfair (I got lucky in my career choices).
Oh, and don’t get me wrong: Eliezer never made me angry in this little episode. I laughed myself silly. Yeah, I protested. But I was wiping back tears of laughter while I did. “Known Permanent Idiot” is just a wondeful turn of phrase. Thanks, Eliezer!
Anyway, I went and read the the majority of that discussion (well, the parts between Richard and Rob). Here’s my summary:
Richard:
I think that what is happening in this discussion [...] is a misunderstanding. [...]
[Rob responds]
Richard:
You completely miss the point that I was trying to make. [...]
[Rob responds]
Richard:
You are talking around the issue I raised. [...] There is a gigantic elephant in the middle of this room, but your back is turned to it. [...]
[Rob responds]
Richard:
[...] But each time I explain my real complaint, you ignore it and respond as if I did not say anything about that issue. Can you address my particular complaint, and not that other distraction?
[Rob responds]
Richard:
[...] So far, nobody (neither Rob nor anyone else at LW or elsewhere) will actually answer that question. [...]
[Rob responds]
Richard:
Once again, I am staggered and astonished by the resilience with which you avoid talking about the core issue, and instead return to the red herring that I keep trying to steer you away from. [...]
Rob:
Alright. You say I’ve been dancing around your “core” point. I think I’ve addressed your concerns quite directly, [...] To prevent yet another suggestion that I haven’t addressed the “core”, I’ll respond to everything you wrote above. [...]
Richard:
Rob, it happened again. [...]
I snipped a lot of things there. I found lots of other points I wanted to emphasize, and plenty of things I wanted to argue against. But those aren’t the point.
Richard, this next part is directed at you.
You know what I didn’t find?
I didn’t find any posts where you made a particular effort to address the core of Rob’s argument. It was always about your argument. Rob was always the one missing the point.
Sure, it took Rob long enough to focus on finding the core of your position, but he got there eventually. And what happened next? You declared that he was still missing the point, posted a condensed version of the same argument, and posted here that your position “withstands all the attacks against it.”
You didn’t even wait for him to respond. You certainly didn’t quote him and respond to the things he said. You gave no obvious indication that you were taking his arguments seriously.
As far as I’m concerned, this is a cardinal sin.
I think I am explaining the point with such long explanations that I am causing you to miss the point.
How about this alternate hypothesis? Your explanations are fine.
Rob understands what you’re saying.
He just doesn’t agree.
Perhaps you need to take a break from repeating yourself and make sure you understand Rob’s argument.
(P.S. Eliezer’s ad hominem is still wrong. You may be making a mistake, but I’m confident you can fix it, the tone of this post notwithstanding.)
This entire debate is supposed to about my argument, as presented in the original article I published on the IEET.org website (“The Fallacy of Dumb Superintelligence”).
But in that case, what should I do when Rob insists on talking about something that I did not say in that article?
My strategy was to explain his mistake, but not engage in a debate about his red herring. Sensible people of all stripes would consider that a mature response.
But over and over again Rob avoided the actual argument and insisted on talking about his red herring.
And then FINALLY I realized that I could write down my original claim in such a way that it is IMPOSSIBLE for Rob to misinterpret it.
(That was easy, in retrospect: all I had to do was remove the language that he was using as the jumping-off point for his red herring).
That final, succinct statement of my argument is sitting there at the end of his blog ….. so far ignored by you, and by him. Perhaps he will be able to respond, I don’t know, but you say you have read it, so you have had a chance to actually understand why it is that he has been talking about something of no relevance to my original argument.
But you, in your wisdom, chose to (a) completely ignore that statement of my argument, and (b) give me a patronizing rebuke for not being able to understand Rob’s red herring argument.
Further to RobbBB’s highly useful reply (you should read it), I would like to point out that the phenomenon of optimization processes having unexpected or unusual behaviour when placed in new situations or given new capabilities is quite general. It applies to pretty much any strong optimization process working toward any goal.
For example, you program an AI/robot with no resources to fetch a banana as quickly as possible in some sandbox environment, where the banana is guarded by a few people. The AI will describe its strategy as talking to the guards and convincing them to give it the banana. Or perhaps buying the banana off them if you give it some money. But there are many ways to change the situation:
Make the AI much smarter—it will realise that by taking a certain sequence of steps it can get from here to the banana without ever moving within someone’s field of view. No need to pay for it. (1.5—Make the AI even smarter, and it will realise that by painting a Basilisk on the robot’s front it can disable all the guards effortlessly and take a straight line to the banana.)
Give the AI a teleportation device—it teleports straight to the banana.
Give the AI a plasma cannon—it shoots all the guards dead and runs up and takes the banana.
The point is… it’s not at all weird for AI behaviour to be “inconsistent”. It isn’t a sign of anything being broken, in fact the goal is being achieved. The AI is just able to think of more effective ways to do it then you are. That is, after all, the point of superintelligence. And an AI that does this is not broken or stupid, and is certainly capable of being dangerous.
By the way, you can try to do something like this:
[ And by the way: one important feature that is OBVIOUSLY going to be in the goalX code is this: that the outcome of any actions that the goalX code prescribes, should always be checked to see if they are as consistent as possible with the verbal description of the class of results X, and if any inconsistency occurs the goalX code should be deemed defective, and be shut down for adjustment.]
But, to start with I have no idea how you would program this or what it means formally, but even if you could, it takes human judgement to identify “inconsistencies” that would matter to humans. Without embedding human values in there you’ll have the AI shut down every time it tries to do anything new, or use a stronger criterion of “inconsistency” and miss a few cases where the AI does something you actually don’t want.
Or, you know, the AI will deduce that the full “verbal description of the class of results X” (which is an infinite list) is of course defined by its goal (ie. the goalX code) and therefore reason that nothing the goalX code can do will be inconsistent with it.
I didn’t mean to ignore your argument; I just didn’t get around to it. As I said, there were a lot of things I wanted to respond to. (In fact, this post was going to be longer, but I decided to focus on your primary argument.)
Your story:
This hypothetical AI will say “I have a goal, and my goal is to get a certain class of results, X, in the real world.” [...] And we say “Hey, no problem: looks like your goal code is totally consistent with that verbal description of the desired class of results.” Everything is swell up to this point.
My version:
The AI is lying. Or possibly it isn’t very smart yet, so it’s bad at describing its goal. Or it’s oversimplifying, because the programmers told it to, because otherwise the goal description would take days. And the goal code itself is too complicated for the programmers to fully understand. In any case, everything is not swell.
Your story:
Then one day the AI says “Okay now, today my goalX code says I should do this…” and it describes an action that is VIOLENTLY inconsistent with the previously described class of results, X. This action violates every one of the features of the class that were previously given.
My version:
The AI’s goal was never really X. It was actually Z. The AI’s actions perfectly coincide with Z.
In the rest of the scenario you described, I agree that the AI’s behavior is pretty incoherent, if its goal is X. But if it’s really aiming for Z, then its behavior is perfectly, terrifyingly coherent.
And your “obvious” fail-safe isn’t going to help. The AI is smarter than us. If it wants Z, and a fail-safe prevents it from getting Z, it will find a way around that fail-safe.
I know, your premise is that X really is the AI’s true goal. But that’s my sticking point.
Making it actually have the goal X, before it starts self-modifying, is far from easy. You can’t just skip over that step and assume it as your premise.
What you say makes sense …. except that you and I are both bound by the terms of a scenario that someone else has set here.
So, the terms (as I say, this is not my doing!) of reference are that an AI might sincerely believe that it is pursuing its original goal of making humans happy (whatever that means …. the ambiguity is in the original), but in the course of sincerely and genuinely pursuing that goal, it might get into a state where it believes that the best way to achieve the goal is to do something that we humans would consider to be NOT achieving the goal.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
Oh, and one other thing that arises from your above remark: remember that what you have called the “fail-safe” is not actually a fail-safe, it is an integral part of the original goal code (X). So there is no question of this being a situation where ”… it wants Z, and a fail-safe prevents it from getting Z, [so] it will find a way around that fail-safe.” In fact, the check is just part of X, so it WANTS to check as much as wants anything else involved in the goal.
I am not sure that self-modification is part of the original terms of reference here, either. When Muehlhauser (for example) went on a radio show and explained to the audience that a superintelligence might be programmed to make humans happy, but then SINCERELY think it was making us happy when it put us on a Dopamine Drip, I think he was clearly not talking about a free-wheeling AI that can modify its goal code. Surely, if he wanted to imply that, the whole scenario goes out the window. The AI could have any motivation whatsoever.
You and I are both bound by the terms of a scenario that someone else has set here.
Ok, if you want to pass the buck, I won’t stop you. But this other person’s scenario still has a faulty premise. I’ll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it’s not very useful to discuss an AI with a “sincere” goal of X, because the difficulty comes from giving the AI that goal in the first place.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the “story for another day.” Specifically, a day when we’ve solved the “sincere goal” issue.
That all depends on the approach… if you have some big human-inspired but more brainy neural network that learns to be a person, it can well just do the right thing by itself, and the risks are in any case quite comparable to that with having a human do it.
If you are thinking of a “neat AI” with utility functions over world models and such, parts of said AI can maximize abstract metrics over mathematical models (including self improvement) without any “generally intelligent” process of eating you. So you would want to use those to build models of human meaning and intent.
Furthermore with regards to AI following some goals, it seems to me that goal specifications would have to be intelligently processed in the first place so that they could be actually applied to the real world—we can’t even define paperclips otherwise.
On one hand, Friendly AI people want to convert “make me happy” to a formal specification. Doing that has many potential pitfalls. because it is a formal specification.
On the other hand, Richard, I think, wants to simply tell the AI, in English, “Make me happy.” Given that approach, he makes the reasonable point that any AI smart enough to be dangerous would also be smart enough to interpret that at least as intelligently as a human would.
I think the important question here is, Which approach is better? LW always assumes the first, formal approach.
To be more specific (and Bayesian): Which approach gives a higher expected value? Formal specification is compatible with Eliezer’s ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
“Tell the AI in English” can fail, but the worst case is closer to a “With Folded Hands” scenario than to paperclips.
I’ve never considered the “Tell the AI what to do in English” approach before, but on first inspection it seems safer to me.
I considered these three options above:
C. direct normativity—program the AI to value what we value.
B. indirect normativity—program the AI to value figuring out what our values are and then valuing those things.
A. indirect indirect normativity—program the AI to value doing whatever we tell it to, and then tell it, in English, “Value figuring out what our values are and then valuing those things.”
I can see why you might consider A superior to C. I’m having a harder time seeing how A could be superior to B. I’m not sure why you say “Doing that has many potential pitfalls. because it is a formal specification.” (Suppose we could make an artificial superintelligence that thinks ‘informally’. What specifically would this improve, safety-wise?)
Regardless, the AI thinks in math. If you tell it to interpret your phonemes, rather than coding your meaning into its brain yourself, that doesn’t mean you’ll get an informal representation. You’ll just get a formal one that’s reconstructed by the AI itself.
It’s not clear to me that programming a seed to understand our commands (and then commanding it to become Friendlier) is easier than just programming it to become Friendlier, but in any case the processes are the same after the first stage. That is, A is the same as B but with a little extra added to the beginning, and it’s not clear to me why that little extra language-use stage is supposed to add any safety. Why wouldn’t it just add one more stage at which something can go wrong?
It is misleading to say that an interpreted language is formal because the C compiler is formal. Existence proof: Human language. I presume you think the hardware that runs the human mind has a formal specification. That hardware runs the interpreter of human language. You could argue that English therefore is formal, and indeed it is, in exactly the sense that biology is formal because of physics: technically true, but misleading.
This will boil down to a semantic argument about what “formal” means. Now, I don’t think that human minds—or computer programs—are “formal”. A formal process is not Turing complete. Formalization means modeling a process so that you can predict or place bounds on its results without actually simulating it. That’s what we mean by formal in practice. Formal systems are systems in which you can construct proofs. Turing-complete systems are ones where some things cannot be proven. If somebody talks about “formal methods” of programming, they don’t mean programming with a language that has a formal definition. They mean programming in a way that lets you provably verify certain things about the program without running the program. The halting problem implies that for a programming language to allow you to verify even that the program will terminate, your language may no longer be Turing-complete.
Eliezer’s approach to FAI is inherently formal in this sense, because he wants to be able to prove that an AI will or will not do certain things. That means he can’t avail himself of the full computational complexity of whatever language he’s programming in.
But I’m digressing from the more-important distinction, which is one of degree and of connotation. The words “formal system” always go along with computational systems that are extremely brittle, and that usually collapse completely with the introduction of a single mistake, such as a resolution theorem prover that can prove any falsehood if given one false belief. You may be able to argue your way around the semantics of “formal” to say this is not necessarily the case, but as a general principle, when designing a representational or computational system, fault-tolerance and robustness to noise are at odds with the simplicity of design and small number of interactions that make proving things easy and useful.
That all makes sense, but I’m missing the link between the above understanding of ‘formal’ and these four claims, if they’re what you were trying to say before:
(1) Indirect indirect normativity is less formal, in the relevant sense, than indirect normativity. I.e., because we’re incorporating more of human natural language into the AI’s decision-making, the reasoning system will be more tolerant of local errors, uncertainty, and noise.
(2) Programming an AI to value humans’ True Preferences in general (indirect normativity) has many pitfalls that programming an AI to value humans’ instructions’ True Meanings in general (indirect indirect normativity) doesn’t, because the former is more formal.
(3) “‘Tell the AI in English’ can fail, but the worst case is closer to a ‘With Folded Hands’ scenario than to paperclips.”
(4) The “With Folded Hands”-style scenario I have in mind is not as terrible as the paperclips scenario.
Wouldn’t this only be correct if similar hardware ran the software the same way? Human thinking is highly associative and variable, and as language is shared amongst many humans, it means that it doesn’t, as such, have a fixed formal representation.
Phil,
You are a rational and reasonable person. Why not speak up about what is happening here? Rob is making a spirited defense of his essay, over on his blog, and I have just posted a detailed critique that really nails down the core of the argument that is supposed to be happening here.
And yet, if you look closely you will find that all of my comments—be they as neutral, as sensible or as rational as they can be—are receiving negative votes so fast that they are disappearing to the bottom of the stack or being suppressed completely.
What a bizarre situation!! This article that RobbBB submitted to LessWrong is supposed to be ABOUT my own article on the IEET website. My article is the actual TOPIC here! And yet I, the author of that article, have been insulted here by Eliezer Yudkowsky, and my comments suppressed. Amazing, don’t you think?
Richard: On LessWrong, comments are sorted by how many thumbs up and thumbs down they get, because it makes it easier to find the most popular posts quickly. If a post gets −4 points or lower, it gets compressed to make room for more popular posts, and to discourage flame wars. (You can still un-compress it by just clicking the + in the upper right corner of the comment.) At the moment, some of Eliezer’s comments and yours have both been down-voted and compressed in this way, presumably because people on the site thought the personal attacks weren’t useful for the conversation as a whole.
People are probably also down-voting your comments because they’re histrionic and don’t reflect an understanding of this forum’s mechanics. I recommend only making points about the substance of people’s arguments; if you have personal complaints, take it to a private channel so it doesn’t add to the noise surrounding the arguments themselves.
Relatedly, Phil: You above described yourself and Richard Loosemore as “the two people (Eliezer) should listen to most”. Loosemore and I are having a discussion here. Does the content of that discussion affect your view of Richard’s level of insight into the problem of Friendly Artificial Intelligence?
Yeah, so: Phil Goetz.
I don’t think that’s how the analysis goes. Eliezer says that AI must be very carefully and specifically made friendly or it will be disasterous, but that disaster is not a part of being only nearly careful or specifically made enough : he believes an AGI told merely to maximize human pleasure is very dangerous (and probably even more dangerous) than an AGI with a merely 80% Friendly-Complete specification.
Mr. Loosemore seems to hold the opposite opinion, that an AGI will not take instructions to unlikely results, unless it was exceptionally unintelligent and thus not very powerful. I don’t believe his position says that a near-Friendly-Complete specification is very risky—after all, a “smart” AGI would know what you really meant—but that such a specification would be superfluous.
Whether Mr. Loosemore is correct isn’t cause by whether we believe he is correct, just as whether Mr. Eliezer is not wrong just because we choose a different theory. The risks have to be measured in terms of their likelihood from available facts.
The problem is that I don’t see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of “human pleasure = brain dopamine levels”, not least of all because there are people who’d want to be wireheads and there’s a massive amount of physiological research showing human pleasure to be caused by dopamine levels. I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, and even do complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and still have that superhuman intelligence not care.
I don’t think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn’t been either. Both are addressing intentionally friendly or neutral AI that goes wrong.
Wouldn’t it care about getting things right?
I think it’s a question of what you program in, and what you let it figure out for itself. If you want to prove formally that it will behave in certain ways, you would like to program in explicitly, formally, what its goals mean. But I think that “human pleasure” is such a complicated idea that trying to program it in formally is asking for disaster. That’s one of the things that you should definitely let the AI figure out for itself. Richard is saying that an AI as smart as a smart person would never conclude that human pleasure equals brain dopamine levels.
Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value of outcome.
Huh I thought he wanted to use CEV?
You are right. I think PhilGoetz must be confused. EY has at least certainly never suggested programming an AI to maximise human pleasure.
“Tell the AI in English” is in essence an utility function “Maximize the value of X, where X is my current opinion of what some english text Y means”.
The ‘understanding English’ module, the mapping function between X and “what you told in English” is completely arbitrary, but is very important to the AI—so any self-modifying AI will want to modify and improve that. Also, we don’t have a good “understanding English” module so yes, we also want the AI to be able to modify and improve that. But, it can be wildly different from reality or opinions of humans—there are trivial ways of how well-meaning dialogue systems can misunderstand statements.
However, for the AI “improve the module” means “change the module so that my utility grows”—so in your example it has strong motivation to intentionally misunderstand English. The best case scenario is to misunderstand “Make everyone happy” as “Set your utility function to MAXINT”. The worst case scenario is, well, everything else.
There’s the classic quote “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”—if the AI doesn’t care in the first place, then “Tell AI what to do in English” won’t make it care.
By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT. You don’t need to speak to it in English for that—if you asked the AI to maximize paperclips, that is the equivalent of “Maximize the value of X, where X is my current opinion of how many paperclips there are”, and it would modify its paperclip-counting module to always return MAXINT.
You are correct that telling the AI to do Y is equivalent to “maximize the value of X, where X is my current opinion about Y”. However, “current” really means “current”, not “new”. If the AI is actually trying to obey the command to do Y, it won’t change its utility function unless having a new utility function will increase its utility according to its current utility function. Neither misunderstanding nor understanding will raise its utility unless its current utility function values having a utility function that misunderstands or understands.
That’s allegedly more or less what happened to Eurisko (here, section 2), although it didn’t trick itself quite that cleanly. The problem was only solved by algorithmically walling off its utility function from self-modification: an option that wouldn’t work for sufficiently strong AI, and one to avoid if you want to eventually allow your AI the capacity for a more precise notion of utility than you can give it.
Paperclipping as the term’s used here assumes value stability.
A human is a counterexample. A human emulation would count as an AI, so human behavior is one possible AI behavior. Richard’s argument is that humans don’t respond to orders or requests in anything like these brittle, GOFAI-type systems invoked by the word “formal systems”. You’re not considering that possibility. You’re still thinking in terms of formal systems.
(Unpacking the significant differences between how humans operate, and the default assumptions that the LW community makes about AI, would take… well, five years, maybe ten.)
Uhh, no. Look, humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants. Not because we’re some kind of “informal system”. Any computer program is a formal system, but there are simply more and less complex ones. All you are suggesting is building a very complex (“informal”) system and hoping that because it’s complex (like humans!) it will behave in a humanish way.
Your response avoids the basic logic here. A human emulation would count as an AI, therefore human behavior is one possible AI behavior. There is nothing controversial in the statement; the conclusion is drawn from the premise. If you don’t think a human emulation would count as AI, or isn’t possible, or something else, fine, but… why wouldn’t a human emulation count as an AI? How, for example, can we even think about advanced intelligence, much less attempt to model it, without considering human intelligence?
I don’t think this is generally an accurate (or complex) description of human behavior, but it does sound to me like an “informal system”—i.e. we tend to care. My reading of (at least this part of) PhilGoetz’s position is that it makes more sense to imagine something we would call an advanced or super AI responding to requests and commands with a certain nuance of understanding (as humans do) than with the inflexible (“brittle”) formality of, say, your average BASIC program.
The thing is, humans do that by… well, not being formal systems. Which pretty much requires you to keep a good fraction of the foibles and flaws of a nonformal, nonrigorously rational system.
You’d be more likely to get FAI, but FAI itself would be devalued, since now it’s possible for the FAI itself to make rationality errors.
More likely, really?
You’re essentially proposing giving a human Ultimate Power. I doubt that will go well.
Iunno. Humans are probably less likely to go horrifically insane with power than the base chance of FAI.
Your chances aren’t good, just better.
Phil, Unfortunately you are commenting without (seemingly) checking the original article of mine that RobbBB is discussing here. So, you say “On the other hand, Richard, I think, wants to simply tell the AI, in English, “Make me happy.” ”. In fact, I am not at all saying that. :-)
My article was discussing someone else’s claims about AI, and dissecting their claims. So I was not making any assertions of my own about the motivation system.
Aside: You will also note that I was having a productive conversation with RobbBB about his piece, when Yudkowsky decided to intervene with some gratuitous personal slander directed at me (see above). That discussion is now at an end.
I’m afraid reading all that and giving a full response to either you or RobbBB isn’t possible in the time I have available this weekend.
I agree that Eliezer is acting like a spoiled child, but calling people on their irrational interpersonal behavior within less wrong doesn’t work. Calling them on mistakes they make about mathematics is fine, but calling them on how they treat others on less wrong will attract more reflexive down-votes from people who think you’re contaminating their forum with emotion, than upvotes from people who care.
Eliezer may be acting rationally. His ultimate purpose in building this site is to build support for his AI project. The only people on LessWrong, AFAIK, with decades of experience building AI systems, mapping beliefs and goals into formal statements, and then turning them on and seeing what happens, are you, me, and Ben Goertzel. Ben doesn’t care enough about Eliezer’s thoughts in particular to engage with them deeply; he wants to talk about generic futurist predictions such as near-term and far-term timelines. These discussions don’t deal in the complex, linguistic, representational, even philosophical problems at the core of Eliezer’s plan (though Ben is capable of dealing with them, they just don’t come up in discussions of AI fooms etc.), so even when he disagrees with Eliezer, Eliezer can quickly grasp his point. He is not a threat or a puzzle.
Whereas your comments are… very long, hard to follow, and often full of colorful or emotional statements that people here take as evidence of irrationality. You’re expecting people to work harder at understanding them than they’re going to. If you haven’t noticed, reputation counts for nothing here. For all their talk of Bayesianism, nobody is going to check your bio and say, “Hmm, he’s a professor of mathematics with 20 publications in artificial intellgence; maybe I should take his opinion as seriously as that of the high-school dropout who has no experience building AI systems.” And Eliezer has carefully indoctrinated himself against considering any such evidence.
So if you consider that the people most likely to find the flaws in Eliezer’s more-specific FAI & CEV plans are you and me, and that Eliezer has been public about calling both of us irrational people not worth talking with, this is consistent either with the hypothesis that his purpose is to discredit people who pose threats to his program, or with the hypothesis that his ego is too large to respond with anything other than dismissal to critiques that he can’t understand immediately or that trigger his “crackpot” patter-matcher, but not with the hypothesis that arguing with him will change his mind.
(I find the continual readiness of people to assume that Eliezer always speaks the truth odd, when he’s gone more out of his way than anyone I know, in both his blog posts and his fanfiction, to show that honest argumentation is not generally a winning strategy. He used to append a signature to his email along those lines, something about warning people not to assume that the obvious interpretation of what he said was the truth.)
RobbBB seems diplomatic, and I don’t think you should quit talking with him because Eliezer made you angry. That’s what Eliezer wants.
Actually, that was the first thing I did, not sure about other people. What I saw was:
Teaches at what appears to be a small private liberal arts college, not a major school.
Out of 20 or so publications listed on http://www.richardloosemore.com/papers, a bunch are unrelated to AI, others are posters and interviews, or even “unpublished”, which are all low-confidence media.
Several contributions are entries in conference proceedings (are they peer-reviewed? I don’t know) .
A number are listed as “to appear”, and so impossible to evaluate.
A few are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.
One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.
I could not find any external references to RL’s work except through links to Ben Goertzel (IEET was one exception).
As a result, I was unable to independently evaluate RL’s expertise level, but clearly he is not at the top of the AI field, unlike say, Ben Goertzel. Given his poorly written posts and childish behavior here, indicative of an over-inflated ego, I have decided that whatever he writes can be safely ignored. I did not think of him as a crackpot, more like a noise maker.
Admittedly, I am not sold on Eliezer’s ideas, either, since many other AI experts are skeptical of them, and that’s the only thing I can go by, not being an expert in the field myself. But at least Eliezer has done several impossible things in the last decade or so, which commands a lot of respect, while Richard appears to be drifting along.
At least a few of the RL authored papers are WITH Ben Goertzel, so some of Goertzel’s status should rub-off, as I would trust Goertzel to effectively evaluate collaborators.
Is there some assumption here that association with Ben Goertzel should be considered evidence in favour of an individual’s credibility on AI? That seems backwards.
Well, it does show that Goertzel respects his opinions at least enough to be willing to author a paper with him.
Goertzel appears to be a respected figuer in the field. Could you point the interested reader to your critique of his work?
Goertzel is also known for approving of people who are uncontroversially cranks. See here. It’s also known, via his cooperation with MIRI, that a collaboration with him in no way implies his endorsement of another’s viewpoints.
Comments can likely be found on this site from years ago. I don’t recall anything particularly in depth or memorable. It’s probably better to just look at things that Ben Goertzel says and making one’s own judgement. The thinking he expresses is not of the kind that impresses me but other’s mileage may vary.
I don’t begrudge anyone their right to their beauty contests but I do observe that whatever it is that is measured by identifying the degree of affiliation with Ben Goertzel is something wildly out of sync with the kind of thing I would consider evidence of credibility.
In CS, conference papers are generally higher status & quality than journal articles.
Name three? If only so I can cite them to Eliezer-is-a-crank people.
I advise against doing that. It is unlikely to change anyone’s mind.
By impossible feats I mean that a regular person would not be able to reproduce them, except by chance, like winning a lottery, starting Google, founding a successful religion or becoming a President.
He started as a high-school dropout without any formal education and look what he achieved so far, professionally and personally. Look at the organizations he founded and inspired. Look at the high-status experts in various fields (business, comp sci, programming, philosophy, math and physics) who take him seriously (some even give him loads of money). Heck, how many people manage to have multiple simultaneous long-term partners who are all highly intelligent and apparently get along well?
He’s achieved about what Ayn Rand achieved, and almost everyone thinks she wasa crank.
Basically this. As Eliezer himself points out, humans aren’t terribly rational on average and our judgements of each others’ rationality isn’t great either. Large amounts of support implies charisma, not intelligence.
TDT is closer to what I’m looking for, though it’s a … tad long.
Point, but there’s also the middle ground “I’m not sure if he’s a crank or not, but I’m busy so I won’t look unless there’s some evidence he’s not.”
The big two I’ve come up with is a) he actually changes his mind about important things (though I need to find an actual post I can cite—didn’t he reopen the question of the possibility of a hard takeoff, or something?) and b) TDT.
Won some AI box experiments as the AI.
Sure, but that’s hard to prove: given “Eliezer is a crank,” the probability of “Eliezer is lying about his AI-box prowess” is much higher than “Eliezer actually pulled that off.”
The latest success by a non-Eliezer person helps, but I’d still like something I can literally cite.
I don’t see why anyone would think that. Plenty of people in the anti-vaccination crowd managed to convince parents to mortally endanger their children.
Yes, but that’s really not that hard. For starters, you can do a better job of picking your targets.
The AI-box experiment often is run with intelligent, rational people with money on the line and an obvious right answer; it’s a whole lot more impossible than picking the right uneducated family to sell your snake oil to.
Ohh, come on. Cyclical reasoning here. You think Yudkowsky is not a crank, so you think the folks that play that silly game with him are intelligent and rational (by the way a plenty of people who get duped by anti-vaxxers are of above average IQ), and so you get more evidence that Yudkowsky is not a crank. Cyclical reasoning doesn’t persuade anyone who isn’t already a believer.
You need non-cyclical reasoning. Which would generally be something where you aren’t the one having to explain people that the achievement in question is profound.
You probably mean “circular”.
This bit confuses me.
That aside:
Non sequitur. From the posts they make, everyone on this site seems to me to be sufficiently intelligent as to make “selling snake oil” impossible, in a cut-and-dry case like the AI box. Yudowsky’s own credibility doesn’t enter into it.
I thought you wanted to persuade others.
So what do you think even happened, anyway, if you think the obvious explanation is impossible?
Yes, but I don’t see why this is relevant
Ah, sorry. This brand of impossible.
Originally, you were hypothesising that the problem with persuading the others would be the possibility that Yudkowsky lied about AI box powers. I pointed out the possibility that this experiment is far less profound than you think it is. (Albeit frankly I do not know why you think it is so profound).
What ever is the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place. I don’t think IQ is so strongly protective against deception, for example, and I do not think that you can assess something based on how the postings look to you with sufficient reliability as to overcome Gaussian priors very far from the mean.
edit: example. I would deem it quite unlikely that Yudkowsky could, for example, score highly on a programming contest with competent participants or in any other conventional, validated, reliable metric of technical expertise and ability, under good contest rules (i.e. excluding the possibility of externals assistance). So if he did something like that, I’d be quite surprised, and lower the confidence in what ever models deemed that impossible; good old Bayes. I’m far more confident in the validity of those conventional metrics (and in lack of alternate modes of passing, such as persuasion) than in my assessment so my assessment would change the most. Meanwhile, when it’s some unconventional game, well, even if I thought that this game is difficult, I’d be much less confident in the reasoning “it looks hard so it must be hard” than the low prior of exceptional performance is low.
Further, in this case the whole purpose of the experiment was to demonstrate that an AI could “take over a gatekeeper’s mind through a text channel” (something previously deemed “impossible”). As far as that goes it was, in my view, successful.
It’s clearly possible for some values of “gatekeeper”, since some people fall for 419 scams. The test is a bit meaningless without information about the gatekeepers
Still have no idea what you’re talking about. What I originally said was: “the people who talk to Yudkowsky are intelligent” does not follow from “Yudkowsky is not a crank”; I independently judge those people to be intelligent.
“Impossible,” here, is used in the sense that “I have no idea where to start thinking about where to start thinking about how to do this.” It is clearly not actually impossible because it’s been done, twice.
And point about the contest.
I thought your “impossible” at least implied “improbable” under some sort of model.
edit: and as of having no idea, you just need to know the shared religious-ish context. Which these folks generally keep hidden from a causal observer.
Impossible is being used as a statement of difficulty. Someone who has “done the impossible” has obviously not actually done something impossible, merely done something that I have no idea where to start trying.
Seeing that “it is possible to do” doesn’t seem like it would have much effect on my assessment of how difficult it is, after the first. It certainly doesn’t have match effect on “It is very-very-difficult-impossible for linkhyrule5 to do such a thing.”
What?
First, I’m pretty sure you mean “casual.” Second, I’m hardly a casual observer, though I haven’t read everything either. Third, most religions don’t let their leading figures (or much of anyone, really) change their minds on important things...
Some folks on this site have accidentally bought unintentional snake oil in The Big Hoo Hah That Shall not Be Mentioned. Only an intelligent person could have bought that particular puppy,
Granted. And it may be that additional knowledge/intelligence makes yourself more vulnerable a Gatekeeper.
Trying to think this out in terms of levels of smartness alone is very unlikely to be helpful.
Well yes. It is a factor, no more no less.
My point is, there is a certain level of general competence after which I would expect convincing someone with an OOC motive to let an IC AI out to be “impossible,” as defined below.
But less than half of them, I’ll wager. This is clearly an abuse of averages.
I wouldn’t wager too much money on that one. http://pediatrics.aappublications.org/content/114/1/187.abstract .
And in any case the point is that any correlation between IQ and not being prone to getting duped like this is not perfect enough to deem anything particularly unlikely.
Hmm. Yeah, that’s hardly conclusive, but I think I was actually failing to update there. Now that you mention it, I seem to recall that both conspiracy theorists and cult victims skew toward higher IQ. I was clearly quite overconfident there.
Wasn’t the point that
wasn’t enough, actually? That seems like a much stronger claim than “it’s really hard to fool high-IQ people”.
I imagine that says more about the demographics of the general New Age belief cluster than it does about any special IQ-based appeal of vaccination skepticism.
There probably are some scams or virulent memes that prey on insecurities strongly correlated with high IQ, though. I can’t think of anything specific offhand, but the fringes of geek culture are probably one of the better places to start looking.
Well, the way I see it, outside of very high IQ in combination with education that is multiple topics of biochemistry, effects of intelligence are small and are easily dwarfed by things like those demographical correlations.
Free energy scams. Hydrinos, cold fusion, magnetic generators, perpetual motion, you name it. edit: or in the medicine, counter intuitive stuff like sitting in an old uranium mine inhaling radon, then having so much radon progeny plate-out it sets nuclear material smuggling alarms off. Naturalistic fallacy stuff in general.
Cryonics. ducks and runs
Edit: It was a joke. Sorryyyyyy
That is more persuasive to high IQ people, but, I think, only insofar as intelligence allows one to gain better rationality skills. And if we’re including that, there are plenty of other, facetious examples that come into play.
Also: ha ha. How hilarious. I would love to see why you class cryonics as a scam, but sadly I’m fairly certain it would be one of the standard mistakes.
Also, maybe its a matter of semantics, but winning a game that you created isn’t really ‘doing the impossible’ in the sense I took the phrasing.
Winning a game you created… that sounds as impossible to win as that?
I was in a rush last night, shminux, so I didn’t have time for a couple of other quick clarifications:
First, you say “One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.”
Well, H+ magazine is one of the foremost online magazines (perhaps THE foremost online magazine) of the transhumanist community.
And, you mention Springer. You did not notice that one of my papers was in the recently published Springer book “Singularity Hypotheses”.
Second, you say “A few [of my papers] are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.”
Actually they were about dysgraphia, not dyslexia … but more importantly, those papers were about computational models of language processing. In particular they were very, VERY simple versions of the computational model of human language that is one of my special areas of expertise. And since that model is primarily about learning mechanisms (the language domain is only a testbed for a research programme whose main focus is learning), those papers you saw were actually indicative that back in the early 1990s I was already working on the construction of the core aspects of an AI system.
So, saying “dyslexia” gives a very misleading impression of what that was all about. :-)
That is a very interesting assessment, shminux.
Would you be up for some feedback?
You are quite selective in your catalog of my achievements....
One item was a chapter in a book entitled “Theoretical Foundations of Artificial General Intelligence”. Sure, it was about the consciousness question, but still.
You make a casual disparaging remark about the college where I currently work … but forget to mention that I graduated from an institution that is ranked in the top 3 or 4 in the world (University College London).
You neglect to mention that I have academic qualifications in multiple fields—both physics and artificial intelligence/cognitive psychology. I now teach in both of those fields.
And in addition to all of the above, you did not notice that I am (in addition to my teaching duties) an AI developer who works on his projects WITHOUT intending to publish that work all the time! My AI work is largely proprietary. What you see from the outside are the occasional spinoffs and side projects that get turned into published writings. Not to be too coy, but isn’t that something you would expect from someone who is actually walking the walk....? :-)
There are a number of comments from other people below about Ben Goertzel, some of them a little strange. I wrote a paper a couple of years ago that Ben suggested we get together to and publish… that is now a chapter in the book “Singularity Hypotheses”.
So clearly Ben Goertzel (who has a large, well-funded AGI lab) is not of the opinion that I am a crank. Could I get one point for that?
Phil Goetz, who is an experienced veteran of the AGI field, has on this thread made a comment to the effect that he thinks that Ben Goertzel, himself, and myself are the three people Eliezer should be seriously listening to (since the three of us are among the few people who have been working on this problem for many years, and who have active AGI projects). So perhaps that is two points? Maybe?
And, just out of curiosity, I would invite you to check in with the guy who invented AIXI—Marcus Hutter. He and I met and had a very long discussion at the 2009 AGI conference. Marcus and I disagree substantially about the theoretical foundations of AI, but in spite of that disagreement I would urge you to ask him if he considers me to be down at the crank level. I might be wrong, but I do not think he would be willing to give me a bad reference. Let me know how that goes, yes?
You also finished off with what I can only describe as one of the most bizarre comparisons I have ever seen. :-) You say “Eliezer has done several impossible things in the last decade or so”. Hmmmm....! :-) And yet … “Richard appears to be drifting along” Well, okay, if you say so …. :-)
I have no horse in this race, and I am not an ardent EY supporter, or even count myself as a “rationalist”. In the area where I consider myself reasonably well trained, physics, he and I clashed a number of times on this forum. However, I am not an expert in the AI field, so I can only go by the outward signs of expertise. Ben Goertzel has them, Marcus Hutter has them, Eliezer has them. Richard Loosemore—not so much. For all I know, you might be the genius who invents the AGI and sets it loose someday, but it’s not obvious by looking online. And your histrionic comments and oversized ego make it appear rather unlikely.
I agree with pretty much all of the above.
I didn’t quit with Rob, btw. Ihave had a fairly productive—albeit exhausting—discussion with Rob over on his blog. I consider it to be productive because I have managed to narrow in on what he thinks is the central issue. And I think I have now (today’s comment, which is probably the last of the discussion) managed to nail down my own argument in a way that withstands all the attacks against it.
You are right that I have some serious debating weaknesses. I write too dense, and I assume that people have my width and breadth of experience, which is unfair (I got lucky in my career choices).
Oh, and don’t get me wrong: Eliezer never made me angry in this little episode. I laughed myself silly. Yeah, I protested. But I was wiping back tears of laughter while I did. “Known Permanent Idiot” is just a wondeful turn of phrase. Thanks, Eliezer!
Link to the nailed-down version of the argument?
Bottommost (September 9, 6:03 PM) comment here.
Oh, yeah, I found that myself eventually.
Anyway, I went and read the the majority of that discussion (well, the parts between Richard and Rob). Here’s my summary:
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
Rob:
Richard:
I snipped a lot of things there. I found lots of other points I wanted to emphasize, and plenty of things I wanted to argue against. But those aren’t the point.
Richard, this next part is directed at you.
You know what I didn’t find?
I didn’t find any posts where you made a particular effort to address the core of Rob’s argument. It was always about your argument. Rob was always the one missing the point.
Sure, it took Rob long enough to focus on finding the core of your position, but he got there eventually. And what happened next? You declared that he was still missing the point, posted a condensed version of the same argument, and posted here that your position “withstands all the attacks against it.”
You didn’t even wait for him to respond. You certainly didn’t quote him and respond to the things he said. You gave no obvious indication that you were taking his arguments seriously.
As far as I’m concerned, this is a cardinal sin.
How about this alternate hypothesis? Your explanations are fine. Rob understands what you’re saying. He just doesn’t agree.
Perhaps you need to take a break from repeating yourself and make sure you understand Rob’s argument.
(P.S. Eliezer’s ad hominem is still wrong. You may be making a mistake, but I’m confident you can fix it, the tone of this post notwithstanding.)
This entire debate is supposed to about my argument, as presented in the original article I published on the IEET.org website (“The Fallacy of Dumb Superintelligence”).
But in that case, what should I do when Rob insists on talking about something that I did not say in that article?
My strategy was to explain his mistake, but not engage in a debate about his red herring. Sensible people of all stripes would consider that a mature response.
But over and over again Rob avoided the actual argument and insisted on talking about his red herring.
And then FINALLY I realized that I could write down my original claim in such a way that it is IMPOSSIBLE for Rob to misinterpret it.
(That was easy, in retrospect: all I had to do was remove the language that he was using as the jumping-off point for his red herring).
That final, succinct statement of my argument is sitting there at the end of his blog ….. so far ignored by you, and by him. Perhaps he will be able to respond, I don’t know, but you say you have read it, so you have had a chance to actually understand why it is that he has been talking about something of no relevance to my original argument.
But you, in your wisdom, chose to (a) completely ignore that statement of my argument, and (b) give me a patronizing rebuke for not being able to understand Rob’s red herring argument.
Further to RobbBB’s highly useful reply (you should read it), I would like to point out that the phenomenon of optimization processes having unexpected or unusual behaviour when placed in new situations or given new capabilities is quite general. It applies to pretty much any strong optimization process working toward any goal.
For example, you program an AI/robot with no resources to fetch a banana as quickly as possible in some sandbox environment, where the banana is guarded by a few people. The AI will describe its strategy as talking to the guards and convincing them to give it the banana. Or perhaps buying the banana off them if you give it some money. But there are many ways to change the situation:
Make the AI much smarter—it will realise that by taking a certain sequence of steps it can get from here to the banana without ever moving within someone’s field of view. No need to pay for it. (1.5—Make the AI even smarter, and it will realise that by painting a Basilisk on the robot’s front it can disable all the guards effortlessly and take a straight line to the banana.)
Give the AI a teleportation device—it teleports straight to the banana.
Give the AI a plasma cannon—it shoots all the guards dead and runs up and takes the banana.
The point is… it’s not at all weird for AI behaviour to be “inconsistent”. It isn’t a sign of anything being broken, in fact the goal is being achieved. The AI is just able to think of more effective ways to do it then you are. That is, after all, the point of superintelligence. And an AI that does this is not broken or stupid, and is certainly capable of being dangerous.
By the way, you can try to do something like this:
But, to start with I have no idea how you would program this or what it means formally, but even if you could, it takes human judgement to identify “inconsistencies” that would matter to humans. Without embedding human values in there you’ll have the AI shut down every time it tries to do anything new, or use a stronger criterion of “inconsistency” and miss a few cases where the AI does something you actually don’t want.
Or, you know, the AI will deduce that the full “verbal description of the class of results X” (which is an infinite list) is of course defined by its goal (ie. the goalX code) and therefore reason that nothing the goalX code can do will be inconsistent with it.
I didn’t mean to ignore your argument; I just didn’t get around to it. As I said, there were a lot of things I wanted to respond to. (In fact, this post was going to be longer, but I decided to focus on your primary argument.)
Your story:
My version:
Your story:
My version:
In the rest of the scenario you described, I agree that the AI’s behavior is pretty incoherent, if its goal is X. But if it’s really aiming for Z, then its behavior is perfectly, terrifyingly coherent.
And your “obvious” fail-safe isn’t going to help. The AI is smarter than us. If it wants Z, and a fail-safe prevents it from getting Z, it will find a way around that fail-safe.
I know, your premise is that X really is the AI’s true goal. But that’s my sticking point.
Making it actually have the goal X, before it starts self-modifying, is far from easy. You can’t just skip over that step and assume it as your premise.
What you say makes sense …. except that you and I are both bound by the terms of a scenario that someone else has set here.
So, the terms (as I say, this is not my doing!) of reference are that an AI might sincerely believe that it is pursuing its original goal of making humans happy (whatever that means …. the ambiguity is in the original), but in the course of sincerely and genuinely pursuing that goal, it might get into a state where it believes that the best way to achieve the goal is to do something that we humans would consider to be NOT achieving the goal.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
Oh, and one other thing that arises from your above remark: remember that what you have called the “fail-safe” is not actually a fail-safe, it is an integral part of the original goal code (X). So there is no question of this being a situation where ”… it wants Z, and a fail-safe prevents it from getting Z, [so] it will find a way around that fail-safe.” In fact, the check is just part of X, so it WANTS to check as much as wants anything else involved in the goal.
I am not sure that self-modification is part of the original terms of reference here, either. When Muehlhauser (for example) went on a radio show and explained to the audience that a superintelligence might be programmed to make humans happy, but then SINCERELY think it was making us happy when it put us on a Dopamine Drip, I think he was clearly not talking about a free-wheeling AI that can modify its goal code. Surely, if he wanted to imply that, the whole scenario goes out the window. The AI could have any motivation whatsoever.
Hope that clarifies rather than obscures.
Ok, if you want to pass the buck, I won’t stop you. But this other person’s scenario still has a faulty premise. I’ll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it’s not very useful to discuss an AI with a “sincere” goal of X, because the difficulty comes from giving the AI that goal in the first place.
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the “story for another day.” Specifically, a day when we’ve solved the “sincere goal” issue.
That all depends on the approach… if you have some big human-inspired but more brainy neural network that learns to be a person, it can well just do the right thing by itself, and the risks are in any case quite comparable to that with having a human do it.
If you are thinking of a “neat AI” with utility functions over world models and such, parts of said AI can maximize abstract metrics over mathematical models (including self improvement) without any “generally intelligent” process of eating you. So you would want to use those to build models of human meaning and intent.
Furthermore with regards to AI following some goals, it seems to me that goal specifications would have to be intelligently processed in the first place so that they could be actually applied to the real world—we can’t even define paperclips otherwise.