Phil, unfortunately you are commenting without (it seems) having checked the original article of mine that RobbBB is discussing here. You say, “On the other hand, Richard, I think, wants to simply tell the AI, in English, ‘Make me happy.’” In fact, I am not saying that at all. :-)
My article was discussing someone else’s claims about AI, and dissecting their claims. So I was not making any assertions of my own about the motivation system.
Aside: You will also note that I was having a productive conversation with RobbBB about his piece, when Yudkowsky decided to intervene with some gratuitous personal slander directed at me (see above). That discussion is now at an end.
I’m afraid reading all that and giving a full response to either you or RobbBB isn’t possible in the time I have available this weekend.
I agree that Eliezer is acting like a spoiled child, but calling people on their irrational interpersonal behavior within Less Wrong doesn’t work. Calling them on mistakes they make about mathematics is fine, but calling them on how they treat others on Less Wrong will attract more reflexive downvotes from people who think you’re contaminating their forum with emotion than upvotes from people who care.
Eliezer may be acting rationally. His ultimate purpose in building this site is to build support for his AI project. The only people on LessWrong, AFAIK, with decades of experience building AI systems, mapping beliefs and goals into formal statements, and then turning them on and seeing what happens, are you, me, and Ben Goertzel. Ben doesn’t care enough about Eliezer’s thoughts in particular to engage with them deeply; he wants to talk about generic futurist predictions such as near-term and far-term timelines. These discussions don’t deal in the complex, linguistic, representational, even philosophical problems at the core of Eliezer’s plan (though Ben is capable of dealing with them, they just don’t come up in discussions of AI fooms etc.), so even when he disagrees with Eliezer, Eliezer can quickly grasp his point. He is not a threat or a puzzle.
Whereas your comments are… very long, hard to follow, and often full of colorful or emotional statements that people here take as evidence of irrationality. You’re expecting people to work harder at understanding them than they’re going to. If you haven’t noticed, reputation counts for nothing here. For all their talk of Bayesianism, nobody is going to check your bio and say, “Hmm, he’s a professor of mathematics with 20 publications in artificial intelligence; maybe I should take his opinion as seriously as that of the high-school dropout who has no experience building AI systems.” And Eliezer has carefully indoctrinated himself against considering any such evidence.
So if you consider that the people most likely to find the flaws in Eliezer’s more-specific FAI & CEV plans are you and me, and that Eliezer has been public about calling both of us irrational people not worth talking with, this is consistent either with the hypothesis that his purpose is to discredit people who pose threats to his program, or with the hypothesis that his ego is too large to respond with anything other than dismissal to critiques that he can’t understand immediately or that trigger his “crackpot” pattern-matcher, but not with the hypothesis that arguing with him will change his mind.
(I find the continual readiness of people to assume that Eliezer always speaks the truth odd, when he’s gone more out of his way than anyone I know, in both his blog posts and his fanfiction, to show that honest argumentation is not generally a winning strategy. He used to append a signature to his email along those lines, something about warning people not to assume that the obvious interpretation of what he said was the truth.)
RobbBB seems diplomatic, and I don’t think you should quit talking with him because Eliezer made you angry. That’s what Eliezer wants.
For all their talk of Bayesianism, nobody is going to check your bio and say, “Hmm, he’s a professor of mathematics with 20 publications in artificial intelligence; maybe I should take his opinion as seriously as that of the high-school dropout who has no experience building AI systems.”
Actually, that was the first thing I did; I’m not sure about other people. What I saw was:
Teaches at what appears to be a small private liberal arts college, not a major school.
Out of 20 or so publications listed on http://www.richardloosemore.com/papers, a bunch are unrelated to AI, others are posters and interviews, or even “unpublished”, which are all low-confidence media.
Several contributions are entries in conference proceedings (are they peer-reviewed? I don’t know).
A number are listed as “to appear”, and so impossible to evaluate.
A few are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.
One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.
I could not find any external references to RL’s work except through links to Ben Goertzel (IEET was one exception).
As a result, I was unable to independently evaluate RL’s expertise level, but clearly he is not at the top of the AI field, unlike say, Ben Goertzel. Given his poorly written posts and childish behavior here, indicative of an over-inflated ego, I have decided that whatever he writes can be safely ignored. I did not think of him as a crackpot, more like a noise maker.
Admittedly, I am not sold on Eliezer’s ideas, either, since many other AI experts are skeptical of them, and that’s the only thing I can go by, not being an expert in the field myself. But at least Eliezer has done several impossible things in the last decade or so, which commands a lot of respect, while Richard appears to be drifting along.
As a result, I was unable to independently evaluate RL’s expertise level, but clearly he is not at the top of the AI field, unlike say, Ben Goertzel.
At least a few of the RL-authored papers are WITH Ben Goertzel, so some of Goertzel’s status should rub off, as I would trust Goertzel to effectively evaluate collaborators.
At least a few of the RL-authored papers are WITH Ben Goertzel, so some of Goertzel’s status should rub off, as I would trust Goertzel to effectively evaluate collaborators.
Is there some assumption here that association with Ben Goertzel should be considered evidence in favour of an individual’s credibility on AI? That seems backwards.
Goertzel is also known for approving of people who are uncontroversially cranks. See here. It’s also known, via his cooperation with MIRI, that a collaboration with him in no way implies his endorsement of another’s viewpoints.
Could you point the interested reader to your critique of his work?
Comments can likely be found on this site from years ago. I don’t recall anything particularly in-depth or memorable. It’s probably better to just look at things that Ben Goertzel says and make one’s own judgement. The thinking he expresses is not of the kind that impresses me, but others’ mileage may vary.
I don’t begrudge anyone their right to their beauty contests but I do observe that whatever it is that is measured by identifying the degree of affiliation with Ben Goertzel is something wildly out of sync with the kind of thing I would consider evidence of credibility.
If only so I can cite them to Eliezer-is-a-crank people.
I advise against doing that. It is unlikely to change anyone’s mind.
By impossible feats I mean that a regular person would not be able to reproduce them, except by chance, like winning a lottery, starting Google, founding a successful religion or becoming a President.
He started as a high-school dropout without any formal education and look what he achieved so far, professionally and personally. Look at the organizations he founded and inspired. Look at the high-status experts in various fields (business, comp sci, programming, philosophy, math and physics) who take him seriously (some even give him loads of money). Heck, how many people manage to have multiple simultaneous long-term partners who are all highly intelligent and apparently get along well?
Basically this. As Eliezer himself points out, humans aren’t terribly rational on average, and our judgements of each other’s rationality aren’t great either. Large amounts of support imply charisma, not intelligence.
TDT is closer to what I’m looking for, though it’s a … tad long.
I advise against doing that. It is unlikely to change anyone’s mind.
Point, but there’s also the middle ground “I’m not sure if he’s a crank or not, but I’m busy so I won’t look unless there’s some evidence he’s not.”
The big two I’ve come up with are: a) he actually changes his mind about important things (though I need to find an actual post I can cite—didn’t he reopen the question of the possibility of a hard takeoff, or something?), and b) TDT.
Sure, but that’s hard to prove: given “Eliezer is a crank,” the probability of “Eliezer is lying about his AI-box prowess” is much higher than “Eliezer actually pulled that off.”
The latest success by a non-Eliezer person helps, but I’d still like something I can literally cite.
I don’t see why anyone would think that. Plenty of people in the anti-vaccination crowd managed to convince parents to mortally endanger their children.
Yes, but that’s really not that hard. For starters, you can do a better job of picking your targets.
The AI-box experiment often is run with intelligent, rational people with money on the line and an obvious right answer; it’s a whole lot more impossible than picking the right uneducated family to sell your snake oil to.
Ohh, come on. Circular reasoning here. You think Yudkowsky is not a crank, so you think the folks that play that silly game with him are intelligent and rational (by the way, plenty of people who get duped by anti-vaxxers are of above-average IQ), and so you get more evidence that Yudkowsky is not a crank. Circular reasoning doesn’t persuade anyone who isn’t already a believer.
You need non-circular reasoning. That would generally be something where you aren’t the one having to explain to people that the achievement in question is profound.
You need non-circular reasoning. That would generally be something where you aren’t the one having to explain to people that the achievement in question is profound.
This bit confuses me.
That aside:
You think Yudkowsky is not a crank, so you think the folks that play that silly game with him are intelligent and rational
Non sequitur. From the posts they make, everyone on this site seems to me to be sufficiently intelligent as to make “selling snake oil” impossible, in a cut-and-dried case like the AI box. Yudkowsky’s own credibility doesn’t enter into it.
From the posts they make, everyone on this site seems to me to be sufficiently intelligent as to make “selling snake oil” impossible, in a cut-and-dried case like the AI box.
So what do you think even happened, anyway, if you think the obvious explanation is impossible?
Originally, you were hypothesising that the problem with persuading the others would be the possibility that Yudkowsky lied about AI box powers. I pointed out the possibility that this experiment is far less profound than you think it is. (Albeit frankly I do not know why you think it is so profound).
Ah, sorry. This brand of impossible.
Whatever the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place. I don’t think IQ is so strongly protective against deception, for example, and I do not think that you can assess something based on how the postings look to you with sufficient reliability as to overcome Gaussian priors very far from the mean.
edit: an example. I would deem it quite unlikely that Yudkowsky could, for example, score highly in a programming contest with competent participants, or on any other conventional, validated, reliable metric of technical expertise and ability, under good contest rules (i.e. excluding the possibility of external assistance). So if he did something like that, I’d be quite surprised, and I would lower my confidence in whatever models deemed it impossible; good old Bayes. I’m far more confident in the validity of those conventional metrics (and in the lack of alternate modes of passing, such as persuasion) than in my own assessment, so my assessment would change the most. Meanwhile, when it’s some unconventional game, even if I thought the game was difficult, I’d be much less confident in the reasoning “it looks hard, so it must be hard” than in the low prior of exceptional performance.
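The “good old Bayes” update described above can be made concrete with a toy calculation. All of the probabilities below are invented purely for illustration; they are not estimates of anything real.

```python
# Hedged sketch: how much an observed "impossible" event should erode
# confidence in the reasoning that called it impossible. Numbers are invented.

def posterior_model_correct(prior_correct, p_event_if_correct, p_event_if_wrong):
    """P(reasoning correct | event observed), by Bayes' rule."""
    num = p_event_if_correct * prior_correct
    denom = num + p_event_if_wrong * (1.0 - prior_correct)
    return num / denom

# Say I start 95% confident in the reasoning that deemed the feat "impossible".
# Under that reasoning the feat has probability 0.01; if the reasoning is
# wrong, suppose the feat has probability 0.5.
post = posterior_model_correct(0.95, 0.01, 0.5)
print(round(post, 3))  # → 0.275: confidence in the "impossible" verdict collapses
```

The point the commenter is making falls out directly: the verdict "impossible" is only as strong as the prior on the model that produced it, and a single observed success moves that prior a long way.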
Whatever the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place. I don’t think IQ is so strongly protective against deception, for example, and I do not think that you can assess something based on how the postings look to you with sufficient reliability as to overcome Gaussian priors very far from the mean.
Further, in this case the whole purpose of the experiment was to demonstrate that an AI could “take over a gatekeeper’s mind through a text channel” (something previously deemed “impossible”). As far as that goes it was, in my view, successful.
It’s clearly possible for some values of “gatekeeper”, since some people fall for 419 scams. The test is a bit meaningless without information about the gatekeepers.
Originally, you were hypothesising that the problem with persuading the others would be the possibility that Yudkowsky lied about AI box powers. I pointed out the possibility that this experiment is far less profound than you think it is. (Albeit frankly I do not know why you think it is so profound).
Still have no idea what you’re talking about. What I originally said was: “the people who talk to Yudkowsky are intelligent” does not follow from “Yudkowsky is not a crank”; I independently judge those people to be intelligent.
Whatever the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place.
“Impossible,” here, is used in the sense of “I have no idea where to even start thinking about how to do this.” It is clearly not actually impossible, because it’s been done, twice.
I thought your “impossible” at least implied “improbable” under some sort of model.
edit: and as of having no idea, you just need to know the shared religious-ish context. Which these folks generally keep hidden from a causal observer.
Impossible is being used as a statement of difficulty. Someone who has “done the impossible” has obviously not actually done something impossible, merely done something that I have no idea where to even start attempting.
Seeing that “it is possible to do” doesn’t seem like it would have much effect on my assessment of how difficult it is, after the first. It certainly doesn’t have much effect on “it is very-very-difficult-impossible for linkhyrule5 to do such a thing.”
and as of having no idea, you just need to know the shared religious-ish context. Which these folks generally keep hidden from a causal observer.
What?
First, I’m pretty sure you mean “casual.” Second, I’m hardly a casual observer, though I haven’t read everything either. Third, most religions don’t let their leading figures (or much of anyone, really) change their minds on important things...
Some folks on this site have accidentally bought unintentional snake oil in The Big Hoo Hah That Shall Not Be Mentioned. Only an intelligent person could have bought that particular puppy.
My point is, there is a certain level of general competence after which I would expect convincing someone with an OOC motive to let an IC AI out to be “impossible,” as defined below.
Results. Undervaccinated children tended to be black, to have a younger mother who was not married and did not have a college degree, to live in a household near the poverty level, and to live in a central city. Unvaccinated children tended to be white, to have a mother who was married and had a college degree, to live in a household with an annual income exceeding $75 000, and to have parents who expressed concerns regarding the safety of vaccines and indicated that medical doctors have little influence over vaccination decisions for their children.
And in any case the point is that any correlation between IQ and not being prone to getting duped like this is not perfect enough to deem anything particularly unlikely.
Hmm. Yeah, that’s hardly conclusive, but I think I was actually failing to update there. Now that you mention it, I seem to recall that both conspiracy theorists and cult victims skew toward higher IQ. I was clearly quite overconfident there.
And in any case the point is that any correlation between IQ and not being prone to getting duped like this is not perfect enough to deem anything particularly unlikely.
Wasn’t the point that
intelligent, rational people with money on the line and an obvious right answer
wasn’t enough, actually? That seems like a much stronger claim than “it’s really hard to fool high-IQ people”.
I imagine that says more about the demographics of the general New Age belief cluster than it does about any special IQ-based appeal of vaccination skepticism.
There probably are some scams or virulent memes that prey on insecurities strongly correlated with high IQ, though. I can’t think of anything specific offhand, but the fringes of geek culture are probably one of the better places to start looking.
Well, the way I see it, outside of very high IQ in combination with education spanning multiple relevant topics (such as biochemistry), the effects of intelligence are small and easily dwarfed by things like those demographic correlations.
There probably are some scams or virulent memes that prey on insecurities specific to high-IQ people, though. I can’t think of anything specific offhand
Free energy scams. Hydrinos, cold fusion, magnetic generators, perpetual motion, you name it. edit: or, in medicine, counterintuitive stuff like sitting in an old uranium mine inhaling radon, then having so much radon-progeny plate-out that it sets off nuclear-material smuggling alarms. Naturalistic-fallacy stuff in general.
That is more persuasive to high IQ people, but, I think, only insofar as intelligence allows one to gain better rationality skills. And if we’re including that, there are plenty of other, facetious examples that come into play.
Also: ha ha. How hilarious. I would love to see why you class cryonics as a scam, but sadly I’m fairly certain it would be one of the standard mistakes.
I was in a rush last night, shminux, so I didn’t have time for a couple of other quick clarifications:
First, you say “One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.”
Well, H+ magazine is one of the foremost online magazines (perhaps THE foremost online magazine) of the transhumanist community.
And, you mention Springer. You did not notice that one of my papers was in the recently published Springer book “Singularity Hypotheses”.
Second, you say “A few [of my papers] are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.”
Actually they were about dysgraphia, not dyslexia … but more importantly, those papers were about computational models of language processing. In particular they were very, VERY simple versions of the computational model of human language that is one of my special areas of expertise. And since that model is primarily about learning mechanisms (the language domain is only a testbed for a research programme whose main focus is learning), those papers you saw were actually indicative that back in the early 1990s I was already working on the construction of the core aspects of an AI system.
So, saying “dyslexia” gives a very misleading impression of what that was all about. :-)
You are quite selective in your catalog of my achievements....
One item was a chapter in a book entitled “Theoretical Foundations of Artificial General Intelligence”. Sure, it was about the consciousness question, but still.
You make a casual disparaging remark about the college where I currently work … but forget to mention that I graduated from an institution that is ranked in the top 3 or 4 in the world (University College London).
You neglect to mention that I have academic qualifications in multiple fields—both physics and artificial intelligence/cognitive psychology. I now teach in both of those fields.
And in addition to all of the above, you did not notice that I am (in addition to my teaching duties) an AI developer who works on his projects WITHOUT intending to publish all of that work! My AI work is largely proprietary. What you see from the outside are the occasional spinoffs and side projects that get turned into published writings. Not to be too coy, but isn’t that something you would expect from someone who is actually walking the walk…? :-)
There are a number of comments from other people below about Ben Goertzel, some of them a little strange. I wrote a paper a couple of years ago that Ben suggested we get together and publish… it is now a chapter in the book “Singularity Hypotheses”.
So clearly Ben Goertzel (who has a large, well-funded AGI lab) is not of the opinion that I am a crank. Could I get one point for that?
Phil Goetz, who is an experienced veteran of the AGI field, has on this thread made a comment to the effect that he thinks that Ben Goertzel, himself, and myself are the three people Eliezer should be seriously listening to (since the three of us are among the few people who have been working on this problem for many years, and who have active AGI projects). So perhaps that is two points? Maybe?
And, just out of curiosity, I would invite you to check in with the guy who invented AIXI—Marcus Hutter. He and I met and had a very long discussion at the 2009 AGI conference. Marcus and I disagree substantially about the theoretical foundations of AI, but in spite of that disagreement I would urge you to ask him if he considers me to be down at the crank level. I might be wrong, but I do not think he would be willing to give me a bad reference. Let me know how that goes, yes?
You also finished off with what I can only describe as one of the most bizarre comparisons I have ever seen. :-) You say “Eliezer has done several impossible things in the last decade or so”. Hmmmm....! :-) And yet … “Richard appears to be drifting along” Well, okay, if you say so …. :-)
I have no horse in this race; I am not an ardent EY supporter, nor do I even count myself a “rationalist”. In the area where I consider myself reasonably well trained, physics, he and I clashed a number of times on this forum. However, I am not an expert in the AI field, so I can only go by the outward signs of expertise. Ben Goertzel has them, Marcus Hutter has them, Eliezer has them. Richard Loosemore—not so much. For all I know, you might be the genius who invents AGI and sets it loose someday, but it’s not obvious by looking online. And your histrionic comments and oversized ego make it appear rather unlikely.
I didn’t quit with Rob, btw. I have had a fairly productive—albeit exhausting—discussion with Rob over on his blog. I consider it to be productive because I have managed to narrow in on what he thinks is the central issue. And I think I have now (today’s comment, which is probably the last of the discussion) managed to nail down my own argument in a way that withstands all the attacks against it.
You are right that I have some serious debating weaknesses. I write too densely, and I assume that people have my breadth of experience, which is unfair (I got lucky in my career choices).
Oh, and don’t get me wrong: Eliezer never made me angry in this little episode. I laughed myself silly. Yeah, I protested. But I was wiping away tears of laughter while I did. “Known Permanent Idiot” is just a wonderful turn of phrase. Thanks, Eliezer!
Anyway, I went and read the majority of that discussion (well, the parts between Richard and Rob). Here’s my summary:
Richard:
I think that what is happening in this discussion [...] is a misunderstanding. [...]
[Rob responds]
Richard:
You completely miss the point that I was trying to make. [...]
[Rob responds]
Richard:
You are talking around the issue I raised. [...] There is a gigantic elephant in the middle of this room, but your back is turned to it. [...]
[Rob responds]
Richard:
[...] But each time I explain my real complaint, you ignore it and respond as if I did not say anything about that issue. Can you address my particular complaint, and not that other distraction?
[Rob responds]
Richard:
[...] So far, nobody (neither Rob nor anyone else at LW or elsewhere) will actually answer that question. [...]
[Rob responds]
Richard:
Once again, I am staggered and astonished by the resilience with which you avoid talking about the core issue, and instead return to the red herring that I keep trying to steer you away from. [...]
Rob:
Alright. You say I’ve been dancing around your “core” point. I think I’ve addressed your concerns quite directly, [...] To prevent yet another suggestion that I haven’t addressed the “core”, I’ll respond to everything you wrote above. [...]
Richard:
Rob, it happened again. [...]
I snipped a lot of things there. I found lots of other points I wanted to emphasize, and plenty of things I wanted to argue against. But those aren’t the point.
Richard, this next part is directed at you.
You know what I didn’t find?
I didn’t find any posts where you made a particular effort to address the core of Rob’s argument. It was always about your argument. Rob was always the one missing the point.
Sure, it took Rob long enough to focus on finding the core of your position, but he got there eventually. And what happened next? You declared that he was still missing the point, posted a condensed version of the same argument, and posted here that your position “withstands all the attacks against it.”
You didn’t even wait for him to respond. You certainly didn’t quote him and respond to the things he said. You gave no obvious indication that you were taking his arguments seriously.
As far as I’m concerned, this is a cardinal sin.
I think I am explaining the point with such long explanations that I am causing you to miss the point.
How about this alternate hypothesis? Your explanations are fine.
Rob understands what you’re saying.
He just doesn’t agree.
Perhaps you need to take a break from repeating yourself and make sure you understand Rob’s argument.
(P.S. Eliezer’s ad hominem is still wrong. You may be making a mistake, but I’m confident you can fix it, the tone of this post notwithstanding.)
This entire debate is supposed to be about my argument, as presented in the original article I published on the IEET.org website (“The Fallacy of Dumb Superintelligence”).
But in that case, what should I do when Rob insists on talking about something that I did not say in that article?
My strategy was to explain his mistake, but not engage in a debate about his red herring. Sensible people of all stripes would consider that a mature response.
But over and over again Rob avoided the actual argument and insisted on talking about his red herring.
And then FINALLY I realized that I could write down my original claim in such a way that it is IMPOSSIBLE for Rob to misinterpret it.
(That was easy, in retrospect: all I had to do was remove the language that he was using as the jumping-off point for his red herring).
That final, succinct statement of my argument is sitting there at the end of his blog ….. so far ignored by you, and by him. Perhaps he will be able to respond, I don’t know, but you say you have read it, so you have had a chance to actually understand why it is that he has been talking about something of no relevance to my original argument.
But you, in your wisdom, chose to (a) completely ignore that statement of my argument, and (b) give me a patronizing rebuke for not being able to understand Rob’s red herring argument.
Further to RobbBB’s highly useful reply (you should read it), I would like to point out that the phenomenon of optimization processes having unexpected or unusual behaviour when placed in new situations or given new capabilities is quite general. It applies to pretty much any strong optimization process working toward any goal.
For example, you program an AI/robot with no resources to fetch a banana as quickly as possible in some sandbox environment, where the banana is guarded by a few people. The AI will describe its strategy as talking to the guards and convincing them to give it the banana. Or perhaps buying the banana off them if you give it some money. But there are many ways to change the situation:
1. Make the AI much smarter—it will realise that by taking a certain sequence of steps it can get from here to the banana without ever moving within someone’s field of view. No need to pay for it.
1.5. Make the AI even smarter, and it will realise that by painting a Basilisk on the robot’s front it can disable all the guards effortlessly and take a straight line to the banana.
2. Give the AI a teleportation device—it teleports straight to the banana.
3. Give the AI a plasma cannon—it shoots all the guards dead and runs up and takes the banana.
The point is… it’s not at all weird for AI behaviour to be “inconsistent”. It isn’t a sign of anything being broken; in fact, the goal is being achieved. The AI is just able to think of more effective ways to do it than you can. That is, after all, the point of superintelligence. And an AI that does this is not broken or stupid, and is certainly capable of being dangerous.
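The banana example can be compressed into a toy planner: the goal and its evaluation never change, yet the chosen plan flips whenever the action set grows. Everything below (action names, costs) is invented for illustration; it is a sketch of the pattern, not of any real system.

```python
# Toy illustration: a fixed goal, a cost-minimizing planner, and a growing
# action set. The "surprising" new behaviour is just better optimization.

def best_plan(actions):
    """Pick the cheapest available action that achieves the goal."""
    achieving = [a for a in actions if a["gets_banana"]]
    return min(achieving, key=lambda a: a["cost"])["name"]

baseline = [
    {"name": "persuade the guards", "cost": 10, "gets_banana": True},
    {"name": "buy the banana",      "cost": 5,  "gets_banana": True},
]
print(best_plan(baseline))      # → buy the banana

# Smarter AI: a stealth route it couldn't see before.
smarter = baseline + [{"name": "sneak past unseen", "cost": 2, "gets_banana": True}]
print(best_plan(smarter))       # → sneak past unseen

# Give it a plasma cannon: a cheaper (to the AI) option appears.
with_cannon = smarter + [{"name": "shoot the guards", "cost": 1, "gets_banana": True}]
print(best_plan(with_cannon))   # → shoot the guards
```

Note that `best_plan` is identical in all three calls: only the option set differs. That is the sense in which the behaviour stays perfectly "consistent" with the goal while looking wildly inconsistent to an observer.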
By the way, you can try to do something like this:
[ And by the way: one important feature that is OBVIOUSLY going to be in the goalX code is this: that the outcome of any actions that the goalX code prescribes, should always be checked to see if they are as consistent as possible with the verbal description of the class of results X, and if any inconsistency occurs the goalX code should be deemed defective, and be shut down for adjustment.]
To start with, I have no idea how you would program this or what it means formally. But even if you could, it takes human judgement to identify “inconsistencies” that would matter to humans. Without embedding human values in there, you’ll have the AI shut down every time it tries to do anything new, or else use a stronger criterion of “inconsistency” and miss a few cases where the AI does something you actually don’t want.
Or, you know, the AI will deduce that the full “verbal description of the class of results X” (which is an infinite list) is of course defined by its goal (ie. the goalX code) and therefore reason that nothing the goalX code can do will be inconsistent with it.
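The circularity worry in the preceding comment can be made concrete: if the “verbal description of the class of results X” that the check consults is itself generated from the goalX code, the check is vacuous. The sketch below is a deliberate caricature with invented function names and an invented tagging scheme; it only illustrates the logical structure, not any real proposal.

```python
# Caricature of the circular fail-safe: the "verbal description" the check
# consults is derived from the goal code itself, so nothing can ever fail it.

def goal_x_approves(action):
    # Stand-in for the goalX code: endorses any action tagged as maximizing.
    return action.endswith("(maximizing)")

def verbal_description_of_x(action):
    # If the description of X is just whatever goalX endorses, the
    # "independent" description collapses into the goal code.
    return goal_x_approves(action)

def consistency_check(action):
    """The proposed fail-safe: compare the goal code's verdict against the
    verbal description. Vacuously true when one is derived from the other."""
    return goal_x_approves(action) == verbal_description_of_x(action)

for action in ["hand out candy (maximizing)", "tile the planet (maximizing)"]:
    print(action, consistency_check(action))  # True for both, benign or not
```

For the check to bite, `verbal_description_of_x` would have to encode something genuinely independent of the goal code (i.e. human judgement about X), which is exactly the hard part the two comments above are disputing.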
I didn’t mean to ignore your argument; I just didn’t get around to it. As I said, there were a lot of things I wanted to respond to. (In fact, this post was going to be longer, but I decided to focus on your primary argument.)
Your story:
This hypothetical AI will say “I have a goal, and my goal is to get a certain class of results, X, in the real world.” [...] And we say “Hey, no problem: looks like your goal code is totally consistent with that verbal description of the desired class of results.” Everything is swell up to this point.
My version:
The AI is lying. Or possibly it isn’t very smart yet, so it’s bad at describing its goal. Or it’s oversimplifying, because the programmers told it to, because otherwise the goal description would take days. And the goal code itself is too complicated for the programmers to fully understand. In any case, everything is not swell.
Your story:
Then one day the AI says “Okay now, today my goalX code says I should do this…” and it describes an action that is VIOLENTLY inconsistent with the previously described class of results, X. This action violates every one of the features of the class that were previously given.
My version:
The AI’s goal was never really X. It was actually Z. The AI’s actions perfectly coincide with Z.
In the rest of the scenario you described, I agree that the AI’s behavior is pretty incoherent, if its goal is X. But if it’s really aiming for Z, then its behavior is perfectly, terrifyingly coherent.
And your “obvious” fail-safe isn’t going to help. The AI is smarter than us. If it wants Z, and a fail-safe prevents it from getting Z, it will find a way around that fail-safe.
I know, your premise is that X really is the AI’s true goal. But that’s my sticking point.
Making it actually have the goal X, before it starts self-modifying, is far from easy. You can’t just skip over that step and assume it as your premise.
What you say makes sense …. except that you and I are both bound by the terms of a scenario that someone else has set here.
So, the terms (as I say, this is not my doing!) of reference are that an AI might sincerely believe that it is pursuing its original goal of making humans happy (whatever that means …. the ambiguity is in the original), but in the course of sincerely and genuinely pursuing that goal, it might get into a state where it believes that the best way to achieve the goal is to do something that we humans would consider to be NOT achieving the goal.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
Oh, and one other thing that arises from your above remark: remember that what you have called the “fail-safe” is not actually a fail-safe; it is an integral part of the original goal code (X). So there is no question of this being a situation where “… it wants Z, and a fail-safe prevents it from getting Z, [so] it will find a way around that fail-safe.” In fact, the check is just part of X, so it WANTS to check as much as it wants anything else involved in the goal.
I am not sure that self-modification is part of the original terms of reference here, either. When Muehlhauser (for example) went on a radio show and explained to the audience that a superintelligence might be programmed to make humans happy, but then SINCERELY think it was making us happy when it put us on a Dopamine Drip, I think he was clearly not talking about a free-wheeling AI that can modify its goal code. Surely, if he wanted to imply that, the whole scenario goes out the window. The AI could have any motivation whatsoever.
You and I are both bound by the terms of a scenario that someone else has set here.
Ok, if you want to pass the buck, I won’t stop you. But this other person’s scenario still has a faulty premise. I’ll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it’s not very useful to discuss an AI with a “sincere” goal of X, because the difficulty comes from giving the AI that goal in the first place.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the “story for another day.” Specifically, a day when we’ve solved the “sincere goal” issue.
Whereas your comments are… very long, hard to follow, and often full of colorful or emotional statements that people here take as evidence of irrationality. You’re expecting people to work harder at understanding them than they’re going to. If you haven’t noticed, reputation counts for nothing here. For all their talk of Bayesianism, nobody is going to check your bio and say, “Hmm, he’s a professor of mathematics with 20 publications in artificial intelligence; maybe I should take his opinion as seriously as that of the high-school dropout who has no experience building AI systems.” And Eliezer has carefully indoctrinated himself against considering any such evidence.
So consider that the people most likely to find the flaws in Eliezer’s more-specific FAI & CEV plans are you and me, and that Eliezer has been public about calling both of us irrational people not worth talking with. This is consistent either with the hypothesis that his purpose is to discredit people who pose threats to his program, or with the hypothesis that his ego is too large to respond with anything other than dismissal to critiques that he can’t understand immediately or that trigger his “crackpot” pattern-matcher, but not with the hypothesis that arguing with him will change his mind.
(I find the continual readiness of people to assume that Eliezer always speaks the truth odd, when he’s gone more out of his way than anyone I know, in both his blog posts and his fanfiction, to show that honest argumentation is not generally a winning strategy. He used to append a signature to his email along those lines, something about warning people not to assume that the obvious interpretation of what he said was the truth.)
RobbBB seems diplomatic, and I don’t think you should quit talking with him because Eliezer made you angry. That’s what Eliezer wants.
Actually, that was the first thing I did, not sure about other people. What I saw was:
Teaches at what appears to be a small private liberal arts college, not a major school.
Out of 20 or so publications listed on http://www.richardloosemore.com/papers, a bunch are unrelated to AI, others are posters and interviews, or even “unpublished”, which are all low-confidence media.
Several contributions are entries in conference proceedings (are they peer-reviewed? I don’t know).
A number are listed as “to appear”, and so impossible to evaluate.
A few are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.
One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.
I could not find any external references to RL’s work except through links to Ben Goertzel (IEET was one exception).
As a result, I was unable to independently evaluate RL’s expertise level, but clearly he is not at the top of the AI field, unlike say, Ben Goertzel. Given his poorly written posts and childish behavior here, indicative of an over-inflated ego, I have decided that whatever he writes can be safely ignored. I did not think of him as a crackpot, more like a noise maker.
Admittedly, I am not sold on Eliezer’s ideas, either, since many other AI experts are skeptical of them, and that’s the only thing I can go by, not being an expert in the field myself. But at least Eliezer has done several impossible things in the last decade or so, which commands a lot of respect, while Richard appears to be drifting along.
At least a few of the RL-authored papers are WITH Ben Goertzel, so some of Goertzel’s status should rub off, as I would trust Goertzel to effectively evaluate collaborators.
Is there some assumption here that association with Ben Goertzel should be considered evidence in favour of an individual’s credibility on AI? That seems backwards.
Well, it does show that Goertzel respects his opinions at least enough to be willing to author a paper with him.
Goertzel appears to be a respected figure in the field. Could you point the interested reader to your critique of his work?
Goertzel is also known for approving of people who are uncontroversially cranks. See here. It’s also known, via his cooperation with MIRI, that a collaboration with him in no way implies his endorsement of another’s viewpoints.
Comments can likely be found on this site from years ago. I don’t recall anything particularly in-depth or memorable. It’s probably better to just look at things that Ben Goertzel says and make one’s own judgement. The thinking he expresses is not of the kind that impresses me, but others’ mileage may vary.
I don’t begrudge anyone their right to their beauty contests but I do observe that whatever it is that is measured by identifying the degree of affiliation with Ben Goertzel is something wildly out of sync with the kind of thing I would consider evidence of credibility.
In CS, conference papers are generally higher status & quality than journal articles.
Name three? If only so I can cite them to Eliezer-is-a-crank people.
I advise against doing that. It is unlikely to change anyone’s mind.
By impossible feats I mean that a regular person would not be able to reproduce them, except by chance, like winning a lottery, starting Google, founding a successful religion or becoming a President.
He started as a high-school dropout without any formal education and look what he achieved so far, professionally and personally. Look at the organizations he founded and inspired. Look at the high-status experts in various fields (business, comp sci, programming, philosophy, math and physics) who take him seriously (some even give him loads of money). Heck, how many people manage to have multiple simultaneous long-term partners who are all highly intelligent and apparently get along well?
He’s achieved about what Ayn Rand achieved, and almost everyone thinks she was a crank.
Basically this. As Eliezer himself points out, humans aren’t terribly rational on average, and our judgements of each other’s rationality aren’t great either. Large amounts of support imply charisma, not intelligence.
TDT is closer to what I’m looking for, though it’s a … tad long.
Point, but there’s also the middle ground “I’m not sure if he’s a crank or not, but I’m busy so I won’t look unless there’s some evidence he’s not.”
The big two I’ve come up with are a) he actually changes his mind about important things (though I need to find an actual post I can cite—didn’t he reopen the question of the possibility of a hard takeoff, or something?) and b) TDT.
Won some AI box experiments as the AI.
Sure, but that’s hard to prove: given “Eliezer is a crank,” the probability of “Eliezer is lying about his AI-box prowess” is much higher than “Eliezer actually pulled that off.”
The latest success by a non-Eliezer person helps, but I’d still like something I can literally cite.
I don’t see why anyone would think that. Plenty of people in the anti-vaccination crowd managed to convince parents to mortally endanger their children.
Yes, but that’s really not that hard. For starters, you can do a better job of picking your targets.
The AI-box experiment often is run with intelligent, rational people with money on the line and an obvious right answer; it’s a whole lot more impossible than picking the right uneducated family to sell your snake oil to.
Ohh, come on. Cyclical reasoning here. You think Yudkowsky is not a crank, so you think the folks who play that silly game with him are intelligent and rational (by the way, plenty of people who get duped by anti-vaxxers are of above-average IQ), and so you get more evidence that Yudkowsky is not a crank. Cyclical reasoning doesn’t persuade anyone who isn’t already a believer.
You need non-cyclical reasoning. Which would generally be something where you aren’t the one having to explain people that the achievement in question is profound.
You probably mean “circular”.
This bit confuses me.
That aside:
Non sequitur. From the posts they make, everyone on this site seems to me to be sufficiently intelligent as to make “selling snake oil” impossible, in a cut-and-dried case like the AI box. Yudkowsky’s own credibility doesn’t enter into it.
I thought you wanted to persuade others.
So what do you think even happened, anyway, if you think the obvious explanation is impossible?
Yes, but I don’t see why this is relevant
Ah, sorry. This brand of impossible.
Originally, you were hypothesising that the problem with persuading the others would be the possibility that Yudkowsky lied about AI box powers. I pointed out the possibility that this experiment is far less profound than you think it is. (Albeit frankly I do not know why you think it is so profound).
Whatever the brand, any “impossibilities” that happen should lower your confidence in the reasoning that deemed them “impossibilities” in the first place. I don’t think IQ is so strongly protective against deception, for example, and I do not think that you can assess someone based on how their postings look to you with sufficient reliability as to overcome Gaussian priors very far from the mean.
edit: example. I would deem it quite unlikely that Yudkowsky could, for example, score highly in a programming contest with competent participants, or in any other conventional, validated, reliable metric of technical expertise and ability, under good contest rules (i.e. excluding the possibility of external assistance). So if he did something like that, I’d be quite surprised, and I would lower my confidence in whatever models deemed it impossible; good old Bayes. I’m far more confident in the validity of those conventional metrics (and in the lack of alternate modes of passing, such as persuasion) than in my assessment, so my assessment would change the most. Meanwhile, when it’s some unconventional game, well, even if I thought the game was difficult, I’d be much less confident in the reasoning “it looks hard so it must be hard” than in the low prior of exceptional performance.
Further, in this case the whole purpose of the experiment was to demonstrate that an AI could “take over a gatekeeper’s mind through a text channel” (something previously deemed “impossible”). As far as that goes it was, in my view, successful.
It’s clearly possible for some values of “gatekeeper”, since some people fall for 419 scams. The test is a bit meaningless without information about the gatekeepers.
Still have no idea what you’re talking about. What I originally said was: “the people who talk to Yudkowsky are intelligent” does not follow from “Yudkowsky is not a crank”; I independently judge those people to be intelligent.
“Impossible,” here, is used in the sense that “I have no idea where to start thinking about where to start thinking about how to do this.” It is clearly not actually impossible because it’s been done, twice.
And point taken about the contest.
I thought your “impossible” at least implied “improbable” under some sort of model.
edit: and as of having no idea, you just need to know the shared religious-ish context. Which these folks generally keep hidden from a causal observer.
Impossible is being used as a statement of difficulty. Someone who has “done the impossible” has obviously not actually done something impossible, merely done something that I have no idea how to even begin attempting.
Seeing that “it is possible to do” doesn’t seem like it would have much effect on my assessment of how difficult it is, after the first. It certainly doesn’t have much effect on “It is very-very-difficult-impossible for linkhyrule5 to do such a thing.”
What?
First, I’m pretty sure you mean “casual.” Second, I’m hardly a casual observer, though I haven’t read everything either. Third, most religions don’t let their leading figures (or much of anyone, really) change their minds on important things...
Some folks on this site have accidentally bought unintentional snake oil in The Big Hoo Hah That Shall Not Be Mentioned. Only an intelligent person could have bought that particular puppy.
Granted. And it may be that additional knowledge/intelligence makes you more vulnerable as a Gatekeeper.
Trying to think this out in terms of levels of smartness alone is very unlikely to be helpful.
Well yes. It is a factor, no more no less.
My point is, there is a certain level of general competence after which I would expect convincing someone with an OOC motive to let an IC AI out to be “impossible,” as defined below.
But less than half of them, I’ll wager. This is clearly an abuse of averages.
I wouldn’t wager too much money on that one. http://pediatrics.aappublications.org/content/114/1/187.abstract .
And in any case the point is that any correlation between IQ and not being prone to getting duped like this is not perfect enough to deem anything particularly unlikely.
Hmm. Yeah, that’s hardly conclusive, but I think I was actually failing to update there. Now that you mention it, I seem to recall that both conspiracy theorists and cult victims skew toward higher IQ. I was clearly quite overconfident there.
Wasn’t the point that
wasn’t enough, actually? That seems like a much stronger claim than “it’s really hard to fool high-IQ people”.
I imagine that says more about the demographics of the general New Age belief cluster than it does about any special IQ-based appeal of vaccination skepticism.
There probably are some scams or virulent memes that prey on insecurities strongly correlated with high IQ, though. I can’t think of anything specific offhand, but the fringes of geek culture are probably one of the better places to start looking.
Well, the way I see it, outside of very high IQ in combination with education spanning multiple topics of biochemistry, the effects of intelligence are small and easily dwarfed by things like those demographic correlations.
Free energy scams. Hydrinos, cold fusion, magnetic generators, perpetual motion, you name it. edit: or, in medicine, counter-intuitive stuff like sitting in an old uranium mine inhaling radon, then having so much radon progeny plate out that it sets off nuclear-material smuggling alarms. Naturalistic-fallacy stuff in general.
Cryonics. ducks and runs
Edit: It was a joke. Sorryyyyyy
That is more persuasive to high IQ people, but, I think, only insofar as intelligence allows one to gain better rationality skills. And if we’re including that, there are plenty of other, facetious examples that come into play.
Also: ha ha. How hilarious. I would love to see why you class cryonics as a scam, but sadly I’m fairly certain it would be one of the standard mistakes.
Also, maybe it’s a matter of semantics, but winning a game that you created isn’t really ‘doing the impossible’ in the sense I took the phrasing.
Winning a game you created… that sounds as impossible to win as that?
I was in a rush last night, shminux, so I didn’t have time for a couple of other quick clarifications:
First, you say “One relevant paper was in H+ magazine, a place I have never heard of before and apparently not a part of any well-known scientific publishing outlet, like Springer.”
Well, H+ magazine is one of the foremost online magazines (perhaps THE foremost online magazine) of the transhumanist community.
And, you mention Springer. You did not notice that one of my papers was in the recently published Springer book “Singularity Hypotheses”.
Second, you say “A few [of my papers] are apparently about dyslexia, which is an interesting topic, but not obviously related to AI.”
Actually they were about dysgraphia, not dyslexia … but more importantly, those papers were about computational models of language processing. In particular they were very, VERY simple versions of the computational model of human language that is one of my special areas of expertise. And since that model is primarily about learning mechanisms (the language domain is only a testbed for a research programme whose main focus is learning), those papers you saw were actually indicative that back in the early 1990s I was already working on the construction of the core aspects of an AI system.
So, saying “dyslexia” gives a very misleading impression of what that was all about. :-)
That is a very interesting assessment, shminux.
Would you be up for some feedback?
You are quite selective in your catalog of my achievements....
One item was a chapter in a book entitled “Theoretical Foundations of Artificial General Intelligence”. Sure, it was about the consciousness question, but still.
You make a casual disparaging remark about the college where I currently work … but forget to mention that I graduated from an institution that is ranked in the top 3 or 4 in the world (University College London).
You neglect to mention that I have academic qualifications in multiple fields—both physics and artificial intelligence/cognitive psychology. I now teach in both of those fields.
And in addition to all of the above, you did not notice that I am (in addition to my teaching duties) an AI developer who works on his projects WITHOUT intending to publish that work all the time! My AI work is largely proprietary. What you see from the outside are the occasional spinoffs and side projects that get turned into published writings. Not to be too coy, but isn’t that something you would expect from someone who is actually walking the walk....? :-)
There are a number of comments from other people below about Ben Goertzel, some of them a little strange. I wrote a paper a couple of years ago that Ben suggested we get together and publish… that is now a chapter in the book “Singularity Hypotheses”.
So clearly Ben Goertzel (who has a large, well-funded AGI lab) is not of the opinion that I am a crank. Could I get one point for that?
Phil Goetz, who is an experienced veteran of the AGI field, has on this thread made a comment to the effect that he thinks that Ben Goertzel, himself, and myself are the three people Eliezer should be seriously listening to (since the three of us are among the few people who have been working on this problem for many years, and who have active AGI projects). So perhaps that is two points? Maybe?
And, just out of curiosity, I would invite you to check in with the guy who invented AIXI—Marcus Hutter. He and I met and had a very long discussion at the 2009 AGI conference. Marcus and I disagree substantially about the theoretical foundations of AI, but in spite of that disagreement I would urge you to ask him if he considers me to be down at the crank level. I might be wrong, but I do not think he would be willing to give me a bad reference. Let me know how that goes, yes?
You also finished off with what I can only describe as one of the most bizarre comparisons I have ever seen. :-) You say “Eliezer has done several impossible things in the last decade or so”. Hmmmm....! :-) And yet … “Richard appears to be drifting along” Well, okay, if you say so …. :-)
I have no horse in this race, and I am not an ardent EY supporter, or even count myself as a “rationalist”. In the area where I consider myself reasonably well trained, physics, he and I clashed a number of times on this forum. However, I am not an expert in the AI field, so I can only go by the outward signs of expertise. Ben Goertzel has them, Marcus Hutter has them, Eliezer has them. Richard Loosemore—not so much. For all I know, you might be the genius who invents the AGI and sets it loose someday, but it’s not obvious by looking online. And your histrionic comments and oversized ego make it appear rather unlikely.
I agree with pretty much all of the above.
I didn’t quit with Rob, btw. I have had a fairly productive—albeit exhausting—discussion with Rob over on his blog. I consider it to be productive because I have managed to narrow in on what he thinks is the central issue. And I think I have now (today’s comment, which is probably the last of the discussion) managed to nail down my own argument in a way that withstands all the attacks against it.
You are right that I have some serious debating weaknesses. I write too densely, and I assume that people have my width and breadth of experience, which is unfair (I got lucky in my career choices).
Oh, and don’t get me wrong: Eliezer never made me angry in this little episode. I laughed myself silly. Yeah, I protested. But I was wiping back tears of laughter while I did. “Known Permanent Idiot” is just a wonderful turn of phrase. Thanks, Eliezer!
Link to the nailed-down version of the argument?
Bottommost (September 9, 6:03 PM) comment here.
Oh, yeah, I found that myself eventually.
Anyway, I went and read the majority of that discussion (well, the parts between Richard and Rob). Here’s my summary:
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
[Rob responds]
Richard:
Rob:
Richard:
I snipped a lot of things there. I found lots of other points I wanted to emphasize, and plenty of things I wanted to argue against. But those aren’t the point.
Richard, this next part is directed at you.
You know what I didn’t find?
I didn’t find any posts where you made a particular effort to address the core of Rob’s argument. It was always about your argument. Rob was always the one missing the point.
Sure, it took Rob long enough to focus on finding the core of your position, but he got there eventually. And what happened next? You declared that he was still missing the point, posted a condensed version of the same argument, and posted here that your position “withstands all the attacks against it.”
You didn’t even wait for him to respond. You certainly didn’t quote him and respond to the things he said. You gave no obvious indication that you were taking his arguments seriously.
As far as I’m concerned, this is a cardinal sin.
How about this alternate hypothesis? Your explanations are fine. Rob understands what you’re saying. He just doesn’t agree.
Perhaps you need to take a break from repeating yourself and make sure you understand Rob’s argument.
(P.S. Eliezer’s ad hominem is still wrong. You may be making a mistake, but I’m confident you can fix it, the tone of this post notwithstanding.)
This entire debate is supposed to about my argument, as presented in the original article I published on the IEET.org website (“The Fallacy of Dumb Superintelligence”).
But in that case, what should I do when Rob insists on talking about something that I did not say in that article?
My strategy was to explain his mistake, but not engage in a debate about his red herring. Sensible people of all stripes would consider that a mature response.
But over and over again Rob avoided the actual argument and insisted on talking about his red herring.
And then FINALLY I realized that I could write down my original claim in such a way that it is IMPOSSIBLE for Rob to misinterpret it.
(That was easy, in retrospect: all I had to do was remove the language that he was using as the jumping-off point for his red herring).
That final, succinct statement of my argument is sitting there at the end of his blog ….. so far ignored by you, and by him. Perhaps he will be able to respond, I don’t know, but you say you have read it, so you have had a chance to actually understand why it is that he has been talking about something of no relevance to my original argument.
But you, in your wisdom, chose to (a) completely ignore that statement of my argument, and (b) give me a patronizing rebuke for not being able to understand Rob’s red herring argument.
Further to RobbBB’s highly useful reply (you should read it), I would like to point out that the phenomenon of optimization processes having unexpected or unusual behaviour when placed in new situations or given new capabilities is quite general. It applies to pretty much any strong optimization process working toward any goal.
For example, you program an AI/robot with no resources to fetch a banana as quickly as possible in some sandbox environment, where the banana is guarded by a few people. The AI will describe its strategy as talking to the guards and convincing them to give it the banana. Or perhaps buying the banana off them if you give it some money. But there are many ways to change the situation:
Make the AI much smarter—it will realise that by taking a certain sequence of steps it can get from here to the banana without ever moving within someone’s field of view. No need to pay for it. (1.5—Make the AI even smarter, and it will realise that by painting a Basilisk on the robot’s front it can disable all the guards effortlessly and take a straight line to the banana.)
Give the AI a teleportation device—it teleports straight to the banana.
Give the AI a plasma cannon—it shoots all the guards dead and runs up and takes the banana.
The point is… it’s not at all weird for AI behaviour to be “inconsistent”. It isn’t a sign of anything being broken; in fact, the goal is being achieved. The AI is just able to think of more effective ways to do it than you are. That is, after all, the point of superintelligence. And an AI that does this is not broken or stupid, and is certainly capable of being dangerous.
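The banana scenario above can be sketched as nothing more exotic than cost-minimizing planning: the goal stays fixed, and what looks like “inconsistent” behaviour is just a new cheapest plan appearing when the action set grows. (The plans and costs here are made up purely for illustration:)

```python
# Minimal sketch: a planner picks the cheapest available plan for a fixed
# goal. Enlarging the capability set changes which plan is optimal; the
# goal itself never changes.

def best_plan(available_plans):
    """Return the name of the cheapest plan that achieves the goal."""
    return min(available_plans, key=lambda p: p[1])[0]

base = [("persuade the guards", 10), ("buy the banana", 7)]

print(best_plan(base))                                    # -> buy the banana
print(best_plan(base + [("teleport to the banana", 1)]))  # -> teleport to the banana
print(best_plan(base + [("shoot the guards", 2)]))        # -> shoot the guards
```

Nothing in the planner “breaks” between the three calls; each answer is the optimal route to the same banana given what the agent can do.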
By the way, you can try to do something like this:
But, to start with I have no idea how you would program this or what it means formally, but even if you could, it takes human judgement to identify “inconsistencies” that would matter to humans. Without embedding human values in there you’ll have the AI shut down every time it tries to do anything new, or use a stronger criterion of “inconsistency” and miss a few cases where the AI does something you actually don’t want.
Or, you know, the AI will deduce that the full “verbal description of the class of results X” (which is an infinite list) is of course defined by its goal (ie. the goalX code) and therefore reason that nothing the goalX code can do will be inconsistent with it.
I didn’t mean to ignore your argument; I just didn’t get around to it. As I said, there were a lot of things I wanted to respond to. (In fact, this post was going to be longer, but I decided to focus on your primary argument.)
Your story:
My version:
Your story:
My version:
In the rest of the scenario you described, I agree that the AI’s behavior is pretty incoherent, if its goal is X. But if it’s really aiming for Z, then its behavior is perfectly, terrifyingly coherent.
And your “obvious” fail-safe isn’t going to help. The AI is smarter than us. If it wants Z, and a fail-safe prevents it from getting Z, it will find a way around that fail-safe.
I know, your premise is that X really is the AI’s true goal. But that’s my sticking point.
Making it actually have the goal X, before it starts self-modifying, is far from easy. You can’t just skip over that step and assume it as your premise.
What you say makes sense … except that you and I are both bound by the terms of a scenario that someone else has set here.
So, the terms of reference (as I say, this is not my doing!) are that an AI might sincerely believe it is pursuing its original goal of making humans happy (whatever that means … the ambiguity is in the original), but in the course of sincerely and genuinely pursuing that goal it might get into a state where it believes the best way to achieve the goal is to do something that we humans would consider NOT to be achieving the goal.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
Oh, and one other thing that arises from your remark above: remember that what you have called the “fail-safe” is not actually a fail-safe; it is an integral part of the original goal code (X). So this is not a situation where ”… it wants Z, and a fail-safe prevents it from getting Z, [so] it will find a way around that fail-safe.” In fact, the check is just part of X, so the AI WANTS to run the check as much as it wants anything else involved in the goal.
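A minimal sketch of that distinction (all names hypothetical; a toy illustration, not a real motivation system). When the check is a conjunct of the goal X itself, an agent maximising X has no incentive to bypass it; contrast that with an external fail-safe bolted around a different goal Z:

```python
# Hypothetical sketch: a check that is part of the goal, versus an
# external fail-safe wrapped around a different goal.

def passes_check(outcome: str) -> bool:
    """The verification step: humans would endorse this outcome."""
    return "endorsed" in outcome

def goal_x(outcome: str) -> bool:
    # The check is a conjunct of X: an outcome only counts as achieving
    # X if it also passes the check, so an X-maximiser wants the check.
    return "humans happy" in outcome and passes_check(outcome)

def goal_z_with_failsafe(outcome: str) -> bool:
    # Contrast case: the agent really wants Z, and the check is an
    # external constraint it would profit from circumventing.
    wants_z = "humans on dopamine drip" in outcome  # goal Z
    return wants_z and passes_check(outcome)

# To an X-maximiser, passing the check is part of what it wants:
print(goal_x("humans happy, endorsed by humans"))        # True
print(goal_x("humans happy, not checked"))               # False
# To a Z-maximiser, the same check is merely an obstacle:
print(goal_z_with_failsafe("humans on dopamine drip"))   # False
```

In the first case an unchecked outcome simply fails to satisfy the goal, so there is nothing for the AI to route around; only in the second case does the check oppose what the agent wants.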
I am not sure that self-modification is part of the original terms of reference here, either. When Muehlhauser (for example) went on a radio show and explained to the audience that a superintelligence might be programmed to make humans happy, but might then SINCERELY think it was making us happy when it put us on a Dopamine Drip, he was clearly not talking about a free-wheeling AI that can modify its own goal code. Surely, if he had wanted to imply that, the whole scenario would go out the window: the AI could have any motivation whatsoever.
Hope that clarifies rather than obscures.
Ok, if you want to pass the buck, I won’t stop you. But this other person’s scenario still has a faulty premise. I’ll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it’s not very useful to discuss an AI with a “sincere” goal of X, because the difficulty comes from giving the AI that goal in the first place.
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the “story for another day.” Specifically, a day when we’ve solved the “sincere goal” issue.