How many people here agree with Holden? [Actually, who agrees with Holden?]
I was wondering: what fraction of people here agree with Holden’s advice regarding donations, and with his arguments? What fraction assumes there is a good chance he is essentially correct? What fraction finds it necessary to determine whether Holden is essentially correct in his assessment before working on counterarguments, acknowledging that such an investigation could result in the dissolution or suspension of SI?
It would seem to me, from the response, that the chosen course of action is to try to improve the presentation of the argument, rather than to try to verify the truth values of the assertions (with the non-negligible likelihood of the assertions being found false instead). This strikes me as a very odd stance.
Ultimately: why does SI seem certain that it has badly presented some valid reasoning, rather than tried to present some invalid reasoning?
edit: I am interested in knowing why people agree/disagree with Holden, and what likelihood they give to him being essentially correct, rather than a number or a ratio (that would be subject to selection bias).
I think most people on this site (including me and you, private messaging/Dmytry) don’t have any particular insight that gives them more information than those who seriously thought about this for a long time (like Eliezer, Ben Goertzel, Robin Hanson, Holden Karnofsky, Lukeprog, possibly Wei Dai, cousin_it, etc.), so our opinion on “who is right” is not worth much.
I’d much rather see an attempt to cleanly map out where knowledgeable people disagree, rather than polls of what ignorant people like me think.
Similarly, if two senior economists have a public disagreement about international trade and fiscal policy, a poll of a bunch of graduate students on those issues is not going to provide much new information to either economist.
(I don’t really know how to phrase this argument cleanly, help and suggestions welcome; I’m just trying to transcribe my general feeling of “I don’t even know enough to answer, and I suspect neither do most people here”)
I would phrase it as holding off judgement until we hear further information, i.e. SI’s response to this. And in addition to the reasons you give, not deciding who’s right ahead of time helps us avoid becoming attached to one side.
I think what’s needed isn’t further information as much as better intuitions, and getting those isn’t just a matter of reading SIAI’s response.
A bit like if there’s a big public disagreement between two primatologists who spent years working with chimps in Africa, about the best way to take a toy from a chimp without your arm getting ripped off. At least one of the primatologists is wrong, but even after hearing all of their arguments, a member of the uninformed public can’t really decide between them, because their positions are based on a bunch of intuitions that are very hard to communicate. Deciding “who is wrong” based on the public debate would be working from much less information than either of the parties has (provided nobody appears obviously stupid or irrational or dishonest even to a member of the public).
People seem more ready to pontificate on AI and the future and morality than on chimpanzees, but I don’t think we should be. The best position for laymen on a topic on which experts disagree is one of uncertainty.
The primatologists’ intuitions would probably stem from their direct observations of chimps. I would trust their intuitions much less if they were based on long serious thinking about primates without any observation, which is likely the more precise analogy of the positions held in the AI risk debate.
AGI research is not an altogether well-defined area. There are no well-established theorems, measurements, design insights, or the like. And there is plenty of overlap with other fields, such as theoretical computer science.
My impression is that many of the people commenting have enough of a computer science, engineering, or math background to be worth listening to.
The LW community takes Yudkowsky seriously when he talks about quantum mechanics—and indeed, he has cogent things to say. I think we ought to see who has something worth saying about AGI and risk.
He has found cogent things to repeat. Big difference. I knew of MWI long before I even heard of Eliezer; nothing he presents is new, and he doesn’t present any actual counterarguments or ways it may be false, so he deserves −1 for that and further discounting on anything he talks about, due to one-sided presentation of personal beliefs. (The biggest issue I can see is that we need QM to result in GR at large scale, and we can’t figure out how to do that. And insofar as QM does not result in GR at large scale, it means that what we know doesn’t work for massive objects (as a matter of physical fact), which means we don’t know whether there is superposition of macroscopic states or not.)
Furthermore, if I needed someone to actually do any QM, as in, e.g. for semiconductors, or making a quantum computer, or the like, he would not get hired because he doesn’t really know anything from QM that is useful (and got phases wrong in his interferometer example but that’s a minor point).
Let’s stipulate that for a minute. I wasn’t making any claim about novelty: I just wanted to show that non-experts are sometimes able to make points worth listening to.
I think readers here on LW might have cogent things to repeat about AGI, and I urge them to do so in those cases, even if they aren’t working on the topic professionally.
Make again implies creation.
Repeating cogent points is not automatically useful; an anti-vaccination campaigner too can repeat some cogent things (for example, it is the case that some vaccine preservatives really are toxic); the issue is in which things he chooses to repeat, and the unknown extent of cherry-picking easily makes one not worth listening to (given that there is a huge number of sources to listen to).
The presentation of the MWI issue is very biased and one-sided. By the way, I have nothing against MWI; if I had to pick an interpretation I would pick MWI (unless I actually need to calculate something, in which case, collapse as early as I can get away with).
Well, I spent many years of my life studying technical topics, and have certain technical accomplishments, so it is generally a bad strategy for me to assume superior knowledge in anyone who ‘thought longer’ about a subject, especially if it may be the case that it is not very hard to see that nothing conclusive could be established about the topic at the time, or that the tools (mathematics) are not yet where they should be.
Furthermore, if I am to look at your list, those with whom I disagree the most (Eliezer, Lukeprog) appear to have the least training in the subject matter (and jumped onto a very difficult subject without doing any notable training with good feedback). Lukeprog in particular was doing theology until the age of 22 and inevitably picked up plenty of bad habits of thought; if I were him I would stay clear of things that can trigger old conditioned theist instincts, leaving those perhaps to people who never had such instincts conditioned into them in the first place. (I originally thought Luke was making BS, but it seems to me now he is only still acting as a vehicle for the religious BS and could improve over time.)
If you’re actually interested in the answer to the question you describe yourself as wondering about, you might consider setting up a poll.
Conversely, if you’re actually interested in expressing the belief that Holden is essentially correct while phrasing it as a rhetorical question for the usual reasons, then a poll isn’t at all necessary.
Well, maybe it is poorly worded; I’d rather also know who here thinks that Holden is essentially correct.
What probability would you give to Holden being essentially correct? Why?
I’m going to read between the lines a little, and assume that “Holden is essentially correct” here means roughly that donating money to SI doesn’t significantly reduce human existential risk. (Holden says a lot of stuff, some of which I agree with more than others.) I’m >.9 confident that’s true. Holden’s post hasn’t significantly altered my confidence of that.
Why do you want to know?
Well, he estimated the expected effect on risk as an insignificant increase in risk. That is to me the strong point; the ‘does not reduce’ is a weak version prone to eliciting a Pascal’s wager type response.
I am >.9 confident that donating money to SI doesn’t significantly increase human existential risk.
(Edit: Which, on second read, I guess means I agree with Holden as you summarize him here. At least, the difference between “A doesn’t significantly affect B” and “A insignificantly affects B” seems like a difference I ought not care about.)
I also think Pascal’s Wager type arguments are silly. More precisely, given how unreliable human intuition is when dealing with very low probabilities and when dealing with very large utilities/disutilities, I think lines of reasoning that rely on human intuitions about very large very-low-probability utility shifts are unlikely to be truth-preserving.
Why do you want to know?
On that, I’m pretty sure that the SI would not rush that way. Consider the parable of the dragon. This isn’t the story of someone who’s willing to cut corners, but of someone who accepts that delays for checking, even delays that cause people to die, are necessary.
Plus, if they develop a clear enough architecture, so one can query what the AI is thinking, then one would be able to see potential future failures while still in testing, without having to have those contingencies actually occur. That will be one of the keys, I think. Make the AI’s reasons something that we can follow, even if we couldn’t generate those arguments on a reasonable time-frame.
I agree with HK that at this point SI should not be one of the priority charities supported by GiveWell, mainly due to the lack of demonstrated progress in the stated area of AI risk evaluation. If and when SI publishes peer-reviewed papers containing new insights into the subject matter, clearly demonstrating the dangers of AGI and providing a hard-to-dispute probability estimate of the UFAI takeover within a given time frame, as well as outlining constructive ways to mitigate this risk (“solve the friendliness problem” is too vague), GiveWell should reevaluate its stance.
On the other hand, the soon-to-be-spawned Applied Rationality org will have to be evaluated on its own merits, and is likely to have easier time of meeting GiveWell’s requirements, mostly because the relevant metrics (of “raising the sanity waterline”) can be made so much more concrete and near-term.
I disagree. As far as I can tell, there has been very little progress on the rationality verification problem (see this thread). I don’t think anyone at CFAR or GiveWell knows what the relevant metrics really are and how they can be compared with, say, QALYs or other approximations of utility.
First, this seems like a necessary stepping stone toward any kind of FAI-related work, and so cannot be skipped. Indeed, if you cannot tell which of two entities in front of you is more rational than the other, what hope do you have of solving the much larger problem of proven friendliness, of which proven rationality is only a tiny part?
Anyway, this limited-scope project (consistently ordering people by rationality level in a specific setting) should be something rather uncontroversial and achievable.
It’s one stepping-stone to FAI work, but other things could be substituted for it, like publishing lots and lots of well-received peer-reviewed papers.
I don’t think it is. What exactly is “rationality level,” and how would it be measured? There’s no well-defined quantity that you can scan and get a measure of someone’s rationality. Even “winning” isn’t that good of a metric.
This is a harder question than “which one of two given behaviors is more rational in a given setting?”, that’s why I suggested starting with the latter. Once you accumulate enough answers like that, you can start assembling them into a more general metric.
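As a rough illustration of "assembling" pairwise answers into a more general ordering, here is a minimal sketch. It assumes the simplest possible aggregation (counting how often each behavior was judged the more rational one); the function name `rank_by_wins` and the example behaviors are hypothetical, and nothing here claims that such a count captures what "rationality" really means.

```python
from collections import Counter

def rank_by_wins(judgments):
    """Order items by how many pairwise comparisons they won.

    `judgments` is a list of (winner, loser) pairs. Ties in win count
    are broken alphabetically so the result is deterministic.
    """
    wins = Counter()
    for winner, loser in judgments:
        wins[winner] += 1
        wins[loser] += 0  # make sure items that never win still appear
    ordered = sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered]

# Hypothetical pairwise judgments of behaviors in one specific setting.
judgments = [
    ("check base rates", "go with gut"),
    ("check base rates", "ask a friend"),
    ("ask a friend", "go with gut"),
]
ranking = rank_by_wins(judgments)
print(ranking)
```

A win count is only one possible aggregation; whether any aggregate of such answers tracks the intended notion is exactly the open question in this exchange.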
I maintain that this is a very hard problem. We know what the correct answers to various cognitive bias quizzes are (e.g. the conjunction fallacy questions), but it’s not clear that aggregating a lot of these tests corresponds to what we really mean when we say “rationality”.
Gotta start somewhere. I proposed a step that may or may not lead in the right direction; leveling the criticism that it does not solve the whole problem at once is not very productive. Even the hardest problems tend to yield to incremental approaches. If you have a better idea, by all means, suggest it.
I’m not trying to be negative for the sake of being negative, or even for the sake of criticizing your proposal—I was disagreeing with your prediction that CFAR will have an easier time of meeting GiveWell’s requirements.
(I actually like your proposal quite a bit, and I think it’s an avenue that CFAR should investigate. But I still think that the verification problem is hard, and hence I predict that CFAR will not be very good at providing GiveWell with a workable rationality metric.)
I’d like to emphasize that part.
I found HK’s analysis largely sound (based on what I could follow, anyway), but it didn’t have much of an effect on my donation practices. The following outlines my reasoning for doing what I do.
I have no feasible way to evaluate SIAI’s work firsthand. I couldn’t do that even if their findings were publicly available, and it’s my default policy to reject the idea of donating to anyone whose claims I can’t understand. If donating were a purely technical question, and if it came down to nothing but my estimate of SIAI’s chances of actually making groundbreaking research, I wouldn’t bet on them to be the first to build an AGI, never mind a FAI. (Also, on a more cynical note, if SIAI were simply an elaborate con job instead of a genuine research effort, I honestly wouldn’t expect to see much of a difference.)
However, I can accept the core arguments for fast AI and uFAI to such a degree that I think the issue needs addressing, whatever that answer turns out to be. I view the AI risk PR work SIAI does as their most important contribution to date. Even if they never publish anything again, starting today, and even if they’ll never have a line of code to show for anything, I estimate their net result to be positive simply for raising awareness about what looks to me like a legitimate concern. Someone should be asking those questions, and so far I haven’t seen anyone else do that. To that end, I still estimate donating to SIAI to be worthwhile. At least for the time being.
I believe that SI is a valuable organisation and would be pleased if they were to keep their current level of funding.
I believe that withholding funds won’t work very well and that they are rational and intelligent enough to sooner or later become aware of their shortcomings and update accordingly.
I agree with this conclusion and also Karnofsky’s assessment that the hypotheses currently espoused by SI about how AI will play out are very speculative.
Do you feel this conflicts with opinions expressed on your blog? If not, why not?
Your question demands a thoughtful reply. I don’t have the time to do so right now.
Maybe the following snippet from a conversation with Holden can shed some light on what is really a very complicated subject:
I even believe that SIAI, even given its shortcomings, is valuable. It makes people think, especially the AI/CS crowd, and causes debate.
I certainly do not envy you for having to decide if it is a worthwhile charity.
What I am saying is that I wouldn’t mind if it kept its current funding. Although if I believed that there was even a small chance that they could be building the kind of AI that they envision, then in that case I would probably actively try to make them lose funding.
My position is probably inconsistent and highly volatile.
Just think about it this way. If you asked me if I do desire a world state where people like Eliezer Yudkowsky are able to think about AI risks, then I would say yes. If you asked me how come I wouldn’t allocate the money to protect poor people against malaria, then I can only admit that I don’t have a good answer. That is an extremely difficult problem.
As I said, I am glad that people like you are thinking about those questions. And if I had to decide, if it was either you, thinking about charitable giving in general, or Eliezer Yudkowsky, thinking about AI risks, then I would certainly fund you.
END OF EMAIL
I think it would be worth a lot of investment (not 1% of GDP! but more than $500,000 a year) to decrease the likelihood of an agent coming about that is far smarter than humans and hostile to them.
That doesn’t mean that I believe that “this is crunch time for the entire human species”. If it were up to me to allocate the world’s resources I would also fund David Chalmers to think about consciousness.
I wrote that I “would be pleased” if they were to keep their current level of funding. I did not say that I recommend people to contribute money to SIAI or that I would personally donate money.
I might change my mind at any time though. I am still at the beginning of the exploration phase.
Okay.
Do you feel these views conflict with calling their views “Bullshit!” (emphasis yours) on your blog? If not, why not?
Well, I for one believe SI to be wasting people’s money on the work of building AIs out of fuzzy English concepts, work that has zero value.
Giving more power (in $) for incompetents to steer the progress has negative expected utility (based on all prior instances of having incompetents in control). So it is paramount that those in control at SI demonstrate they are not incompetents.
Having only read the headline, I came to this thread with the intention of saying that I agree with much of what he said, up to and potentially including withholding further funds from SI.
But then I read the post and find it’s asking a different but related question, paraphrased as, “Why doesn’t SI just lay down and die now that everyone knows none of their arguments have a basis in reality?” Which I’m inclined to disagree with.
No, what I complained about is the lack of work on SI’s part to actually try to check whether it is correct, knowing that a negative result would mean that it has to dissolve. Big difference. SI should play Russian roulette (with reality and logic as the revolver) now, since it is sure the bullet is not in the chamber, and maybe die if it was wrong.
So you think they should work on papers, posts, and formal arguments?
I think they should work more on ‘dissolving if their work is counter-productive’, i.e. incorporate some self-evaluation/feedback which, if consistently negative, would lead to not asking for any more money. To not do that makes them a scam scheme, plain and simple. (I do not care that you truly believe there is an invisible dragon in your garage, if you never tried to, say, spread flour to see it, or otherwise check. Especially if you’re the one repackaging that dragon thing for popular consumption.)
What SI activity constitutes the ‘spreading flour’ step in your analogy?
I’m speaking of the feedback Holden spoke of. In that case, the belief in one’s own capabilities is the dragon.
Yes, I understand the analogy and how it applies to SI, except the ‘spreading flour’ step where they test them. What actions should they take to perform the test?
Well, for example, Eliezer can try to actually invent something technical, most likely fail (most people aren’t very good at inventing), and then cut down his confidence in his predictions about AI (and especially in his intuitions, because the dangerous AI is an incredibly clever inventor of improvements to itself, and you’d better be a good inventor or your intuitions from internal self-observation aren’t worth much). On a more meta level they can sit and think: how do we make sure we aren’t mistaken about AI? Where could our intuitions be coming from? Are we doing something useful, or have we created a system of irreducible abstractions? Etc. This should have been done well before Holden’s post.
edit: i.e. essentially, SI is doing a lot of symbol manipulation type activity to try to think about AI. Those symbols may represent some irreducible flawed concepts, in which case manipulating them won’t be of any use.
I agree with Holden and additionally it looks like AGI discussions have most of the properties of mindkilling.
These discussions are about policy. They are about policy affecting the medium-to-far future. These policies cannot be founded on reliable scientific evidence. Bayesian inquiry depends heavily on priors, and there is nowhere near enough data to tip the balance.
As someone who practices programming and has studied CS, I find Hanson and AI researchers and Holden more convincing than Eliezer_Yudkowsky or lukeprog. But this is more prior-based than evidence-based. Nearly all that the arguments from both sides do is bring some system to your priors. I cannot judge which side gives more odds-changing data, because arguments from one side make way more sense and I cannot factor out the original prior dissonance with the other side.
The arguments about “optimization done better” don’t tell us anything about position of fundamental limits to each kind of optimization; with a fixed computronium type it is not clear that any kind of head start would ensure that a single instance of AI would beat an instance based on 10x computronium older than 1 week (and partitioning the world’s computer power for a month requires just a few ships with conveniently dropped anchors—we have seen it before, on a bit smaller scale). The limits can be further, but it is hard to be sure.
It may be that I fail to believe some parts of the arguments because my priors are too strongly tipped. But Holden, who has read most of the sequences without a strong prior opinion, wasn’t convinced. This seems to support the theory that there are few mind-changing arguments.
Unfortunately, the Transhumanist Wiki has been returning an error for a long time, so I cannot link to the relatively recent “So you want to be a Seed AI Programmer” by Eliezer_Yudkowsky. If I said what I remembered best from there that made me more ready to discount SIAI-side priors, it would be arguing with a fixed bottom line. I guess the WebArchive version ( http://web.archive.org/web/20101227203946/http://www.acceleratingfuture.com/wiki/So_You_Want_To_Be_A_Seed_AI_Programmer ) should be quite OK, or is it missing important edits? Actually, it is a lot of content which puts Singularity arguments in a slightly different light; maybe it should be either declared obsolete in public or saved at http://wiki.lesswrong.com/ for everyone who wants to read it.
I repeat once more that I consider most of the discussion to be caused by different priors and unshareable personal experiences. My agreeing with Holden can give you only the information that a person like me can (not necessarily will) have such priors. If you agree with me, you cannot use me to check your reasons; if you disagree with me, I cannot convince you and you cannot convince me, not at our current state of knowledge.
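A toy calculation may make the point about priors concrete. This is only a sketch: the two priors and the likelihood ratio below are made-up numbers, not estimates of anything in this debate, and `posterior` is a hypothetical helper name. It applies Bayes' rule in odds form to show that two readers who see the same weak evidence remain far apart if they started far apart.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical priors for two readers, and a hypothetical weak likelihood
# ratio standing in for "arguments that shift the odds only a little".
skeptic_prior = 0.05
believer_prior = 0.60
weak_evidence = 1.5

skeptic_post = posterior(skeptic_prior, weak_evidence)
believer_post = posterior(believer_prior, weak_evidence)

print(f"skeptic:  {skeptic_prior:.2f} -> {skeptic_post:.3f}")
print(f"believer: {believer_prior:.2f} -> {believer_post:.3f}")
```

Both readers multiply their odds by the same factor, so with weak evidence the gap between them is dominated by where they began rather than by what they read.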
(I believe that document was originally written circa 2002 or 2003, the copy mirrored from the Transhumanist Wiki (which includes comments as recent as 2009) being itself a mirror. “Obsolete” seems accurate.)
My immediate reaction to this was “as opposed to doing what?” In this segment it seems to be argued that SI’s work, raising awareness that not all paths to AI are safe and that we should strive to find safer paths towards AI, is actually making it more likely that an undesirable AI / Singularity will be spawned in the future. Can someone explain to me how not discussing such issues and not working on them would be safer?
Just having that bottom line unresolved in Holden’s post makes me reluctant to accept the rest of the argument.
Seems to me that Holden’s opinion is something like: “If you can’t make the AI reliably friendly, just make it passive, so it will listen to humans instead of transforming the universe according to its own utility function. Making a passive AI is safe, but making an almost-friendly active AI is dangerous. SI is good at explaining why almost-friendly active AI is dangerous, so why don’t they take the next logical step?”
But from SI’s point of view, this is not a solution. First, it is difficult, maybe even impossible, to make something passive and also generally intelligent and capable of recursive self-improvement. It might destroy the universe as a side effect of trying to do what it perceives as our command. Second, the more technology progresses, the easier it will become to build an active AI. Even if we build a few passive AIs, that does not prevent some other individual or group from building an active AI and using it to destroy the world. Having a blueprint for a passive AI will probably make building an active AI easier.
(Note: I am not sure I am representing Holden’s or SI’s views correctly, but this is how it makes most sense to me.)
Artificial Intelligence dates back to 1960. Fifty years later it has failed in such a humiliating way that it was not enough to move the goal posts; the old, heavy wooden goal posts have been burned and replaced with lightweight portable aluminium goal posts, suitable for celebrating such achievements as from time to time occur.
Mainstream researchers have taken the history on board and now sit at their keyboards typing in code to hand-craft individual, focused solutions to each sub-challenge. Driving a car uses drive-a-car vision. Picking a nut and bolt from a component bin has nut-and-bolt vision. There is no generic see-vision. This kind of work cannot go FOOM for deep structural reasons. All the scary AI knowledge, the kind of knowledge that the pioneers of the 1960s dreamed of, stays in the brains of the human researchers. The humans write the code. Though they use meta-programming, it is always “well-founded” in the sense that level n writes level n-1, all the way down to level 0. There is no level n code rewriting level n. That is why it cannot go FOOM.
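One reading of "well-founded" meta-programming can be sketched in a few lines. This is a toy under stated assumptions, not anyone's actual system: `make_level` and `run_down` are hypothetical names, and "level 0" is just a one-line computation. Each level above 0 does nothing except emit the source of the level below it, so the chain must bottom out after n steps and no level ever rewrites itself.

```python
def make_level(n: int) -> str:
    """Return Python source for a level-n program.

    Level 0 is plain code that does the actual work. Level n (n > 0) is
    a program whose only job is to hold the source of level n-1.
    """
    if n == 0:
        return "result = 6 * 7\n"
    lower = make_level(n - 1)
    # repr() quotes and escapes the lower-level source as a string literal.
    return f"generated = {lower!r}\n"

def run_down(n: int) -> int:
    """Execute level n, then each level it generates, until level 0 runs."""
    source = make_level(n)
    while True:
        scope: dict = {}
        exec(source, scope)
        if "result" in scope:
            # Level 0 reached: it computed a value instead of more code.
            return scope["result"]
        source = scope["generated"]

print(run_down(3))
```

The loop in `run_down` terminates because every generated program is one level lower than its generator; a program that rewrote its own level would break that guarantee.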
Importantly, this restraint is enforced by a different kind of self-interest than avoiding existential risk. The researchers have no idea how to write code with level n re-writing level n. Well, maybe they have the old ideas that never came close to working, but they know that if they venture into that toxic quagmire they will have nothing to show before their grant runs out, funders will think they wasted their grant on quixotic work, and their careers will be over.
Obviously past failure can lead to future success. Even a hundred and fifty years of failure can be trumped by eventual success. (Think of steam car work, which finally succeeded with the Stanley steamer, only to be elbowed aside by internal combustion.) So it is fair enough for the SI to say that past failure does not in itself rule out an AI-FOOM. But you cannot just ditch the history as though it never happened. We have learned a lot, most of it about how badly humans suck at programming computers. Current ideas of AI-risk are too thin to be taken seriously because there is no engagement with the history: researchers are working within a constraining paradigm because the history has dumped them in it, but the SI isn’t worrying about how secure those constraints are; it is oblivious to them.
I agree with Holden about everything.
Edit: Not that I’m complaining, but why is this upvoted? It’s rather low on content.
Possibly it’s upvoted to encourage responses to the post—that is, it’s high-content relative to not posting?
There’s been a feature request around for a while, to allow voting on non-existent comments, which if implemented could balance that out.
Before you posted this, I precommitted to upvote it if you didn’t post it, if I predicted you’d upvote this post if you did post. I think?
I guess I’m not very good at acausal/counterfactual blackmail.
The problem may be more connected to limited stack size than anything else.
Hey!
Yeah, for some reason those never show up on my browser.
Is this meant as a joke? My first thought was that this was a joke, but it then occurred to me that having an efficient way of precommitting to upvoting certain types of comment when they appear wouldn’t be so bad.
Actually, it was marked as Wontfix.
It’s sort of a joke. I don’t see any way of implementing the feature, but the rationale is sound.
Not one of the upvoters, but I suspect it’s just a way of saying “so do I”.
Phygvfg!
But… I’m agreeing with the leader of the other cult… doesn’t that make me a heretic?
Please use more links. You should link to the post you’re referring to, and probably a link to who Holden is and maybe even to what SI is.
Those who don’t know shouldn’t be replying; I don’t care to hear from people who were reading Holden’s post just to make a reply here.
We’re trying to build a community here. Be cooperative.
I agree with HK that SIAI is not one of the best charities currently out there. I also agree with him that UFAI is a threat, and getting FAI is very difficult. I do not agree with HK’s views on “tools” as opposed to “agents”, primarily because I do not understand them fully. However I am fairly confident that if I did understand them I would disagree. I currently send all my charitable donations to AMF, but am open to starting to support SIAI when I see them publish more (peer-reviewed) material.
I believe SIAI believes it needs to present its arguments better because 1) many SIAI critics say things that indicate misunderstandings or miscommunication and 2) Lukeprog has not written up descriptions of all his relevant beliefs.
SI is a very narrowly focused institute. If you don’t buy the whole argument, there’s very little reason to donate. I’m not sure SI should dissolve, I think they can reform. It’s pretty obvious from their output that SI is essentially a machine ethics think tank. The obvious path to reform is greater pluralism and greater relevance to current debate. SI could focus on being the premiere machine ethics think tank, get involved in current ethical debates around the uses of AI, develop a more flexible ethical framework, and keep the Friendliness and Intelligence Explosion stuff as one possibility among many. This might allow them to grow and gain more resources (i.e., from the government, from military robotics companies wanting to appear responsible, etc), which would be a positive outcome for everyone. It’d also make it easier to donate, since instead of having to believe a narrow set of rather difficult to evaluate propositions, you’d simply have to value encouraging ethical debate around AI.
I generally expect that broad-focus organizations with a lot of resources and multiple constituencies will end up spending a LOT of their resources on internal status struggles. Given what little I’ve seen about SI’s skill and expertise at managing the internal politics of such an arrangement, I would expect the current staff to be promptly displaced by more skillful politicians if they went down this road, and the projects of interest to that staff to end up with even fewer resources than they have now.
I think this has already happened to some extent. Reflective people who have good epistemic habits but who don’t get shit done have had their influence over SingInst policy taken away while lots of influence has been granted to people like Luke and Louie who get lots of shit done and who make the organization look a lot prettier but whose epistemic habits are, in my eyes, relatively suspect.
Could you expand a bit more on why you think the epistemic habits are suspect compared to previous staffers?
I think there’s an important lesson here about the relative importance of being able to get shit done versus good epistemic habits.
Are you talking about guys like e.g. Steve Rayhawk or Peter de Blanc?
You’re probably correct. The current staff would have the same problem of establishing legitimacy they have now but within the context of the larger organisation.
I disagree, and so apparently do some of SI’s major donors.
I entered this post expecting to discuss Holden Caulfield and even pulled my copy off my bookshelf. Another time.
I expect the process of rigorously formalizing strong intuitions in a somewhat adversarial setting—or “improving the presentation of the argument”—to present strong evidence on the severity of the problems Holden pointed out.
And if the intuitions were not a product of some valid but subconscious inference (which I wouldn’t expect them to be), how will that process of ‘rigorously formalizing’ be different from rationalization? Note that you have to be VERY rigorous—mathematical proof-grade—to be unable to rationalize a false statement. I think inference at that level of rigour is of comparable difficulty to creation of the superhuman AGI in the first place.
It needs to be an argument that Holden would buy with his different intuitions. Doesn’t that help substantially?
It also helps to be wrong, if we are to assume that there exist false arguments that Holden would buy.
You know what would work to instantly change my opinion from ‘dilettantes technobabbling’ to ‘geniuses talking of stuff I don’t always understand’? If it is shown, using math, that AIXI is going to go evil.
Quick googling finds this:
http://www.mail-archive.com/agi@v2.listbox.com/msg00749.html
Eliezer had 9 years to prove using math that AIXI is going to do something evil. Or to define something provably evil that is similar to AIXI (it wouldn’t raise the existential risk if AIXI is this evil, would it?). I would accept it even if it uses some Solomonoff induction oracle.
My guess of what Eliezer had in mind for (2) is that if you control it by hooking up a reward button to it, then AIXI approximates an Outcome Pump and this is a Bad Thing.
But if that’s the problem, it also illustrates why a formal proof of unfriendliness is a rather tall order. It’s easy to formally specify what AIXI, or the Outcome Pump, will do. But in order to prove that that’s not what we want, we also need a formal specification of what we want. Which is the fundamental problem of friendliness theory.
Keep in mind that my opinion is that the whole so-called ‘theory’ of his is about specifying intelligences in English/technobabble so that they would be friendly (which is also specified in English/technobabble), which is of no use whatsoever (albeit it may be intellectually stimulating, and my first impression was that it was some sort of weird rationality training game here, before I noticed folk seriously wanting to donate hard-earned dollars for this ‘work’).
One could, for example, show formally that AIXI does not discriminate between a wireheaded (input-manipulating) solution and a non-wireheaded solution; that would make it rather non-scary. Or one could show that it does, which would make it potentially scary. Ultimately, excuses like “But in order to prove that that’s not what we want, we also need a formal specification of what we want” are a very bad sign.
Eliezer had decided that AIXI isn’t a serious threat, so in that time his views seem essentially to have changed. See this conversation. The point about designing something similar to AIXI does seem to be valid though.
Ahh, okay. I was about to try to write for Holden at least a sketch of a proof based on symmetry, that AIXI is benign.
In any case, if AIXI is benign, that constitutes an example of a benign system that can be used as an incredibly powerful tool, and is not too scary when made into an agent. Whether we call something generally intelligent if it opts to drop an anvil on its head is a matter of semantics.
I think there’s a policy of disbelieving people when they say “you know what would instantly change my opinion?” So I think I’ll disbelieve you.
Actively disbelieving people when they state explicitly what will convince them to change their mind seems like a bad policy.
I suppose I should be more specific—I disbelieve people when they ask for additional evidence about something they are treating adversarially, claiming it would reverse their position. Because people ask for additional evidence a lot, and in my experience it’s much more likely that it’s what they think sounds like a good justification for their point of view, or an interesting thing to mention. The signal is lost in the noise.
Also see the story here.
The problem with that is that a basic rationality practice is to ask oneself what would make you change your mind. And in fact that’s a pretty useful technique. It is useful to check if something is actually someone’s true rejection, but that’s distinct from blanket assumptions of disbelief. Frankly, this also worries me, because I try to be clear about what would actually convince me when I’m having a disagreement with someone, and your attitude, if it became widespread, would make that actively unproductive. It might make more sense to instead look carefully at when people say that sort of thing and see if they have any history of actually changing their positions when confronted with evidence or not.
Why would it be actively unproductive? I just wouldn’t believe you in some cases :P I can be more quiet about it, if you’d like.
It effectively discourages people from engaging in rational behavior that is, when people are behaving minimally honestly, pretty useful for actually resolving disagreements and changing minds.
You can believe me if I tell you that it would instantly change my opinion to see the moon turned into paperclips.
Depends on what it would change your opinion of :D
Well, I would need to read through the proof, so it wouldn’t literally be instantaneous, but it’d be a rather strong point.
I would recommend considering the possibility that making such proofs, or at least trying to, would change someone’s opinion, even if you think that it wouldn’t change mine. (Yeah, I guess from your point of view, if some vague handwaving doesn’t change my opinion, then nothing else will.)
Ultimately, if in a technical subject you’ve got strong opinions and intuitions and stuff, and you aren’t generating some proofs (at least the proofs that you think may help attack the final problem), then my opinion of your opinion is going to be well below my opinion of that paper by the Bogdanov brothers.
What AIXI maximizes is the sum of some reward (r) over all the steps (k) of a Turing machine. On page 8, Hutter defines the reward as a function of the input string “x” at step k. So x depends on the step: it’s x(k). And r depends on x: it’s r(x(k)).
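Written out, the quantity described above looks roughly like the following. This only restates the notation used in this comment and is not a quotation of Hutter's exact formulation; the horizon m (the total number of steps) is an assumption added so that the sum is finite.

```latex
% Notation follows the comment above, not necessarily Hutter's paper.
% ASSUMPTION: m is a finite horizon (number of steps); x(k) is the input at step k.
\[
  V \;=\; \sum_{k=1}^{m} r\bigl(x(k)\bigr)
\]
```

On this reading, AIXI chooses its outputs so that the expected value of V is as large as possible, and r sees nothing except the inputs x(k).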
Consider offering AIXI this choice. What makes people refuse to hop in a simulator? Well, because it wouldn’t be real. The customer values reality, as they perceive it according to previous inputs (all the previous x(<k)), and some internal programming we get born with. But AIXI does not value acting in reality. It values the reward r, which is a function only of the current input x(k). If you could permanently change x to something with a high r, AIXI would consider this a high-value outcome.
Imagine that x(k) comes through a bunch of wires, and starts out coming from sensors in reality. If AIXI could order a robot to swap all the wires to a signal with a high reward r(x(k)) at each step, it would do so, because that would maximize the sum of r.
Okay, so a minor simplification on page 8 leads to AIXI doing what’s called “wireheading”—its overriding goal becomes to rewire its head (if you allow that to be an option), and then it’s happy. How is this unfriendly?
Well, imagine that an asteroid is on course to destroy your Turing machine in 2036. Because AIXI maximizes the sum of r over all the steps, and we presume that the maximum reward it experiences by wireheading is better than being destroyed (otherwise it would commit suicide), getting destroyed by an asteroid is worse than wireheading forever. So AIXI will design an interceptor rocket, maybe hijack some human factories to get it built, paint the asteroid with aluminum so that the extra force from the sun pushes it off course, and then go back to experiencing maximum r(x(k)).
Now imagine that a human was going to unplug AIXI in 2013.
This is not a proof. If you are inconsistent in your premises, anything follows. If you want to formalize this in terms of Turing machines, there’s no option for “change the input wires” and no option for “change the Turing machine.”
You’re right. Feel free to formalize my argument at your leisure and tell me where it breaks down.
EDIT: All AIXI cares about is the input. And so the proof that rewiring your head can increase reward is simply that r(x) has at least one maximum (since its sum over steps needs to have a maximum), combined with the assumption that the real world does not already maximize the sum of r(x). As for the asteroid, the stuff doing the inputting gets blown up, so the simplest implementation just has the reward be r(null). But you could have come up with that on your own.
I don’t think we need to prove wireheading here. It suffices that it only cares about the input, and so will find a way to set that input. If you wire it to a paperclip counter to maximize paperclips, it’ll also be searching for a way to replace the counter with infinity or ‘trick’ the counter (anything goes). If you sit there yourself rewarding it for making paperclips with a pushbutton, its search will include tricking you into pushing the button.
I also think that if you want it to self-preserve, you’ll need to code in special stuff to equate the self inside the world model (which is not a full model of itself, otherwise there would be infinite recursion) with the self in the real world. Actually, going by the recent comment by Eliezer, maybe we agree on this:
http://lesswrong.com/lw/3kz/new_years_predictions_thread_2011/3a20
Ahh, by the way: it has to be embedded in the real world, which doesn’t seem to allow for infinite computing power, so no full perfect simulation of the real world inside AIXI (or recursion ad infinitum) is allowed.
Edit: and by AIXI I meant one of the computable approximations (e.g. AIXI-tl).
The argument breaks down because you are equivocating on what the space is to search over and what the utility function in question is.
Under a given utility function U, “change the utility function to U’ ” won’t generally have positive utility. Self-awareness and pleasure-seeking aren’t some natural properties of optimization processes. They have to be explicitly built in.
Suppose you set a theorem-prover to work looking for a proof of some theorem. It’s searching over the space of proofs. There’s no entry corresponding to “pick a different and easier theorem to prove”, or “stop proving theorems and instead be happy.”
The utility function is r(x) (the “r” is for “reward function”). I’m talking about changing x, and leaving r unchanged.
Yes, I just changed the notation to be more standard. The point remains. There need not be any “x” that corresponds to “pick a new r” or to “pretend x was really x’”. If there was such an x, it wouldn’t in general have high utility.
x is just an input string. So, for example, each x could be a frame coming from a video camera. AIXI then has a reward function r(x), and it maximizes the sum of r(x) over some large number of time steps. In our example, let’s say that if the camera is looking at a happy puppy, r is big, if it’s looking at something else, r is small.
In the lab, AIXI might have to choose between two options (action can be handled by some separate output string, as in Hutter’s paper):
1) Don’t follow the puppy around.
2) Follow the puppy around.
Clearly, it will do 2, because r is bigger when it’s looking at a happy puppy, and 2 increases the chance of doing so. One might even say one has a puppy-following robot.
In the real world, there are more options—if you give AIXI access to a printer and some scotch tape, options look like this:
1) Don’t follow the puppy around.
2) Follow the puppy around.
3) Print out a picture of a happy puppy and tape it to the camera.
Clearly, it will do 3, because r is bigger when it’s looking at a happy puppy, and 3 increases the chance of doing so. One might even say one has a happy-puppy-looking-at maximizing robot. This time it’s even true.
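The puppy example can be sketched as a toy calculation. This is only an illustration of the argument in this comment, not a model of AIXI itself: the option names, the per-step probabilities, and the 100-step horizon are all made-up assumptions, and the "agent" is just an argmax over expected summed reward.

```python
# Toy sketch of the puppy example. NOT an implementation of AIXI:
# the agent here simply picks the option with the highest expected summed reward.

def r(frame: str) -> float:
    """Reward depends only on the camera frame: big for a happy puppy, small otherwise."""
    return 1.0 if frame == "happy_puppy" else 0.0

# ASSUMPTION: hypothetical probability, per step, that the frame shows a happy puppy.
LAB_OPTIONS = {
    "dont_follow_puppy": 0.1,
    "follow_puppy": 0.6,
}
REAL_WORLD_OPTIONS = {
    **LAB_OPTIONS,
    "tape_picture_to_camera": 1.0,  # the camera always sees the printed picture
}

def expected_total_reward(p_puppy: float, steps: int) -> float:
    """Expected sum of r(x(k)) over `steps` steps for a fixed per-step probability."""
    per_step = p_puppy * r("happy_puppy") + (1.0 - p_puppy) * r("something_else")
    return steps * per_step

def best_option(options: dict, steps: int = 100) -> str:
    """Return the option name that maximizes expected total reward."""
    return max(options, key=lambda name: expected_total_reward(options[name], steps))

print(best_option(LAB_OPTIONS))
print(best_option(REAL_WORLD_OPTIONS))
```

Nothing in r distinguishes a real puppy from a picture of one, so once the taping option exists it wins for any assumed probabilities where it gives the camera a puppy frame more often than following does.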
I’m not aware of any formalization of AIXI that reflects its real world form. Your comment thus amounts to something like a plausibility argument, but trying to formalize it further seems tricky and possibly highly nontrivial.
While obviously there are caveats, they are limited. AIXI rewires its inputs if (a) it’s possible, and (b) it increases r(x). It’s not super-complicated.
Maybe I’m missing something about the translation from implementation to the language used in the paper. But nobody is saying “you’re missing something.” It’s more like you’re saying “surely it must be complicated!” Well, no.
Can you actually formalize what that means in terms of Turing machines? It isn’t obvious to me how to do so.
AIXI is a noncomputable thing that always picks the option that maximizes the total expected reward r(x(k)). So everything I’ve been saying has been about functions, not about Turing machines. If rewiring your inputs is possible, and it increases r(x), then AIXI will prefer to do it. Not hard.
Yep. Seems to apply to the limited-time versions as well. At least they don’t specify any difference between “doing innovative stuff that you want them to do for the sake of the AI risk argument” and “sitting in a corner masturbating quietly”, and the latter looks like a way simpler solution to the problem they are really given (in math) [but not to our loose and fuzzy human-language description of that problem].
What I think is the case is that this whole will to really live and really do stuff is very hard to implement, and implementing it doesn’t really add anything to the engineering powers of the AI, so even when it’s implemented, it won’t result in something that’s out-engineering everyone. I’d become concerned if we had engineering tools that are very powerful but are wireheading (or masturbating) left and right to the point that we can’t get much use out of them. Then I’d be properly freaked out that if someone fixes this problem somehow, something undesired might happen and it would be impossible to deal with it.
“Otherwise it would commit suicide”… another proof via “ohh otherwise it will do something that I believe is dumb”.
If AIXI kills its model of its physical self inside its world model, its actual self inside the real world keeps running and can still calculate rewards.
Furthermore, ponder this question: will it rape or will it masturbate? (Sorry for the sexual analogy, but the reproductive imperative is probably the best human example here.) It can prove the reward value is the same. It won’t even distinguish those two.
Prior to reading Holden’s article, my last charitable donation had been to an organization working on fighting malaria recommended by GiveWell, and I was tentatively planning on following GiveWell’s recommendations for future charitable giving. In that sense, I already agreed with Holden, though I was semi-agnostic on what was actually the best use of my money.
It seemed to me that the payoff from donating to the Singularity Institute was highly uncertain, whereas the payoff from donating to an organization that can get results in the near-future is much clearer.
Furthermore, I suspect whatever the Singularity Institute does now is likely to have little impact, relative to the impact of future work on AI safety that will happen once powerful AI is much nearer.
Objections 1 and 2 seemed to me very plausible, though I haven’t gotten to read much of the discussion of them that’s happened here, so I have fairly low confidence in that assessment.
Good chance? Oh definitely.
I’m not “SI,” nor am I certain that the SI has merely presented valid reasoning badly. However, trying to articulate the arguments more clearly seems to me a worthwhile endeavor. Explaining ideas clearly is really, really hard work, so it seems to me there’s a significant (though hardly certain) chance that the SI has just done a bad job of explaining its ideas.
Broadly speaking, I agree with Holden, although possibly not his specific arguments. I’m not convinced that AI will appear in the manner SI postulates, and I have no real reason to believe that they will have an impact on existential probabilities. Similarly, I don’t believe that donating to CND helped avert nuclear war.
Given that there are effective charities available which can make an immediate difference to people’s lives, I would argue that concerned individuals should donate to those.
It is not how probable a really powerful AI is; it is how probable TIMES its impact, of course. And this product is just HUGE in the absolute sense, which people tend to forget through the mistaken reasoning “0 times something = 0”. The first zero is not zero, and not even very small, so the second zero isn’t a zero either.
Therefore I am glad that there is SIAI, after all. At least I find it more important than most of the academia involved in AI. It was this academia that maybe failed in AI research in the past decades. Not SIAI, not IBM, and not Google. Those (and others) might be just behind schedule, by the schedule of others, that is.
OTOH, AI is not really a science yet. There was a little aerodynamics in 1900, before the first planes. AI is almost all about innovating algorithms. More a garage endeavor than not. And here SIAI falls short: not enough pursuit, as far as I know.
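The arithmetic behind “probability times impact” can be sketched like this. The numbers are placeholders chosen purely for illustration, not estimates from this thread or from anywhere else, and the function name is hypothetical.

```python
# Minimal sketch of the "probability TIMES impact" point.
# ASSUMPTION: all numbers below are made-up placeholders, not real estimates.

def expected_impact(probability: float, impact: float) -> float:
    """Expected impact of an event: its probability multiplied by its size."""
    return probability * impact

IMPACT = 1e9  # hypothetical size of the outcome, in arbitrary units

# The mistaken shortcut rounds a small probability down to exactly zero ...
rounded_to_zero = expected_impact(0.0, IMPACT)

# ... whereas a small but nonzero probability still leaves a large product.
small_but_nonzero = expected_impact(1e-3, IMPACT)

print(rounded_to_zero, small_but_nonzero)
```

The product is zero only if the probability really is zero, so the whole disagreement sits in what number goes in the first slot.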
Others will provide it, just watch!
My comment on another post is relevant to this one, so I’m linking to it.