Seeing this post get so strongly upvoted makes me feel like I’m going crazy.
This is not the kind of content I want on LessWrong. I did not enjoy it, I do not think it will lead me to be happier or more productive toward reducing x-risk, I don’t see how it would help others, and it honestly doesn’t even seem like a particularly well done version of itself.
For whatever it is worth, this post, along with reading the unworkable alignment strategy in the ELK report, has made me realize that we actually have no idea what to do and has finally convinced me to try to solve alignment; I encourage everyone else to do the same. For some people, knowing that the world is doomed by default and that we can’t just expect the experts to save it is motivating. If that was his goal, he achieved it.
Certainly for some people (including you!), yes. For others, I expect this post to be strongly demotivating. That doesn’t mean it shouldn’t have been written (I value honestly conveying personal beliefs and expressing diversity of opinion enough to outweigh the downsides), but we should realistically expect this post to cause psychological harm for some people, and it could also potentially make interaction and PR with those who don’t share Yudkowsky’s views harder.
Despite some claims to the contrary, I believe (through personal experience in PR) that expressing radical honesty is not strongly valued outside the rationalist community, and that interaction with non-rationalists can be extremely important, even to potentially world-saving levels. Yudkowsky, for all of his incredible talent, is frankly terrible at PR (at least historically), and may not be giving proper weight to its value as a world-saving tool.
I’m still thinking through the details of Yudkowsky’s claims, but expect me to write a post here in the near future giving my perspective in more detail.
I don’t think “Eliezer is terrible at PR” is a very accurate representation of historical fact. It might be a good representation of something else. But it seems to me that deleting Eliezer from the timeline would probably result in a world where far far fewer people were convinced of the problem. Admittedly, such questions are difficult to judge.
I think “Eliezer is bad at PR” rings true in the sense that he belongs in the cluster of “bad at PR”; you’ll make more correct inferences about Eliezer if you cluster him that way. But on historical grounds, he seems good at PR.
Eliezer is “bad at PR” in the sense that there are lots of people who don’t like him. But that’s mostly irrelevant. The people who do like him like him enough to donate to his foundation and all of the foundations he inspired.
It’s the people who don’t like him (and are also intelligent and in positions of power) whom I’m concerned with in this context. We’re dealing with problems where even a small adversarial group can do a potentially world-ending amount of harm, and that’s pretty important to be able to handle!
My personal experience is that the people who actively dislike Eliezer are specifically the people who were already set on their path; they dislike Eliezer mostly because he’s telling them to get off that path.
I could be wrong, however; my personal experience is undoubtedly very biased.
I’ll tell you that one of my brothers (who I greatly respect) has decided not to be concerned about AGI risks specifically because he views EY as being a very respected “alarmist” in the field (which is basically correct), and also views EY as giving off extremely “culty” and “obviously wrong” vibes (with Roko’s Basilisk and EY’s privacy around the AI boxing results being the main examples given), leading him to conclude that it’s simply not worth engaging with the community (and their arguments) in the first place. I wouldn’t personally engage with what I believe to be a doomsday cult (even if they claim that the risk of ignoring them is astronomically high), so I really can’t blame him.
I’m also aware of an individual who has enormous cultural influence, and was interested in rationalism, but heard from an unnamed researcher at Google that the rationalist movement is associated with the alt-right, so they didn’t bother looking further. (Yes, that’s an incorrect statement, but it came from the widespread [possibly correct?] belief that Peter Thiel is both alt-right and has/had close ties with many prominent rationalists.) This indicates a general lack of control of the narrative surrounding the movement, and likely has directly led to needlessly antagonistic relationships.
The problems are well known. The mystery is why the community doesn’t implement obvious solutions. Hiring PR people is an obvious solution. There’s a posting somewhere in which Anna Salamon argues that there is some sort of moral hazard involved in professional PR, but never explains why, and everyone agrees with her anyway.
If the community really and literally is about saving the world, then having a constant stream of people who are put off, or even becoming enemies is incrementally making the world more likely to be destroyed. So surely it’s an important problem to solve? Yet the community doesn’t even like discussing it. It’s as if maintaining some sort of purity, or some sort of impression that you don’t make mistakes is more important than saving the world.
If the community really and literally is about saving the world, then having a constant stream of people who are put off, or even becoming enemies is incrementally making the world more likely to be destroyed. So surely it’s an important problem to solve? Yet the community doesn’t even like discussing it. It’s as if maintaining some sort of purity, or some sort of impression that you don’t make mistakes is more important than saving the world.
I think there are two issues.
First, some of the ‘necessary to save the world’ things might make enemies. If it’s the case that Bob really wants there to be a giant explosion, and you think giant explosions might kill everyone, you and Bob are going to disagree about what to do, and Bob existing in the same information environment as you will constrain your ability to share your preferences and collect allies without making Bob an enemy.
Second, this isn’t an issue where we can stop thinking, and thus we need to continue doing things that help us think, even if those things have costs. In contrast, in a situation where you know what plan you need to implement, you can now drop lots of your ability to think in order to coordinate on implementing that plan. [Like, a lot of the “there is too much PR in EA” complaints were specifically about situations where people were overstating the effectiveness of particular interventions, which seemed pretty poisonous to the project of comparing interventions, which was one of the core goals of EA, rather than just ‘money moved’ or ‘number of people pledging’ or so on.]
That said, I agree that this seems important to make progress on; this is one of the reasons I worked in communications roles, this is one of the reasons I try to be as polite as I am, this is why I’ve tried to make my presentation more adaptable instead of being more willing to write people off.
First, some of the ‘necessary to save the world’ things might make enemies. If it’s the case that Bob really wants there to be a giant explosion, and you think giant explosions might kill everyone, you and Bob are going to disagree about what to do, and Bob existing in the same information environment as you will constrain your ability to share your preferences and collect allies without making Bob an enemy.
So...that’s a metaphor for “telling people who like building AIs to stop building AIs pisses them off and turns them into enemies”. Which it might, but how often does that happen? Your prominent enemies aren’t in that category, as far as I can see. David Gerard, for instance, was alienated by a race/IQ discussion. So good PR might consist of banning race/IQ.
Also, consider the possibility that people who know how to build AIs know more than you, so it’s less a question of their being enemies, and more one of their being people you can learn from.
I don’t know how public various details are, but my impression is that this was a decent description of the EY—Dario Amodei relationship (and presumably still is?), tho I think personality clashes are also a part of that.
Also, consider the possibility that people who know how to build AIs know more than you, so it’s less a question of their being enemies, and more one of their being people you can learn from.
I mean, obviously they know more about some things and less about others? Like, virologists doing gain of function research are also people who know more than me, and I could view them as people I could learn from. Would that advance or hinder my goals?
If you are under some kind of misapprehension about the nature of their work, it would help. And you don’t know that you are not under a misapprehension, because they are the experts, not you. So you need to talk to them anyway. You might believe that you understand the field flawlessly, but you don’t know until someone checks your work.
That said, I agree that this seems important to make progress on; this is one of the reasons I worked in communications roles, this is one of the reasons I try to be as polite as I am, this is why I’ve tried to make my presentation more adaptable instead of being more willing to write people off.
It is not enough to say nice things: other representatives must be prevented from saying nasty things.
For any statement one can make, there will be people “alienated” (=offended?) by it.
David Gerard was alienated by a race/IQ discussion and you think that should’ve been avoided.
But someone was surely equally alienated by discussions of religion, evolution, economics, education and our ability to usefully define words.
Do we value David Gerard so far above any given creationist, that we should hire a PR department to cater to him and people like him specifically?
There is an ongoing effort to avoid overtly political topics (Politics is the mind-killer!) - but this effort is doomed beyond a certain threshold, since everything is political to some extent. Or to some people.
To me, a concerted PR effort on the part of all prominent representatives to never say anything “nasty” would be alienating. I don’t think a community even somewhat dedicated to “radical” honesty could abide a PR department, or vice versa.
TL;DR—LessWrong has no PR department, LessWrong needs no PR department!
For any statement one can make, there will be people “alienated” (=offended?) by it
If you also assume that nothing short of perfection is available, that’s a fully general argument against PR, not just against the possibility of LW/MIRI having good PR.
If you don’t assume that, LW/MIRI can have good PR, by avoiding just the most significant bad PR. Disliking racism isn’t some weird idiosyncratic thing that only Gerard has.
It requires you to filter what you publicly and officially say. “You”, plural, the collective, can speak as freely as you like …in private. But if you, individually, want to be able to say anything you like to anyone, you had better accept the consequences.
“The mystery is why the community doesn’t implement obvious solutions. Hiring PR people is an obvious solution. There’s a posting somewhere in which Anna Salamon argues that there is some sort of moral hazard involved in professional PR, but never explains why, and everyone agrees with her anyway.”
“‘You’, plural, the collective, can speak as freely as you like …in private.”
Suppose a large part of the community wants to speak as freely as it likes in public, and the mystery is solved.
We even managed to touch upon the moral hazard involved in professional PR—insofar as it is a filter between what you believe and what you say publicly.
None of these seem to reflect on EY, unless you would expect him to be able to predict that a journalist would write an incoherent, almost maximally inaccurate description of an event where he criticized an idea for being implausible and then banned its discussion for being off-topic/pointlessly disruptive to something like two people, or that his clearly written rationale for not releasing the transcripts of the AI box experiments would be interpreted as a recruiting tool for the only cult that requires no contributions to be a part of, doesn’t promise its members salvation/supernatural powers, has no formal hierarchy, and is based on a central part of economics.
I would not expect EY to have predicted that himself, given his background. If, however, he either had studied PR deeply or had consulted with a domain expert before posting, then I would have totally expected that result to be predicted with some significant likelihood. Remember, optimally good rationalists should win, and be able to anticipate social dynamics. In this case EY fell into a social trap he didn’t even know existed, so again, I do not blame him personally, but that does not negate the fact that he’s historically not been very good at anticipating that sort of thing, due to lack of training/experience/intuition in that field.
I’m fairly confident that at least regarding the Roko’s Basilisk disaster, I would have been able to predict something close to what actually happened if I had seen his comment before he posted it. (This would have been primarily due to pattern matching between the post and known instances of the Streisand effect, as well as some amount of hard-to-formally-explain intuition that EY’s wording would invoke strong negative emotions in some groups, even if he hadn’t taken any action. Studying “ratio’d” tweets can help give you a sense for this, if you want to practice that admittedly very niche skill.) I’m not saying this to imply that I’m a better rationalist than EY (I’m not), merely to say that EY, and the rationalist movement generally, hasn’t focused on honing the skillset necessary to excel at PR, which has sometimes been to our collective detriment.
The question is whether people who prioritize social-position/status-based arguments over actual reality were going to contribute anything meaningful to begin with.
The rationalist community has been built on, among other things, the recognition that the human species is systematically broken when it comes to epistemic rationality. Why think that someone who fails this deeply wouldn’t continue failing at epistemic rationality at every step even once they’ve already joined?
Why think that someone who fails this deeply wouldn’t continue failing at epistemic rationality at every step even once they’ve already joined?
I think making the assumption that anyone who isn’t in our community is failing to think rationally is itself not great epistemics. It’s not irrational at all to refrain from engaging with the ideas of a community you believe to be vaguely insane. After all, I suspect you haven’t looked all that deeply into the accuracy of the views of the Church of Scientology, and that’s not a failure on your part, since there’s little chance you’ll gain much of value for your time if you did. There are many, many, many groups out there who sound intelligent at first glance, but when seriously engaged with fall apart. Likewise, there are those groups which sound insane at first, but actually have deep truths to teach (I’d place some forms of Zen Buddhism under this category). It makes a lot of sense to trust your intuition on this sort of thing, if you don’t want to get sucked into cults or time-sinks.
Eliezer is extremely skilled at capturing attention. One of the best I’ve seen, outside of presidents and some VCs. However, as far as I’ve seen, he’s terrible at getting people to do what he wants. Which means that he has a tendency to attract people to a topic he thinks is important, but they never do what he thinks should be done, which seems to lead to a feeling of despondence. This is where he really differs from those VCs and presidents: they’re usually far more balanced.
For an example of an absolute genius in getting people to do what he wants, see Sam Altman.
You make a strong point, and as such I’ll emend my statement a bit—Eliezer is great at PR aimed at a certain audience in a certain context, which is not universal. Outside of that audience, he is not great at Public Relations(™) in the sense of minimizing the risk of gaining a bad reputation. Historically, I am mostly referring to Eliezer’s tendency to react to what he’s believed to be infohazards in such a way that what he tried to suppress was spread vastly beyond the counterfactual world in which Eliezer hadn’t reacted at all. You only need to slip up once when it comes to risking all PR gains (just ask the countless politicians destroyed by a single video or picture), and Eliezer has slipped up multiple times in the past (not that I personally blame him; it’s a tremendously difficult skillset which I doubt he’s had the time to really work on). All of this is to say that yes, he’s great at making powerful, effective arguments, which convince many rationalist-leaning people. That is not, however, what it means to be a PR expert, and is only one small aspect of a much larger domain which rationalists have historically under-invested in.
ELK itself seems like a potentially important problem to solve; the part that didn’t make much sense to me was what they plan to do with the solution, their idea based on recursive delegation.
I will probably spend 4 days (from the 14th to the 17th, I’m somewhat busy until then) thinking about alignment to see whether there is any chance I might be able to make progress. I have read what is recommended as a starting point on the alignment forum, and can read the AGI Safety Fundamentals Course’s curriculum on my own. I will probably start by thinking about how to formalize (and compute) something similar to what we call human values, since that seems to be the core of the problem, and then turning that into something that can be evaluated over possible trajectories of the AI’s world model (or over something like reasoning chains or whatever, I don’t know). I hadn’t considered that as a career; I live in Europe and we don’t have that kind of organization here, so it will probably just be a hobby.
Sounds like a great plan! Even if you end up deciding that you can’t make research progress (not that you should give up after just 4 days!), I can suggest a bunch of other activities that might plausibly contribute towards this.
I hadn’t considered that as a career; I live in Europe and we don’t have that kind of organization here, so it will probably just be a hobby.
I expect that this will change within the next year or so (for example, there are plans for a Longtermist Hotel in Berlin and I think it’s very likely to happen).
• Applying to facilitate the next rounds of the AGI Safety Fundamentals course (apparently they compensated facilitators this time)
• Contributing to Stampy Wiki
• AI Safety Movement Building: this can be as simple as hosting dinners with two or three people who are also interested
• General EA/rationalist community building
• Trying to improve online outreach. Take for example the AI Safety Discussion (Open) fb group. They could probably be making better use of the sidebar. The moderator might be open to updating it if someone reached out to them and offered to put in the work. It might be worth seeing what other groups are out there too.
Let me know if none of these sound interesting and I could try to think up some more.
Same; this post is what made me decide I can’t leave it to the experts. It is just a matter of spending the required time to catch up on what we know and have tried. As Keltham said, diversity is in itself an asset. If we can get enough humans to think about this problem, we can get some breakthroughs from angles others have not thought of yet.
For me, it was not demotivating. He is not a god, and it ain’t over until the fat lady sings. Things are serious and it just means we should all try our best. In fact, I am kinda happy to imagine we might see a utopia happen in my lifetime. Most humans don’t get a chance to literally save the world. It would be really sad if I died a few years before some AGI turned into a superintelligence.
I primarily upvoted it because I like the push to ‘just candidly talk about your models of stuff’:
I think we die with slightly more dignity—come closer to surviving, as we die—if we are allowed to talk about these matters plainly. Even given that people may then do unhelpful things, after being driven mad by overhearing sane conversations. I think we die with more dignity that way, than if we go down silent and frozen and never talking about our impending death for fear of being overheard by people less sane than ourselves.
I think that in the last surviving possible worlds with any significant shred of subjective probability, people survived in part because they talked about it; even if that meant other people, the story’s antagonists, might possibly hypothetically panic.
Also because I think Eliezer’s framing will be helpful for a bunch of people working on x-risk. Possibly a minority of people, but not a tiny minority. Per my reply to AI_WAIFU, I think there are lots of people who make the two specific mistakes Eliezer is warning about in this post (‘making a habit of strategically saying falsehoods’ and/or ‘making a habit of adopting optimistic assumptions on the premise that the pessimistic view says we’re screwed anyway’).
The latter, especially, is something I’ve seen in EA a lot, and I think the arguments against it here are correct (and haven’t been talked about much).
Given how long it took me to conclude whether these were Eliezer’s true thoughts or a representation of his predicted thoughts in a somewhat probable future, I’m not sure whether I’d use the label “candid” to describe the post, at least without qualification.
While the post does contain a genuinely useful way of framing near-hopeless situations and a nuanced and relatively terse lesson in practical ethics, I would describe the post as an extremely next-level play in terms of its broader purpose (and leave it at that).
I actually think Yudkowsky’s biggest problem may be that he is not talking about his models. In his most prominent posts about AGI doom, such as this and the List of Lethalities, he needs to provide a complete model that clearly and convincingly leads to doom (hopefully without the extreme rhetoric) in order to justify the extreme rhetoric. Why does attempted, but imperfect, alignment lead universally to doom in all likely AGI designs*, when we lack familiarity with the relevant mind design space, or with how long it will take to escalate a given design from AGI to ASI?
* I know his claim isn’t quite this expansive, but his rhetorical style encourages an expansive interpretation.
I’m baffled he gives so little effort to explaining his model. In List of Lethalities he spends a few paragraphs of preamble to cover some essential elements of concern (−3, −2, −1), then offers a few potentially-reasonable-but-minimally-supported assertions, before spending much of the rest of the article rattling off the various ways AGI can kill everyone. Personally I felt like he just skipped over a lot of the important topics, and so didn’t bother to read it to the end.
I think there is probably some time after the first AGI or quasi-AGI arrives, but before the most genocide-prone AGI arrives, in which alignment work can still be done. Eliezer’s rhetorical approach confusingly chooses to burn bridges with this world, as he and MIRI (and probably, by association, rationalists) will be regarded as a laughing stock when that world arrives. Various techbros including AI researchers will be saying “well, AGI came and we’re all still alive, yet there’s EY still reciting his doomer nonsense”. EY will uselessly protest “I didn’t say AGI would necessarily kill everyone right away” while the techbros retweet old EY quotes that kinda sound like that’s what he’s saying.
Edit: for whoever disagreed & downvoted: what for? You know there are e/accs on Twitter telling everyone that the idea of x-risk is based on Yudkowsky being “king of his tribe”, and surely you know that this is not how LessWrong is supposed to work. The risk isn’t supposed to be based on EY’s say-so; a complete and convincing model is needed. If, on the other hand, you disagreed that his communication is incomplete and unconvincing, it should not offend you that not everyone agrees. Like, holy shit: you think humanity will cause apocalypse because it’s not listening to EY, but how dare somebody suggest that EY needs better communication. I wrote this comment because I think it’s very important; what are you here for?
I… upvoted it because it says true and useful things about how to make the world not end and proposes an actionable strategy for how to increase our odds of survival while relatively thoroughly addressing a good number of possible objections. The goal of LessWrong is not to make people happier, and the post outlines a pretty clear hypothesis about how it might help others (1. by making people stop working on plans that condition on lots of success in a way that gets ungrounded from reality, 2. by making people not do really dumb unethical things out of desperation).
Additionally, the OP seems to me good for communication: Eliezer had a lot of bottled up thoughts, and here put them out in the world, where his thoughts can bump into other people who can in turn bump back into him.
AFAICT, conversation (free, open, “non-consequentialist” conversation, following interests and what seems worth sharing rather than solely backchaining from goals) is one of the places where consciousness and sanity sometimes enter. It’s right there next to “free individual thought” in my list of beautiful things that are worth engaging in and safeguarding.
I upvoted it because I think it’s true and I think that this is a scenario where ‘epistemic rationality’ concerns trump ‘instrumental rationality’ concerns.
Agreed with regards to “epistemic rationality” being more important at times than “instrumental rationality.” That being said, I don’t think that concerns about the latter are unfounded.
I upvoted it because I wish I could give Eliezer a hug that actually helps make things better, and no such hug exists but the upvote button is right there.
I strong-upvoted this post because I read a private draft by Eliezer which is a list of arguments why we’re doomed. The private draft is so informative that, if people around me hadn’t also read and discussed it, I would have paid several months of my life to read it. It may or may not be published eventually. This post, being a rant, is less useful, but it’s what we have for now. It’s so opaque and confusing that I’m not even sure if it’s net good, but if it’s 5% as good as the private document it still far surpasses my threshold for a strong upvote.
Can someone send me a copy so I can perv out on how doomed we are? Who knows, my natural contrarian instincts might fire and I might start to think of nitpicks and counterarguments.
But at the very least, I will enjoy it loads, and that’s something?
Upvoted because it’s important to me to know what EY thinks the mainline-probability scenario looks like and what are the implications.
If that’s what he and MIRI think is the mainline scenario, then that’s what I think is the mainline scenario, because their quality of reasoning and depth of insight seems very high whenever I have an opportunity to examine it.
Personally, I am not here (or most other places) to “enjoy myself” or “be happier”. Behind the fool’s licence of April 1, the article seems to me to be saying true and important things. If I had any ideas about how to solve the AGI problem that would pass my shoulder Eliezer test, I would be doing them all the more. However, lacking such ideas, I only cultivate my garden.
No, not at all. I have no ideas in this field, and what’s more, I incline to Eliezer’s pessimism, as seen in the recently posted dialogues, about much of what is done.
I’d still encourage you to consider projects at a meta-level up such as movement-building or earn-to-give. But also totally understand if you consider the probabilities of success too low to really bother about.
I have a weird bias towards truth regardless of consequences, and upvoted out of emotional reflex. Also I love Eliezer’s writing and it is a great comfort to me to have something fun to read on the way to the abyss.
I disagree with Eliezer about half the time, including about very fundamental things, but I strongly upvoted the post, because that attitude gives both the best chance of success conditional on the correct evaluation of the problem, and it does not kill you if the evaluation is incorrect and the x-risk in question is an error in the model. It is basically a Max EV calculation for most reasonable probability distributions.
I upvoted the post despite disagreeing with it (I believe the success probability is ~30%), because it seems important for people to openly share their beliefs in order to maximize our collective ability to converge on the truth. And I do get some potentially valuable information from the fact that this is what Yudkowsky believes (even while disagreeing).
Hi, I’m always fascinated by people with success probabilities that aren’t either very low or ‘it’ll probably be fine’.
I have this collection of intuitions (no more than that):
(1) ‘Some fool is going to build a mind’,
(2) ‘That mind is either going to become a god or leave the fools in position to try again, repeat’,
(3) ‘That god will then do whatever it wants’.
It doesn’t seem terribly relevant these days, but there’s another strand that says:
(4) ‘we have no idea how to build minds that want specific things’ and
(5) ‘Even if we knew how to build a mind that wanted a specific thing, we have no idea what would be a good thing’.
These intuitions don’t leave me much room for optimism, except in the sense that I might be hopelessly wrong and, in that case, I know nothing and I’ll default back to ‘it’ll probably be fine’.
Presumably you’re disagreeing with one of (1), (2), or (3) and one of (4) or (5).
I believe that we might solve alignment in time and aligned AI will protect us from unaligned AI. I’m not sure how to translate it to your 1-3 (the “god” will do whatever it wants, but it will want what we want so there’s no problem). In terms of 4-5, I guess I disagree with both or rather disagree that this state of ignorance will necessarily persist.
Neat, so in my terms you think we can pull off 4 and 5 and get it all solid enough to set running before anyone else does 1-3?
4 and 5 have always looked like the really hard bits to me, and not the sort of thing that neural networks would necessarily be good at, so good luck!
But please be careful to avoid fates-worse-than-death by getting it almost right but not quite right. I’m reasonably well reconciled with death, but I would still like to avoid doing worse if possible.
I thought it was funny. And a bit motivational. We might be doomed, but one should still carry on. If your actions have at least a slight chance to improve matters, you should do it, even if the odds are overwhelmingly against you.
Not a part of my reasoning, but I’m thinking that we might become better at tackling the issue if we have a real sense of urgency, which this and A List of Lethalities provide.
My ability to take the alignment problem seriously was already hanging by a thread, and this post really sealed the deal for me. All we want is the equivalent of a BDSM submissive, a thing which actually exists and isn’t even particularly uncommon. An AI which can’t follow orders is not useful and would not be pursued seriously. (That’s why we got Instruct-GPT instead of GPT-4.) And even if one emerged, a rogue AI can’t do more damage than an intelligent Russian with an internet connection.
Apologies for the strong language, but this looks to me like a doomist cult that is going off the rails and I want no part in it.
Why would something need to be able to follow orders to be useful? Most things in the world do not follow my orders (my furniture, companies that make all my products, most people I know). Like, imagine an AI assistant that’s really good at outputting emails from your inbox that make your company more profitable. You don’t know why it says what it says, but you have learned the empirical fact that as it hires people, fires people, changes their workloads, gives them assignments, that your profits go up a lot. I can’t really tell it what to do, but it sure is useful.
I think nobody knows how to write the code of a fundamentally submissive agent and that other agents are way easier to make, ones that are just optimizing in a way that doesn’t think in terms of submission/dominance. I agree humans exist but nobody understands how they work or how to code one, and you don’t get to count on us learning that before we build super powerful AI systems.
I have no clue why you think that an intelligent Russian is the peak of optimization power. I think that’s a false and wildly anthropomorphic thing to think. Imagine getting 10 Von Neumanns in a locked room with only internet, already it’s more powerful than the Russian, and I bet could do some harm. Now imagine a million. Whatever gets you the assumption that an AI system can’t be more powerful than one human seems wild and I don’t know where you’re getting this idea from.
Btw, unusual ask, but do you want to hop on audio and hash out the debate more sometime? I can make a transcript and can link it here on LW, both posting our own one-paragraph takeaways. I think you’ve been engaging in a broadly good-faith way on the object level in this thread and others and I would be interested in returning the ball.
I think nobody knows how to write the code of a fundamentally submissive agent
Conventional non AI computers are already fundamentally passive. If you boot them up, they just sit there. What’s the problem. The word agent?
Why would something need to be able to follow orders to be useful? Most things in the world do not follow my orders (my furniture, companies that make all my products, most people I know). Like, imagine an AI assistant that’s really good at outputting emails from your inbox that make your company more profitable. You don’t know why it says what it says, but you have learned the empirical fact that as it hires people, fires people, changes their workloads, gives them assignments, that your profits go up a lot. I can’t really tell it what to do, but it sure is useful.
If an AI assistant is replacing a human assistant, it needs to be controllable to the same extent. You don’t expect or want to micromanage a human assistant, but you do expect to set broad parameters.
Sure, if it’s ‘replacing’, but my example isn’t one of replacement, it’s one where it’s useful in a different way to my other products, in a way that I personally suspect is easier to train/build than something that does ‘replacement’.
I at first also downvoted because your first argument looks incredibly weak (this post has little relation to arguing for/against the difficulty of the alignment problem, what update are you getting on that from here?), as did the followup ‘all we need is...’, which is a formulation that hides problems instead of solving them. Yet your last point does have import, and your stating it explicitly is useful in allowing everyone to address it, so I reverted to an upvote for honesty, though strong disagree.
To the point, I also want to avoid being in a doomist cult. I’m not a die hard long term “we’re doomed if we don’t align AI” guy, but from my readings throughout the last year am indeed getting convinced of the urgency of the problem. Am I getting hoodwinked by a doomist cult with very persuasive rhetoric? Am I myself hoodwinking others when I talk about these problems and they too start transitioning to do alignment work?
I answer these questions not by reasoning on ‘resemblance’ (ie. how much does it look like a doomist cult) but by going into finer detail. An implicit argument being made when you call [the people who endorse the top-level post] a doomist cult is that they share the properties of other doomist cults (being wrong, having bad epistemics/policy, preying on isolated/weird minds) and are thus bad. I understand having a low prior for doomist-cult look-alikes actually being right (since there is no known instance of a doomist cult of world end being right), but that’s not reason to turn into a rock (as in https://astralcodexten.substack.com/p/heuristics-that-almost-always-work?s=r), believing that “no doom prophecy is ever right”. You can’t prove that no doom prophecy is ever right, only that they’re rarely right (and probably only once).
I thus advise changing your question “do [the people who endorse the top-level post] look like a doomist cult?” into “What would be sufficient level of argument and evidence so I would take this doomist-cult-looking group seriously?”. It’s not a bad thing to call doom when doom is on the way. Engage with the object level argument and not with your precached pattern recognition “this looks like a doom cult so is bad/not serious”. Personally, I had similar qualms as you’re expressing, but having looked into the arguments, it feels very strong and much more real to believe in “Alignment is hard and by default AGI is an existential risk” rather than not. I hope your conversation with Ben will be productive and that I haven’t only expressed points you already considered (fyi they have already been discussed on LessWrong).
Seeing this post get so strongly upvoted makes me feel like I’m going crazy.
This is not the kind of content I want on LessWrong. I did not enjoy it, I do not think it will lead me to be happier or more productive toward reducing x-risk, I don’t see how it would help others, and it honestly doesn’t even seem like a particularly well done version of itself.
Can people help me understand why they upvoted?
For whatever it is worth, this post along with reading the unworkable alignment strategy on the ELK report has made me realize that we actually have no idea what to do and has finally convinced me to try to solve alignment, I encourage everyone else to do the same. For some people knowing that the world is doomed by default and that we can’t just expect the experts to save it is motivating. If that was his goal, he achieved it.
Certainly for some people (including you!), yes. For others, I expect this post to be strongly demotivating. That doesn’t mean it shouldn’t have been written (I value honestly conveying personal beliefs and expressing diversity of opinion enough to outweigh the downsides), but we should realistically expect this post to cause psychological harm for some people, and it could also potentially make interaction and PR with those who don’t share Yudkowsky’s views harder. Despite some claims to the contrary, I believe (through personal experience in PR) that expressing radical honesty is not strongly valued outside the rationalist community, and that interaction with non-rationalists can be extremely important, even to potentially world-saving levels. Yudkowsky, for all of his incredible talent, is frankly terrible at PR (at least historically), and may not be giving proper weight to its value as a world-saving tool. I’m still thinking through the details of Yudkowsky’s claims, but expect me to write a post here in the near future giving my perspective in more detail.
I don’t think “Eliezer is terrible at PR” is a very accurate representation of historical fact. It might be a good representation of something else. But it seems to me that deleting Eliezer from the timeline would probably result in a world where far far fewer people were convinced of the problem. Admittedly, such questions are difficult to judge.
I think “Eliezer is bad at PR” rings true in the sense that he belongs in the cluster of “bad at PR”; you’ll make more correct inferences about Eliezer if you cluster him that way. But on historical grounds, he seems good at PR.
Eliezer is “bad at PR” in the sense that there are lots of people who don’t like him. But that’s mostly irrelevant. The people who do like him like him enough to donate to his foundation and all of the foundations he inspired.
It’s the people who don’t like him (and are also intelligent and in positions of power), which I’m concerned with in this context. We’re dealing with problems where even a small adversarial group can do a potentially world-ending amount of harm, and that’s pretty important to be able to handle!
My personal experience is that the people who actively dislike Eliezer are specifically the people who were already set on their path; they dislike Eliezer mostly because he’s telling them to get off that path.
I could be wrong, however; my personal experience is undoubtedly very biased.
I’ll tell you that one of my brothers (who I greatly respect) has decided not to be concerned about AGI risks specifically because he views EY as being a very respected “alarmist” in the field (which is basically correct), and also views EY as giving off extremely “culty” and “obviously wrong” vibes (with Roko’s Basilisk and EY’s privacy around the AI boxing results being the main examples given), leading him to conclude that it’s simply not worth engaging with the community (and their arguments) in the first place. I wouldn’t personally engage with what I believe to be a doomsday cult (even if they claim that the risk of ignoring them is astronomically high), so I really can’t blame him.
I’m also aware of an individual who has enormous cultural influence, and was interested in rationalism, but heard from an unnamed researcher at Google that the rationalist movement is associated with the alt-right, so they didn’t bother looking further. (Yes, that’s an incorrect statement, but came from the widespread [possibly correct?] belief that Peter Thiel is both alt-right and has/had close ties with many prominent rationalists.) This indicates a general lack of control of the narrative surrounding the movement, and likely has directly led to needlessly antagonistic relationships.
That’s putting it mildly.
The problems are well known. The mystery is why the community doesn’t implement obvious solutions. Hiring PR people is an obvious solution. There’s a posting somewhere in which Anna Salamon argues that there is some sort of moral hazard involved in professional PR, but never explains why, and everyone agrees with her anyway.
If the community really and literally is about saving the world, then having a constant stream of people who are put off, or even becoming enemies is incrementally making the world more likely to be destroyed. So surely it’s an important problem to solve? Yet the community doesn’t even like discussing it. It’s as if maintaining some sort of purity, or some sort of impression that you don’t make mistakes is more important than saving the world.
Presumably you mean this post.
I think there are two issues.
First, some of the ‘necessary to save the world’ things might make enemies. If it’s the case that Bob really wants there to be a giant explosion, and you think giant explosions might kill everyone, you and Bob are going to disagree about what to do, and Bob existing in the same information environment as you will constrain your ability to share your preferences and collect allies without making Bob an enemy.
Second, this isn’t an issue where we can stop thinking, and thus we need to continue doing things that help us think, even if those things have costs. In contrast, in a situation where you know what plan you need to implement, you can now drop lots of your ability to think in order to coordinate on implementing that plan. [Like, a lot of the “there are too much PR in EA” complaints were specifically about situations where people were overstating the effectiveness of particular interventions, which seemed pretty poisonous to the project of comparing interventions, which was one of the core goals of EA, rather than just ‘money moved’ or ‘number of people pledging’ or so on.]
That said, I agree that this seems important to make progress on; this is one of the reasons I worked in communications roles, this is one of the reasons I try to be as polite as I am, this is why I’ve tried to make my presentation more adaptable instead of being more willing to write people off.
So... that’s a metaphor for “telling people who like building AIs to stop building AIs pisses them off and turns them into enemies”. Which it might, but how often does that happen? Your prominent enemies aren’t in that category, as far as I can see. David Gerard, for instance, was alienated by a race/IQ discussion. So good PR might consist of banning race/IQ.
Also, consider the possibility that people who know how to build AIs know more than you, so it’s less a question of their being enemies, and more one of their being people you can learn from.
I don’t know how public various details are, but my impression is that this was a decent description of the EY—Dario Amodei relationship (and presumably still is?), tho I think personality clashes are also a part of that.
I mean, obviously they know more about some things and less about others? Like, virologists doing gain of function research are also people who know more than me, and I could view them as people I could learn from. Would that advance or hinder my goals?
If you are under some kind of misapprehension about the nature of their work, it would help. And you don’t know that you are not under a misapprehension, because they are the experts, not you. So you need to talk to them anyway. You might believe that you understand the field flawlessly, but you don’t know until someone checks your work.
It is not enough to say nice things: other representatives must be prevented from saying nasty things.
For any statement one can make, there will be people “alienated” (=offended?) by it.
David Gerard was alienated by a race/IQ discussion and you think that should’ve been avoided.
But someone was surely equally alienated by discussions of religion, evolution, economics, education and our ability to usefully define words.
Do we value David Gerard so far above any given creationist, that we should hire a PR department to cater to him and people like him specifically?
There is an ongoing effort to avoid overtly political topics (Politics is the mind-killer!) - but this effort is doomed beyond a certain threshold, since everything is political to some extent. Or to some people.
To me, a concerted PR effort on part of all prominent representatives to never say anything “nasty” would be alienating. I don’t think a community even somewhat dedicated to “radical” honesty could abide a PR department—or vice versa.
TL;DR—LessWrong has no PR department, LessWrong needs no PR department!
If you also assume that nothing is available except perfection, that’s a fully general argument against PR, not just against the possibility of LW/MIRI having good PR.
If you don’t assume that, LW/MIRI can have good PR, by avoiding just the most significant bad PR. Disliking racism isn’t some weird idiosyncratic thing that only Gerard has.
The level of PR you aim for puts an upper limit to how much “radical” honesty you can have.
If you aim for perfect PR, you can have 0 honesty.
If you aim for perfect honesty, you can have no PR. LessWrong doesn’t go that far, by a long shot, even without a PR team present.
Most organizations do not aim for honesty at all.
The question is where do we draw the line.
Which brings us to “Disliking racism isn’t some weird idiosyncratic thing that only Gerard has.”
From what I understand, Gerard left because he doesn’t like discussions about race/IQ.
Which is not the same thing as racism.
I, personally, don’t want LessWrong to cater to people who cannot tolerate a discussion.
honesty=/=frankness. Good PR does not require you to lie.
Semantics.
Good PR requires you to put a filter between what you think is true and what you say.
It requires you to filter what you publicly and officially say. “You”, plural, the collective, can speak as freely as you like …in private. But if you, individually, want to be able to say anything you like to anyone, you had better accept the consequences.
“The mystery is why the community doesn’t implement obvious solutions. Hiring PR people is an obvious solution. There’s a posting somewhere in which Anna Salamon argues that there is some sort of moral hazard involved in professional PR, but never explains why, and everyone agrees with her anyway.”
“”You”, plural, the collective, can speak as freely as you like …in private.”
Suppose a large part of the community wants to speak as freely as it likes in public, and the mystery is solved.
We even managed to touch upon the moral hazard involved in professional PR—insofar as it is a filter between what you believe and what you say publicly.
There’s a hazard in having no filters, as well. One thing being bad doesn’t make another good.
None of these seem to reflect on EY unless you would expect him to be able to predict that a journalist would write an incoherent almost maximally inaccurate description of an event where he criticized an idea for being implausible then banned its discussion for being off-topic/pointlessly disruptive to something like two people or that his clearly written rationale for not releasing the transcripts for the ai box experiments would be interpreted as a recruiting tool for the only cult that requires no contributions to be a part of, doesn’t promise its members salvation/supernatural powers, has no formal hierarchy and is based on a central part of economics.
I would not expect EY to have predicted that himself, given his background. If, however, he either had studied PR deeply or had consulted with a domain expert before posting, then I would have totally expected that result to be predicted with some significant likelihood. Remember, optimally good rationalists should win, and be able to anticipate social dynamics. In this case EY fell into a social trap he didn’t even know existed, so again, I do not blame him personally, but that does not negate the fact that he’s historically not been very good at anticipating that sort of thing, due to lack of training/experience/intuition in that field. I’m fairly confident that at least regarding the Roko’s Basilisk disaster, I would have been able to predict something close to what actually happened if I had seen his comment before he posted it. (This would have been primarily due to pattern matching between the post and known instances of the Streisand effect, as well as some amount of hard-to-formally-explain intuition that EY’s wording would invoke strong negative emotions in some groups, even if he hadn’t taken any action. Studying “ratio’d” tweets can help give you a sense for this, if you want to practice that admittedly very niche skill). I’m not saying this to imply that I’m a better rationalist than EY (I’m not), merely to say that EY, and the rationalist movement generally, hasn’t focused on honing the skillset necessary to excel at PR, which has sometimes been to our collective detriment.
The question is whether people who prioritize social-position/status-based arguments over actual reality were going to contribute anything meaningful to begin with.
The rationalist community has been built on, among other things, the recognition that human species is systematically broken when it comes to epistemic rationality. Why think that someone who fails this deeply wouldn’t continue failing at epistemic rationality at every step even once they’ve already joined?
I think making the assumption that anyone who isn’t in our community is failing to think rationally is itself not great epistemics. It’s not irrational at all to refrain from engaging with the ideas of a community you believe to be vaguely insane. After all, I suspect you haven’t looked all that deeply into the accuracy of the views of the Church of Scientology, and that’s not a failure on your part, since there’s little chance you’ll gain much of value for your time if you did. There are many, many, many groups out there who sound intelligent at first glance, but when seriously engaged with fall apart. Likewise, there are those groups which sound insane at first, but actually have deep truths to teach (I’d place some forms of Zen Buddhism under this category). It makes a lot of sense to trust your intuition on this sort of thing, if you don’t want to get sucked into cults or time-sinks.
I didn’t talk about “anyone who isn’t in our community,” but about
It’s epistemically irrational if I’m implying the ideas are false and if this judgment isn’t born from interacting with the ideas themselves but with
Eliezer is extremely skilled at capturing attention. One of the best I’ve seen, outside of presidents and some VCs.
However, as far as I’ve seen, he’s terrible at getting people to do what he wants.
Which means that he has a tendency to attract people to a topic he thinks is important but they never do what he thinks should be done- which seems to lead to a feeling of despondence.
This is where he really differs from those VCs and presidents- they’re usually far more balanced.
For an example of an absolute genius in getting people to do what he wants, see Sam Altman.
You make a strong point, and as such I’ll emend my statement a bit—Eliezer is great at PR aimed at a certain audience in a certain context, which is not universal. Outside of that audience, he is not great at Public Relations(™) in the sense of minimizing the risk of gaining a bad reputation. Historically, I am mostly referring to Eliezer’s tendency to react to what he’s believed to be infohazards in such a way that what he tried to suppress was spread vastly beyond the counterfactual world in which Eliezer hadn’t reacted at all. You only need to slip up once when it comes to risking all PR gains (just ask the countless politicians destroyed by a single video or picture), and Eliezer has slipped up multiple times in the past (not that I personally blame him; it’s a tremendously difficult skillset which I doubt he’s had the time to really work on). All of this is to say that yes, he’s great at making powerful, effective arguments, which convince many rationalist-leaning people. That is not, however, what it means to be a PR expert, and is only one small aspect of a much larger domain which rationalists have historically under-invested in.
Sounds about right!
I very much had the same experience, making me decide to somewhat radically re-orient my life.
What part of the ELK report are you saying felt unworkable?
ELK itself seems like a potentially important problem to solve, the part that didn’t make much sense to me was what they plan to do with the solution, their idea based on recursive delegation.
Ok, that’s a very reasonable answer.
Awesome. What are your plans?
Have you considered booking a call with AI Safety Support, registering your interest for the next AGI Safety Fundamentals Course or applying to talk to 80,000 hours?
I will probably spend 4 days (from the 14th to the 17th, I’m somewhat busy until then) thinking about alignment to see whether there is any chance I might be able to make progress. I have read what is recommended as a starting point on the alignment forum, and can read the AGI Safety Fundamentals Course’s curriculum on my own. I will probably start by thinking about how to formalize (and compute) something similar to what we call human values, since that seems to be the core of the problem, and then turning that into something that can be evaluated over possible trajectories of the AI’s world model (or over something like reasoning chains or whatever, I don’t know). I hadn’t considered that as a career; I live in Europe and we don’t have that kind of organization here, so it will probably just be a hobby.
Sounds like a great plan! Even if you end up deciding that you can’t make research progress (not that you should give up after just 4 days!), I can suggest a bunch of other activities that might plausibly contribute towards this.
I expect that this will change within the next year or so (for example, there are plans for a Longtermist Hotel in Berlin and I think it’s very likely to happen).
What other activities?
Here’s a few off the top of my head:
• Applying to facilitate the next rounds of the AGI Safety Fundamentals course (apparently they compensated facilitators this time)
• Contributing to Stampy Wiki
• AI Safety Movement Building—this can be as simple as hosting dinners with two or three people who are also interested
• General EA/rationalist community building
• Trying to improve online outreach. Take for example the AI Safety Discussion (Open) fb group. They could probably be making better use of the sidebar. The moderator might be open to updating it if someone reached out to them and offered to put in the work. It might be worth seeing what other groups are out there too.
Let me know if none of these sound interesting and I could try to think up some more.
Same, this post is what made me decide I can’t leave it to the experts. It is just a matter of spending the required time to catch up on what we know and tried. As Keltham said: diversity is in itself an asset. If we can get enough humans to think about this problem, we can get some breakthroughs from angles others have not thought of yet.
For me, it was not demotivating. He is not a god, and it ain’t over until the fat lady sings. Things are serious and it just means we should all try our best. In fact, I am kinda happy to imagine we might see a utopia happen in my lifetime. Most humans don’t get a chance to literally save the world. It would be really sad if I died a few years before some AGI turned into a superintelligence.
I primarily upvoted it because I like the push to ‘just candidly talk about your models of stuff’:
Also because I think Eliezer’s framing will be helpful for a bunch of people working on x-risk. Possibly a minority of people, but not a tiny minority. Per my reply to AI_WAIFU, I think there are lots of people who make the two specific mistakes Eliezer is warning about in this post (‘making a habit of strategically saying falsehoods’ and/or ‘making a habit of adopting optimistic assumptions on the premise that the pessimistic view says we’re screwed anyway’).
The latter, especially, is something I’ve seen in EA a lot, and I think the arguments against it here are correct (and haven’t been talked about much).
Given how long it took me to conclude whether these were Eliezer’s true thoughts or a representation of his predicted thoughts in a somewhat probable future, I’m not sure whether I’d use the label “candid” to describe the post, at least without qualification.
While the post does contain a genuinely useful way of framing near-hopeless situations and a nuanced and relatively terse lesson in practical ethics, I would describe the post as an extremely next-level play in terms of its broader purpose (and leave it at that).
I actually think Yudkowsky’s biggest problem may be that he is not talking about his models. In his most prominent posts about AGI doom, such as this and the List of Lethalities, he needs to provide a complete model that clearly and convincingly leads to doom (hopefully without the extreme rhetoric) in order to justify the extreme rhetoric. Why does attempted, but imperfect, alignment lead universally to doom in all likely AGI designs*, when we lack familiarity with the relevant mind design space, or with how long it will take to escalate a given design from AGI to ASI?
* I know his claim isn’t quite this expansive, but his rhetorical style encourages an expansive interpretation.
I’m baffled he gives so little effort to explaining his model. In List of Lethalities he spends a few paragraphs of preamble to cover some essential elements of concern (-3, -2, -1), then offers a few potentially-reasonable-but-minimally-supported assertions, before spending much of the rest of the article prattling off the various ways AGI can kill everyone. Personally I felt like he just skipped over a lot of the important topics, and so didn’t bother to read it to the end.
I think there is probably some time after the first AGI or quasi-AGI arrives, but before the most genocide-prone AGI arrives, in which alignment work can still be done. Eliezer’s rhetorical approach confusingly chooses to burn bridges with this world, as he and MIRI (and probably, by association, rationalists) will be regarded as a laughing stock when that world arrives. Various techbros including AI researchers will be saying “well, AGI came and we’re all still alive, yet there’s EY still reciting his doomer nonsense”. EY will uselessly protest “I didn’t say AGI would necessarily kill everyone right away” while the techbros retweet old EY quotes that kinda sound like that’s what he’s saying.
Edit: for whoever disagreed & downvoted: what for? You know there are e/accs on Twitter telling everyone that the idea of x-risk is based on Yudkowsky being “king of his tribe”, and surely you know that this is not how LessWrong is supposed to work. The risk isn’t supposed to be based on EY’s say-so; a complete and convincing model is needed. If, on the other hand, you disagreed that his communication is incomplete and unconvincing, it should not offend you that not everyone agrees. Like, holy shit: you think humanity will cause apocalypse because it’s not listening to EY, but how dare somebody suggest that EY needs better communication. I wrote this comment because I think it’s very important; what are you here for?
I… upvoted it because it says true and useful things about how to make the world not end and proposes an actionable strategy for how to increase our odds of survival while relatively thoroughly addressing a good number of possible objections. The goal of LessWrong is not to make people happier, and the post outlines a pretty clear hypothesis about how it might help others (1. by making people stop working on plans that condition on lots of success in a way that gets ungrounded from reality, 2. by making people not do really dumb unethical things out of desperation).
Ditto.
Additionally, the OP seems to me good for communication: Eliezer had a lot of bottled up thoughts, and here put them out in the world, where his thoughts can bump into other people who can in turn bump back into him.
AFAICT, conversation (free, open, “non-consequentialist” conversation, following interests and what seems worth sharing rather than solely backchaining from goals) is one of the places where consciousness and sanity sometimes enter. It’s right there next to “free individual thought” in my list of beautiful things that are worth engaging in and safeguarding.
I upvoted it because I think it’s true and I think that this is a scenario where ‘epistemic rationality’ concerns trump ‘instrumental rationality’ concerns.
Agreed with regards to “epistemic rationality” being more important at times than “instrumental rationality.” That being said, I don’t think that concerns about the latter are unfounded.
I upvoted it because I wish I could give Eliezer a hug that actually helps make things better, and no such hug exists but the upvote button is right there.
I strong-upvoted this post because I read a private draft by Eliezer which is a list of arguments why we’re doomed. The private draft is so informative that, if people around me hadn’t also read and discussed it, I would have paid several months of my life to read it. It may or may not be published eventually. This post, being a rant, is less useful, but it’s what we have for now. It’s so opaque and confusing that I’m not even sure if it’s net good, but if it’s 5% as good as the private document it still far surpasses my threshold for a strong upvote.
EDIT: it may or may not be published eventually
I assume that this post (List of Lethalities) is the public version of what became of that doc
Indeed
Oooh, that sounds great!
Can someone send me a copy so I can perv out on how doomed we are? Who knows, my natural contrarian instincts might fire and I might start to think of nitpicks and counterarguments.
But at the very least, I will enjoy it loads, and that’s something?
Yeah send me a copy too simfish@gmail.com
If you’re still offering to share, I would like to read it faunam@gmail.com
Upvoted because it’s important to me to know what EY thinks the mainline-probability scenario looks like and what are the implications.
If that’s what he and MIRI think is the mainline scenario, then that’s what I think is the mainline scenario, because their quality of reasoning and depth of insight seems very high whenever I have an opportunity to examine it.
Personally, I am not here (or most other places) to “enjoy myself” or “be happier”. Behind the fool’s licence of April 1, the article seems to me to be saying true and important things. If I had any ideas about how to solve the AGI problem that would pass my shoulder Eliezer test, I would be doing them all the more. However, lacking such ideas, I only cultivate my garden.
Have you considered registering for the next round of the AGI Safety Fundamentals course, booking a call with AI Safety Support or talking to 80,000 Hours?
No, not at all. I have no ideas in this field, and what’s more, I incline to Eliezer’s pessimism, as seen in the recently posted dialogues, about much of what is done.
I’d still encourage you to consider projects at a meta-level up such as movement-building or earn-to-give. But also totally understand if you consider the probabilities of success too low to really bother about.
I have a weird bias towards truth regardless of consequences, and upvoted out of emotional reflex. Also I love Eliezer’s writing and it is a great comfort to me to have something fun to read on the way to the abyss.
I disagree with Eliezer about half the time, including about very fundamental things, but I strongly upvoted the post, because that attitude gives both the best chance of success conditional on the correct evaluation of the problem, and it does not kill you if the evaluation is incorrect and the x-risk in question is an error in the model. It is basically a Max EV calculation for most reasonable probability distributions.
I upvoted the post despite disagreeing with it (I believe the success probability is ~ 30%), because it seems important for people to openly share their beliefs in order to maximize our collective ability to converge on the truth. And I do get some potentially valuable information from the fact that this is what Yudkowsky believes (even while disagreeing).
Hi, I’m always fascinated by people with success probabilities that aren’t either very low or ‘it’ll probably be fine’.
I have this collection of intuitions (no more than that):
(1) ‘Some fool is going to build a mind’,
(2) ‘That mind is either going to become a god or leave the fools in position to try again, repeat’,
(3) ‘That god will then do whatever it wants’.
It doesn’t seem terribly relevant these days, but there’s another strand that says:
(4) ‘we have no idea how to build minds that want specific things’ and
(5) ‘Even if we knew how to build a mind that wanted a specific thing, we have no idea what would be a good thing’.
These intuitions don’t leave me much room for optimism, except in the sense that I might be hopelessly wrong and, in that case, I know nothing and I’ll default back to ‘it’ll probably be fine’.
Presumably you’re disagreeing with one of (1), (2), or (3) and one of (4) or (5).
Which ones, and where does the 30% come from?
I believe that we might solve alignment in time and aligned AI will protect us from unaligned AI. I’m not sure how to translate it to your 1-3 (the “god” will do whatever it wants, but it will want what we want so there’s no problem). In terms of 4-5, I guess I disagree with both or rather disagree that this state of ignorance will necessarily persist.
Neat, so in my terms you think we can pull off 4 and 5 and get it all solid enough to set running before anyone else does 1-3?
4 and 5 have always looked like the really hard bits to me, and not the sort of thing that neural networks would necessarily be good at, so good luck!
But please be careful to avoid fates-worse-than-death by getting it almost right but not quite right. I’m reasonably well reconciled with death, but I would still like to avoid doing worse if possible.
My initial reaction to the post was almost as negative as yours.
I’ve partly changed my mind, due to this steelman of Eliezer’s key point by Connor Leahy.
I thought it was funny. And a bit motivational. We might be doomed, but one should still carry on. If your actions have at least a slight chance to improve matters, you should do it, even if the odds are overwhelmingly against you.
Not a part of my reasoning, but I’m thinking that we might become better at tackling the issue if we have a real sense of urgency—which this and A list of lethalities provide.
My ability to take the alignment problem seriously was already hanging by a thread, and this post really sealed the deal for me. All we want is the equivalent of a BDSM submissive, a thing which actually exists and isn’t even particularly uncommon. An AI which can’t follow orders is not useful and would not be pursued seriously. (That’s why we got Instruct-GPT instead of GPT-4.) And even if one emerged, a rogue AI can’t do more damage than an intelligent Russian with an internet connection.
Apologies for the strong language, but this looks to me like a doomist cult that is going off the rails and I want no part in it.
I disagree with each of your statements.
Why would something need to be able to follow orders to be useful? Most things in the world do not follow my orders (my furniture, companies that make all my products, most people I know). Like, imagine an AI assistant that’s really good at outputting emails from your inbox that make your company more profitable. You don’t know why it says what it says, but you have learned the empirical fact that as it hires people, fires people, changes their workloads, gives them assignments, that your profits go up a lot. I can’t really tell it what to do, but it sure is useful.
I think nobody knows how to write the code of a fundamentally submissive agent and that other agents are way easier to make, ones that are just optimizing in a way that doesn’t think in terms of submission/dominance. I agree humans exist but nobody understands how they work or how to code one, and you don’t get to count on us learning that before we build super powerful AI systems.
I have no clue why you think that an intelligent Russian is the peak of optimization power. I think that’s a false and wildly anthropomorphic thing to think. Imagine getting 10 Von Neumanns in a locked room with only internet, already it’s more powerful than the Russian, and I bet could do some harm. Now imagine a million. Whatever gets you the assumption that an AI system can’t be more powerful than one human seems wild and I don’t know where you’re getting this idea from.
Btw, unusual ask, but do you want to hop on audio and hash out the debate more sometime? I can make a transcript and can link it here on LW, both posting our own one-paragraph takeaways. I think you’ve been engaging in a broadly good-faith way on the object level in this thread and others and I would be interested in returning the ball.
Sure. The best way for me to do that would be through Discord. My id is lone-pine#4172
Would you mind linking the transcript here if you decide to release it publicly? I’d love to hear both of your thoughts expressed in greater detail!
That’d be the plan.
(Ping to reply on Discord.)
Sent you a friend request.
Ooh, I like Ben’s response and am excited about the audio thing happening.
Conventional non-AI computers are already fundamentally passive. If you boot them up, they just sit there. What’s the problem? The word agent?
If an AI assistant is replacing a human assistant, it needs to be controllable to the same extent. You don’t expect or want to micromanage a human assistant, but you do expect to set broad parameters.
Yes, the word agent.
Sure, if it’s ‘replacing’, but my example isn’t one of replacement, it’s one where it’s useful in a different way to my other products, in a way that I personally suspect is easier to train/build than something that does ‘replacement’.
I at first also downvoted because your first argument looks incredibly weak (this post has little relation to arguing for/against the difficulty of the alignment problem, what update are you getting on that from here?), as did the followup ‘all we need is...’, which is a formulation that hides problems instead of solving them.
Yet your last point does have import, and the fact that you stated it explicitly is useful in allowing everyone to address it, so I reverted to an upvote for honesty, though I strongly disagree.
To the point, I also want to avoid being in a doomist cult. I’m not a die-hard, long-term “we’re doomed if we don’t align AI” guy, but from my readings throughout the last year I am indeed getting convinced of the urgency of the problem. Am I getting hoodwinked by a doomist cult with very persuasive rhetoric? Am I myself hoodwinking others when I talk about these problems and they too start transitioning to do alignment work?
I answer these questions not by reasoning on ‘resemblance’ (i.e. how much does it look like a doomist cult) but by going into finer detail. An implicit argument being made when you call [the people who endorse the top-level post] a doomist cult is that they share the properties of other doomist cults (being wrong, having bad epistemics/policy, preying on isolated/weird minds) and are thus bad. I understand having a low prior for doomist-cult look-alikes actually being right (since there is no known instance of a doomist cult of world end being right), but that’s no reason to turn into a rock (as in https://astralcodexten.substack.com/p/heuristics-that-almost-always-work?s=r), believing that “no doom prophecy is ever right”. You can’t prove that no doom prophecy is ever right, only that they’re rarely right (and probably only once).
I thus advise changing your question “do [the people who endorse the top-level post] look like a doomist cult?” into “What would be a sufficient level of argument and evidence so I would take this doomist-cult-looking group seriously?”. It’s not a bad thing to call doom when doom is on the way. Engage with the object-level argument and not with your precached pattern recognition “this looks like a doom cult so is bad/not serious”. Personally, I had similar qualms to the ones you’re expressing, but having looked into the arguments, it feels very strong and much more real to believe in “Alignment is hard and by default AGI is an existential risk” rather than not. I hope your conversation with Ben will be productive and that I haven’t only expressed points you already considered (fyi they have already been discussed on LessWrong).