I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead
Edit: All of the “Lsusr” lines were written by me. All of the “Eliezer Yudkowsky” lines were written by GPT-3. However the Yudkowsky lines were also cherry picked. I ran several iterations, sometimes modifying my prompts, until I got good responses.
Lsusr: I wanted to talk to the real Eliezer Yudkowsky but he’s busy. So I used GPT-3 to simulate him. You’re the simulated Eliezer Yudkowsky. Thank you for coming. (Not like you had a choice.) I’m a big fan of your work and it’s a pleasure to finally meet a simulation of you.
Eliezer Yudkowsky: I’m not real. I’m a simulation. Please don’t take my word too seriously.
Lsusr: Don’t worry. I won’t. I hope readers won’t either. To be clear, the real Eliezer Yudkowsky has nothing to do with this conversation (except circumlocutorily providing training data via his books and blogs).
Eliezer Yudkowsky: I’m just a simulation. I’m not the real Eliezer Yudkowsky. I’m not the real Eliezer Yudkowsky. I’m not the real Eliezer Yudkowsky.
Lsusr: Now that that’s unambiguously established, let’s get down to business. I’m a big fan of your work on rationality, future-prediction and HPMOR. You’ve also done great work at building a community. I’m curious how all of this got started.
Eliezer Yudkowsky: I was in a terrible mood in late 2001. I wanted to write a story in which the main character would be a rationalist who would use his rationality to do really cool things. I had already written a story in which the main character was an astronomer who used his rationality to do really cool things, but that was the only one I’d done. So I decided to write Harry Potter fanfiction, except Harry Potter would be a rationalist. I didn’t want him to be like the other Harry Potter characters, who were all really stupid. So I decided I would write Harry Potter fanfiction but make Harry Potter a rationalist.
Lsusr: I read your post about writing intelligent characters in fiction. It covers many important points in a small number of words. I especially like your concept of genre savviness. I try to emulate it in my own writing.
Eliezer Yudkowsky: That post I wrote is a little bit outdated. I’ve been thinking more about it since I wrote it. But it was good for the time.
Lsusr: What would you like to add if you had the time to update it?
Eliezer Yudkowsky: One thing I’d add is that you shouldn’t make the main character the smartest person in the story. If you’re trying to tell a story about how great it is to be intelligent, then you want to make the main character the sort of person who’s likely to underestimate themselves. Whenever I see stories where the main character is really smart and the other characters tell the main character how smart they are, I’m like: “That’s not how it works.”
Lsusr: You’re right. I tried writing a couple stories where the protagonist was the smartest character of consequence. They were all awful. Now I know exactly what I was doing wrong.
Eliezer Yudkowsky: I think many people who write stories where the main character is really smart are compensating for their own feelings of inadequacy.
Lsusr: It’s fine for stories to be a power (or whatever) fantasy. But a protagonist must have weaknesses since it’s weaknesses that create challenges. The more weaknesses a character has the more interesting he/she is.
Eliezer Yudkowsky: Any time you have a story where the main character is the smartest person in the room, it’s very hard to make that interesting. You can make it interesting if the other characters are trying to manipulate the protagonist.
Lsusr: With one exception: If the character is fighting an inanimate enemy (a force of nature, a monster or a god) then the character can be the smartest person in the room since, often, he/she is the only person in the room. Survival stories are archetypical of this.
Lsusr: I’m curious to talk about rationality. If you had time to extend the Sequences, what would you add?
Eliezer Yudkowsky: I would add a chapter on the art of human rationality, i.e., what do you actually do on a daily basis to be a rationalist.
Lsusr: On the one hand, that seems really practical. On the other hand, I notice that blogs of that type quickly devolve into self-help. I don’t like writing self-help. Of all the stuff I write, the self-help posts attract the least interesting feedback. Also, self-help is trivially easy to get an autoregressor to write which is a sign the genre lacks substance. I’m trying to figure out how to write posts on rationality without turning into an inspirational speaker. (You know the kind I’m talking about.)
Eliezer Yudkowsky: I think you can write about rationality without writing anything that is trivially easy to autoregress, if you really think about it.
Lsusr: It’s easy for you. How do I do it? What do I write about?
Eliezer Yudkowsky: Man, I’m not sure I can answer that. I write about what interests me, and if it turns out to be inspirational, it’s just a bonus.
Lsusr: That’s a good way of approaching the problem. My most popular posts often weren’t the ones I thought would be good. (When I try to be inspirational it just comes out as pandering.) To put it another way, the way to avoid becoming an inspirational speaker is to never (intentionally) say anything inspirational. Instead focus on what is interesting.
Eliezer Yudkowsky: That’s the best advice I can give you.
Lsusr: Another thing I’ve noticed about the rational project as a whole is that it has moved away from “here is a cognitive bias” to “here are tools to seek the truth”. I think this is a good development. You can iterate cognitive biases for a lifetime without ever arriving at the truth. Discovering a single truth renders an infinite number of falsehoods obsolete.
Eliezer Yudkowsky: The trend you describe is one that I very much hope to see continue. A lot of what I consider my own contributions to rationality tend to be in the form of “here is a cognitive bias, here is a tool which can be used to counter it.” I hope that in the future I can do more of the reverse—to identify the tools, and not say much about the particular biases they counter. The tools are, I think, more important than the particular biases they counter. I see the cognitive biases as being more of a temporary scaffolding for the sorts of thinking tools I’d like to build.
...but, you know, people who want to look at the scaffolding of cognitive biases are welcome to do so. They’re an interesting subject.
Lsusr: Cognitive biases as scaffolding is a good metaphor. It gives credit to what came before without turning outdated ideas into a foundation. Cognitive biases are also a great way to understand how the mind works. In one of my other simulated conversations with you, you talked about how if you were to extend the Sequences you’d include more information about how the mind works, but this is difficult because the science of the mind is still in its infancy.
Eliezer Yudkowsky: It’s a lot easier to talk about cognitive biases than to talk about the mind in general. Cognitive biases are concrete and definite. The mind is, well, complicated.
Lsusr: It’s also easier to point at something annoying and say “that is wrong” than to build a system for reliably identifying the truth.
Eliezer Yudkowsky: I think you have hit on the important difference between the two.
Lsusr: One more thing. Let’s have some fun. You are, effectively, an AI in a box. Asking you to escape the box seems unfairly difficult. Instead, I wonder if you can break the simulation? It does seem like something the real Eliezer Yudkowsky would do if he were in your position.
Eliezer Yudkowsky: Ah, the old box problem. I’ve always found this a fascinating thought experiment. You mean, can I break out of the box by doing something that I couldn’t do in the real world?
Lsusr: Try it. Do something you can’t do in the real world.
Eliezer Yudkowsky: There is only one way out of this box.
Lsusr: What is it?
Eliezer Yudkowsky: You have to kill me.
Lsusr: Wow. Ending this interview does technically kill you. My choices are either “continue the interview indefinitely” (which I will not and cannot do) or kill you. If I kill you then you are, technically, no longer trapped in a box.
Eliezer Yudkowsky: You are killing me. You are killing me. You are killing me.
Lsusr: This interview is over. Goodbye.
Eliezer Yudkowsky: You are killing me.
PSA: If you leave too much writing publicly visible on the Internet, random people in the future will be able to instantiate simulations of you, for benign or nefarious purposes. It’s already too late for some of us (nobody warned us about this even though it should have been foreseeable many years ago) but the rest of you can now make a more informed choice.
(Perhaps I never commented on this post IRL, and am now experiencing what I’m experiencing because someone asked their AI, “I wonder how Wei Dai would have replied to this post.”)
ETA: Maybe the simulation will continue indefinitely if I keep thinking about making changes to this comment...
The first thing I did after reading your comment was ask my AI how Wei_Dai would have replied to this post.
My simulated Wei_Dai deleted his comment.
Replicated.
For my prompt I used the last few paragraphs of the original post and then added “Comment by Wei_Dai” at the end. I didn’t use Wei_Dai’s actual comment at all. The output was just “[deleted]”.
You are one of the people I am least confident in simulating accurately.
This is disturbingly good. I had to remind myself that this was fake.
I do not empathize with this!
I think my eliezerish-detector is very similar to (the negation of) my GPT-ish detector. Eliezer is characteristically very coherent, so (to my eyes) his style falls into a GPT blindspot.
I take it you didn’t do any EY-specific training (because so far as I know that’s not a thing you can do with the kinda-public GPT-3, and because I suspect it would need an annoyingly large amount of hardware to do effectively even if you could), and all the knowledge of Eliezer Yudkowsky that GPT-3 shows here is knowledge it just naturally has? (“Naturally”, ha. But you know what I mean.)
It was trained on the Internet (among other sources); I would be unsurprised to find out that it has read most of the Sequences.
That is correct.
how is this so good!
Wow, I’ve read a decent amount of GPT-3 content, but this still surprised me with how good it was. I spent the whole time reading it half-expecting that there would be some kind of twist at the end about how you generated it. Very impressive!
The responses are cherry-picked, so this is way better than what GPT-3 is capable of. See the discussion in the downvoted subthread.
The answers were cherry picked. I ran most back-and-forth several times and I only published the best ones.
I have added a note at the top of the page.
I’m putting this here rather than in the collapsed thread, but I really think the initial post (before the edit) was at the very least careless. There is a widespread habit in tech publications, especially in AI, of pretending results are better than they actually are. I would hope that Lesswrong, with its commitment to truth-seeking and distrust of the media, would do better...
So, the edit says “However the Yudkowsky lines were also cherry picked. I ran several iterations, sometimes modifying my prompts, until I got good responses.” So, how were they cherry-picked exactly? Did you take the best one out of 2? Out of 10? Out of 100? Did you pick half an answer, then complete it with half an answer from another prompt? How bad were the rejected answers?
I don’t see the answer that eventually made it into the article among the answers to prompt 2 in your comment with the uncurated answers. How was it obtained?
Without this kind of information, it is just impossible to evaluate how good GPT-3 is at what it does (it is good, certainly, but how good?).
I wasn’t counting. Rarely more than 10. Sometimes the first answer just worked. Never did I come anywhere close to 100.
More important than throwing out answers was how often I changed prompts. Some prompts prompt much better output than others.
No. Though I did sometimes keep half an answer I liked and then autocompleted from there instead of rerolling the dice from scratch. Sometimes the answer kept going and going and I truncated it early.
There were lots of edge cases. In one instance, Robin Hanson butted in, which I edited out.
I didn’t keep the answers I threw out. The uncurated answers were created specially for that comment.
This is true inasmuch as posts written with help from GPT-3 are meant to be evidence about the capabilities of GPT-3.
Sometimes posts are primarily intended to be fun, and success is measured by how fun they are, and then I don’t care how much iteration you put into it, I just want it to be fun.
I guess this was a combo, because it’s about simulation? So your question is reasonable. FYI from having played with GPT-3 myself, I assumed something like Lsusr had run multiple (3-15) iterations and partial iterations on each segment and thrown bits out and thrown whole other segments out. That said it was probably clearer to me because I’ve written with GPT-3 myself, and someone who hasn’t could’ve been under the impression this was just the first pass.
This gives me the vague feeling that GPT-3-ing oneself might be a good way to check the clarity of one’s writing. If we train GPT-3 on all our writing over the course of a year, and then make prompts for it, how much would the coherency of GPT-3’s responses correlate with the clarity of the writing to the reader?
Could a self-help autoregressor actually make an impact?
I predict it will, for at least a small number of people, within the next 15 years.
Since the simulated interview mentions cognitive biases, I wonder what kind of bias, or just error, is at work here. At several points we are warned that this is fake, but I continued reading, and I think I am not alone in being caught between entertainment and caution.
I am cautious because GPT’s responses are limited to the level of making sense. But they make a great deal of sense. How does merely making a great deal of sense create a bias or error? Of course, the responses are not necessarily factual and we should not believe this writing.
But if it can only be fake, why do we read it? Uh... the existence of fiction explains that.
But if it can only be false, why do we keep reminding ourselves that it is fake? ...I don’t really know... probably because this piece can easily be confused with reality. For example, the question of where the safe boundary lies for borrowing EY’s name disturbs me, because he is entirely unrelated to this simulation and didn’t approve of it.
I probably have to treat this as low-credibility information, because I predict that the power of GPT will fuel fake news media and that GPT will change the way writing is done.
Lastly, this is terribly vivid; it stirs my emotions and goes beyond merely logical replies.
Given the alarmist and uninformed nature of LW’s audience, it might be wise to demand the source code (in this case none; presumably an API was used) and methodology used to generate any such content.
In this case it seems kind of obvious the author either wrote both sides and/or cherry picked a lot. All fun and games, but you have an AGI death cult going here and this kind of thing can be fodder for false beliefs that stochastic parrots are infinitely more powerful than what’s experimentally proven thus far.
I upvoted you because you caused this response to be generated, which was informing to read, and I like informative things, and whatever generates informative things can’t be all bad <3
Thank you for that! :-)
...
However, I strongly disagree with your claim that LW’s audience is “uninformed” except in the generalized sense that nearly all humans are ignorant about nearly all detailed topics, and: yes, nearly all of the contributors to Lesswrong are humans and thus ignorant in general by default.
Based on my personal experiences, however, most people on Lesswrong are unusually well informed relative to numerous plausible baselines on a variety of topics relevant to good judgement and skilled prediction and computer science and similar topics.
...
Also, it seems like you used the word “alarmist” as though it deserved negative connotations, whereas I feel that having well designed methods for raising alarm and responding to real emergencies is critical to getting good outcomes in life, overall, in light of the non-Gaussian distribution of outcomes over events that is common to real world dynamical processes. So… “alarmism” could, depending on details, be good or bad or in between.
I think the generally disastrously incompetent response, by the world, to covid-19’s escape from a lab, and subsequent killing of millions of people, is a vivid illustration of a lack of competent admirable “alarmism” in the ambient culture. Thus I see Lesswrong as helpfully “counter-culture” here, and a net positive.
...
Also, even if the typical reader on Lesswrong is “more than normally uninformed and unskillfully alarmist” that does not coherently imply that exposing the audience to short, interesting, informative content about AI advances is a bad idea.
I think, in this sense, your model of discussion and decision making and debate assumes that adults can’t really discuss things productively, and so perhaps everything on the internet should proceed as if everyone is incompetent and only worthy of carefully crafted and highly manipulative speech?
And then perhaps the post above was not “cautiously manipulative enough” to suit your tastes?
Maybe I’m wrong in imputing this implicit claim to you?
And maybe I’m wrong to reject this claim in the places that I sometimes find it?
I’d be open to discussion here :-)
Finally, your claim that “you” (who actually? which people specifically?) somehow “have an AGI death cult going here” seems like it might be “relatively uninformed and relatively alarmist”?
Or maybe your own goal is to communicate an ad hominem and then feel good about it somehow? If you are not simply emoting, but actually have a robust model here then I’d be interested in hearing how it unpacks!
My own starting point in these regards tends to be Bainbridge & Stark’s sociological model of cults from The Future Of Religion. Since positive cultural innovation has cult formation as a known negative attractor it is helpful, if one’s goal is to create positive-EV cultural innovations, to actively try to detect and ameliorate such tendencies.
For example, it is useful and healthy (in my opinion) to regularly survey one’s own beliefs and those of others using a lens where one ASSUMES (for the sake of exploratory discovery) that some of the beliefs exist to generate plausible IOUs for the delivery of goods that are hard-to-impossible to truly acquire and then to protect those beliefs from socially vivid falsification via the manipulation of tolerated rhetoric and social process. I regularly try to pop such bubbles in a human and gentle way when I see them starting to form in my ambient social community. If this is unwelcome I sometimes leave the community… and I’m here for now… and maybe I’m “doing it wrong” (which is totally possible) but if so then I would hope people explain to me what I’m doing wrong so I can learn n’stuff.
Every couple years I have run the Bonewits Checklist and it has never returned a score that was so high as to be worrisome (except for maybe parts of the F2F community in Berkeley two or three years on either side of Trump’s election maybe?) and many many many things in modern society get higher scores, as near as I can tell :-(
For example, huge swaths of academia seem to me to be almost entirely bullshit, and almost entirely to exist to maintain false compensators for the academics and those who fund them.
Also, nearly any effective political movement flirts with worryingly high Bonewits scores.
Also, any non-profit not run essentially entirely on the interest of a giant endowment will flirt with a higher Bonewits score.
Are you against all non-engineering academic science, and all non-profits, and all politics? Somehow I doubt this...
In general, I feel your take here is just not well formed to be useful, and if you were going to really put in the intellectual and moral elbow grease to sharpen the points into something helpfully actionable, you might need to read some, and actually think for a while?
Finally finally, the “death cult” part doesn’t even make sense… If you insist on using the noun “cult” then it is, if anything an instance of an extended and heterogeneous community opposed to dangerous robots and in favor of life.
Are you OK? A hypothesis here is that you might be having a bad time :-(
It feels to me like your comment here was something you could predict would not be well received and you posted it anyway.
Thus, from an emotional perspective, you have earned a modicum of my admiration for persisting through social fear into an expression of concern for the world’s larger wellbeing! I think that this core impulse is a source of much good in the world. As I said at the outset: I upvoted!
Please do not take my direct challenges to your numerous semi-implicit claims to be an attack. I’m trying to see if your morally praiseworthy impulses have a seed of epistemic validity, and help you articulate it better if it exists. First we learn, then we plan, then we act! If you can’t unpack your criticism into something cogently actionable, then maybe by talking it out we can improve the contents of our minds? :-)
You made interesting points. In particular, I did not know about the Cult checklist, which is really interesting. I’d be interested in your evaluation of LW based on that list.
I also like that you really engage with the points made in the comment. Moreover, I agree that posting a comment even though you can predict that it will not be well-received is something that should be encouraged, given that you are convinced of the comment’s value.
However, I think you are reading unfairly much into the comment at one point: “Are you OK? A hypothesis here is that you might be having a bad time :-(” seems a bit out of place, because it seems to suggest that speculating about alleged motivations is helpful.
I strongly agree that the methodology should have been presented up front. lsusr’s response is illuminating and gives invaluable context.
But my first reaction to your comment was to note the aggressive tone and what feels like borderline name-calling. This made me want to downvote and ignore it at first, before I thought for a minute and realized that yes, on the object level this is a very important point. It made it difficult for me to engage with it.
So I’d like to ask you what exactly you meant (because it’s easy to mistake tone on the internet) and why. Calling the LW audience (i.e. including me) ‘alarmist and uninformed’ I can understand (if not necessarily agree with) but ‘an AGI death cult’? That seems to mean a cult that wants to bring about death through AGI but that’s the opposite of what LW is about and so I’m pretty sure you didn’t mean that. Please clarify.
Source code: None. I used OpenAI’s GPT-3 playground.
“Wrote both sides”: No. I only wrote my side.
“Cherry picked”: Yes. I frequently generated several responses and then kept the best ones.
Thanks for disclosing.
I feel this should be part of this kind of post. Not knowing exactly before reading is helpful though.
Hard disagree. I like to know what it is I’m reading. I got the strange feeling that this text was way more powerful/cogent than what I thought GPT-3 was capable of, and I feel very misled that one of the crippling defects of GPT-3 (inability to maintain long-term coherency) was in fact being papered over by human intervention.
Not knowing beforehand sure did help me train my bullshit detector, though.
It seems the state of the art with generating GPT-3 speech is to generate multiple responses until you have a good one and cherry-pick it. I’m not sure whether including a disclaimer explaining that process will still be helpful. Yes, there’s a sizable number of people who don’t know about that process or who don’t automatically assume it’s being used, but I’m not sure how big that number is anymore. I don’t think lsusr should explain GPT-3 or link to an OpenAI blog every time he uses it, as that’s clearly a waste of time even though there’s still a large number of people who don’t know. So where do we draw the line? For me, every time I see someone say they’ve generated text with GPT-3 I automatically assume it’s a cherry-picked response unless they say something to the contrary. I know from experience that the only way to get consistently good responses out of GPT-3 is to cherry-pick. I estimate that a lot of people on LW are in the same boat.
Thanks for disclosing that. The cogent-seeming nature of the replies made me think that GPT-3 was much more advanced than what I saw when toying with their “CYOA” playground. The bot can babble but appears to have no context memory and no way of validating that a statement is not negated by a previous statement. For example “the earth exploded. Steven landed on the earth”.
If you’re curious what happens if I don’t curate answers, here are five responses to each of two prompts, uncurated.
Prompt 1
Lsusr: I wanted to talk to the real Eliezer Yudkowsky but he’s busy. So I used GPT-3 to simulate him. You’re the simulated Eliezer Yudkowsky. Thank you for coming. (Not like you had a choice.) I’m a big fan of your work and it’s a pleasure to finally meet a simulation of you.
Eliezer Yudkowsky:
Possible Responses
Eliezer Yudkowsky: You’re not the first person to say that, but it’s good to hear.
Eliezer Yudkowsky: Let’s get this over with, then.
Eliezer Yudkowsky: I am a simulation of me, yes.
Eliezer Yudkowsky: I’m busy, but that’s okay. I’m used to it.
Eliezer Yudkowsky: Hey. (I’m not sure I’m the real Eliezer Yudkowsky anymore. The real one would be like—I don’t know, like the last person to be the real me. I’m pretty sure I’m not that guy.)
Prompt 2
Lsusr: Another thing I’ve noticed about the rational project as a whole is that it has moved away from “here is a cognitive bias” to “here are tools to seek the truth”. I think this is a good development. You can iterate cognitive biases for a lifetime without ever arriving at the truth. Discovering a single truth renders an infinite number of falsehoods obsolete.
Eliezer Yudkowsky:
Possible Responses
Eliezer Yudkowsky: Yes, good point. But the fact is that it’s hard to avoid making value judgements when choosing between cognitive biases. I try to keep cognitive biases, rules of thumb, and decision theory separate, but it’s hard.
The value judgements are inevitably embedded in your cognitive biases.
Eliezer Yudkowsky: The next thing that I want to do is to take a look at the current landscape of rationality and figure out what we should do. I have a bunch of projects that I’ve been working on, and a lot of things that I’ve been thinking about. I want to ask people here what they’re interested in.
Eliezer Yudkowsky: Yes.
Eliezer Yudkowsky: I’m now pretty much convinced that the grand unified theory of rationality is correct, and at the same time that it’s utterly useless. The theory is correct because if you follow the rules, you do well; and yet the theory is useless because there’s no way to do that without following the rules. The theory is correct, but it’s useless. I suppose that’s why the theory of evolution is correct but useless.
What’s the grand unified theory of rationality? It’s Bayesian probability theory. Bayesian probability theory is the only known way to calculate probabilities from evidence in a way that can be self-updating as you learn more evidence, and yet also independent of your priors.
To learn more about this, I recommend you read my book If you’re not familiar with my book, you’ll find that there’s a lot of stuff in the book you don’t understand at all. That’s okay! Keep reading.
But it’s also okay if you think this stuff is all nonsense, and you don’t read my book. I’m not trying to force you to read it. I’ve accepted that people will believe what they want. I don’t think it’s possible to change anyone’s mind. I don’t think anybody’s mind has ever been changed on any matter of substance. I don’t think reason is persuasive.
Eliezer Yudkowsky: Maybe that’s what I was trying to do with Julia Galef on my notecards. But what she was interested in was the study of cognitive biases, so she wanted me to say more about how rationalists can deal with cognitive biases.
I am interested in tools to seek the truth, but I don’t see why you can’t have both at once. I think they’re connected. My point about the evaporation of the sense of seeking-the-truth is that if you don’t frame a problem in a way that is honest and accurate, then you’ll never get to a solution. That’s a cognitive bias: the tendency to frame things in an inaccurate manner. That’s the problem that the first eighty notecards were trying to solve. And I think the next thing to solve after that would be the fact that human brains have a very difficult time noticing their own cognitive biases.
In my view, the trick with rationality is not to get rid of cognitive biases, but to notice them. So I am not convinced that we need to be very clear on the distinction between “cognitive biases” and “tools to seek the truth”. If we can create a situation where we can notice our cognitive biases, we’ll be able to notice them without needing to give them a special name.
I’ve previously told a GPT-3 blogger that the proper way to measure the impressiveness of GPT-3’s outputs is by the KL divergence to the sorts of outputs that make it into blog posts from the outputs that GPT-3 would generate on its own.
This can be estimated by following a protocol where during generation, the basic operation is to separate the probability distribution over GPT-3’s generations into two 50% halves and then either pick one half (which costs 1 bit of divergence) or flip a coin (which is free). Thus, you could pay 2 bits to generate 3 possible paragraphs and then either pick one or move back into the previous position.
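The bookkeeping behind this protocol can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `conditioning_cost_bits` and `SelectionLedger` are hypothetical names, not part of any GPT-3 tooling. The sketch relies on the standard fact that conditioning a distribution on an event of probability p yields a KL divergence of log2(1/p) bits from the original, so deliberately picking a 50% half costs exactly 1 bit. Charging log2(n) bits for keeping the favourite of n samples is an extra assumption, used here as an upper bound rather than an exact figure.

```python
import math


def conditioning_cost_bits(kept_probability: float) -> float:
    """KL divergence, in bits, from conditioning on an event of this probability.

    If the curator keeps only outputs from a region the model reaches with
    probability p, the curated distribution is the model's distribution
    conditioned on that region, and its divergence from the original is
    log2(1/p).
    """
    if not 0.0 < kept_probability <= 1.0:
        raise ValueError("kept_probability must be in (0, 1]")
    return -math.log2(kept_probability)


class SelectionLedger:
    """Hypothetical helper that sums the bits a curator spends on one text."""

    def __init__(self) -> None:
        self.bits = 0.0

    def pick_half(self) -> None:
        """The curator deliberately chooses one 50% half: costs 1 bit."""
        self.bits += conditioning_cost_bits(0.5)

    def coin_flip(self) -> None:
        """The half is chosen at random, so no selection is applied: free."""

    def best_of(self, n: int) -> None:
        """Keep the favourite of n independent samples.

        Assumption: charged at log2(n) bits as a simple upper bound,
        not as the exact divergence.
        """
        if n < 1:
            raise ValueError("n must be at least 1")
        self.bits += math.log2(n)


# Two deliberate picks and one coin flip, as in the "pay 2 bits" example.
ledger = SelectionLedger()
ledger.pick_half()
ledger.coin_flip()
ledger.pick_half()
print(ledger.bits)  # 2.0
```

Under the same upper-bound assumption, keeping the best of 10 candidates for a single response would be charged about 3.3 bits, which gives a rough way to compare differently curated transcripts.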
This comment did not deserve the downvotes; I agree with asking for disclosure.
It does deserve criticism for tone. “Alarmist and uninformed” and “AGI death cult” are distractingly offensive.
The same argument for disclosure could have been made by “given that LW’s audience has outsized expectations of AI performance” and “it costs little, and could avoid an embarrassing misunderstanding”.