Arguing about LessWrong’s Engagement with External / Academic Research: CIRL as a Case Study

I am in favor of debating the quality and norms of LessWrong, not only with folks who write on LessWrong, but also with folks who do not.

Recently Dylan Hadfield-Menell (DHM), an academic AI Safety researcher, publicly posted some criticisms of LessWrong on Twitter. In response, there was a (heated!) exchange, primarily with site-admin Oliver Habryka (OH), about how welcoming LessWrong is for external researchers, with a concrete case study of how LessWrong engaged with Hadfield-Menell’s work “Cooperative Inverse Reinforcement Learning (CIRL)”.[1]

Every day for 5 days (0, 1, 2, 3, 4), DHM posted a new thread of thoughts about LW (or reflections on the prior day’s thread).[2]

Many people said that they found reading the thread informative, but had missed key parts of it due to Twitter’s threading. I have aimed here to turn it into a post readable in a single pass. Let us see whether this helps or hinders, or proves too difficult to read.

Editing notes

  • Twitter doesn’t have rich text, so URLs were often posted raw. I have sometimes turned these into hyperlinks that state the name of the linked post, for readability.

  • I have occasionally cleaned up the formatting in ways that made it read nicer (e.g. turned an in-line list of numbered points into a formatted numbered list).

  • I’m not pasting every single comment in the replies thread. For instance, many people stated positions they held, but I’m primarily interested here in digging into arguments and details, so I’ve focused on threads that go into that.

  • Often, if there’s a split (i.e. two threads splitting off from a single comment) but one of the threads is very short (e.g. literally just one comment), I’ve included it as a footnote, and you can read it in the right sidebar.

  • Occasionally someone would write n comments in a row, and the (n-1)th comment is where the rest of the thread spawns from. I’ve often still included the nth comment in their message because it seemed like the least overwhelming way of adding it, while giving the author the respect of including what they actually said. It never seemed to me to confuse the rest of the thread.

The Discussion Begins With an Obfuscated Anecdote

This thread starts with an anecdote, but DHM cannot give details for others to verify it, so he turns to a personal example that he can talk about in much more detail: his research on Cooperative Inverse Reinforcement Learning.

DHM:

Since I’m a glutton for punishment, here’s another long-held critique I have for Lesswronger alignment folks:

Your community seems intentionally designed to prevent people with different intellectual commitments and approaches from participating.

Let me illustrate with a story.[3]

While I was in graduate school, an economist published a paper on AI alignment. This was still early days, so that wasn’t common. He made the mistake of producing work that could be construed as indicating alignment wasn’t as hard as Eliezer thinks it is.

He made the further mistake of making a post on LW about it.

Literally no one was capable of genuinely engaging with the theory, which should be understood as saying “if x, then y” where y is a property of alignment.

Instead everyone hounded him, saying “but x isn’t true” — which wasn’t clear and, in any case, it’s useful to develop theoretical intuition for assumptions that aren’t true. It’s just how you develop good theory.

He reached out to me to try to understand what was going on. I remember his question “what’s the point of this? It seems like there’s no value in engaging here.”

I couldn’t say he was wrong. AFAICT he never considered alignment problems again.

First Major Thread (primarily on CIRL’s engagement on LessWrong)

In which, primarily, DHM, Habryka, and Lawrence Chan discuss CIRL and the engagement with it on LessWrong.

OH:

Curious about the link! Seems like it would be easy for people to form their own impressions in this case, and it would reduce the likelihood of misunderstandings. (The story seems plausible to me, IMO LessWrong has had its ups and downs on this dimension)

DHM:

I responded above. I don’t think it’s my place to share in a way that’s identifiable. I’m sorry that it’s frustrating.

OH:

My guess is it’s largely misrepresentative then.

Look, there are many critiques to make of the LessWrong community, but it’s obviously one of the communities best in the world at distinguishing “if X then Y” and “X and Y”.

The rest of the world really sucks at this! Most academic fields really suck at this. The media really sucks at this.

But on LessWrong, the decontextualizing culture in which “If X then Y” gets treated as a very different question from “X and Y” is one of the most common ways in which LessWrong gets criticized by parts of society that do not think this way.

I find it unlikely that LessWrong failed in this specific way. Maybe it was a low-attention post and it only attracted the attention of some small, less-selected set of people, but insofar as it had participation from the people who tend to most reflect LessWrong culture (Gwern, Eliezer, members of the Lightcone team, MIRI staff, etc.), I would be quite surprised if the conversation failed in the way you said.

Like, of course “If X then Y” is not a particularly valuable question to ask if X is really actually very unlikely, and maybe someone made that critique, but I don’t buy that people were incapable of distinguishing that critique from a critique of the internal logic of the argument.

Of course LessWrong is still not perfect on this dimension. This specific kind of misunderstanding and conflict is almost universal in the media, and also extremely common in academia.

The number of frustrating engagements I’ve had with academics being like “your points about how it’s hard to align smarter-than-human systems are wrong because smarter-than-human systems are not impossible, or are very unlikely to happen in our lifetimes” numbers in the dozens.

This appears to be a recurring source of confusion and misunderstanding in the world, LessWrong obviously performs above average, and your framing of at least this kind of mistake as “it could practically be designed to elicit this mistake” makes me think you are unlikely to be engaging in good faith.

I might have also misunderstood what the issue was. I am responding here to what my (and my guess is also your audience’s) most likely reading is.

I do think LessWrong has tons of issues! Just this specific issue seems unlikely to me based on my models, especially construed relative to other intellectual communities.

DHM:

Well, if that’s what you think: why did this person feel the need to reach out and talk to me to understand what was happening?

Even if you think they misinterpreted the reception they got, surely you don’t think I’m lying about the meeting and their stated perception, right?

LW can be very good at separating concerns. But honestly, very bad at accepting any information that challenges certain community beliefs.

E.g., it’s very difficult to say something like “here’s a set of assumptions and a theoretical model where alignment seems possible.”

If you want a specific example where I can provide more information, consider my experience with CIRL. The community fundamentally misinterpreted its purpose as an idea and spent more time arguing for their misinterpretation than they did trying to understand my idea.

[Image attached]

It is hard to describe how frustrating it was to 1) get interested in alignment, 2) spend time digging through the weird jargon-y literature, 3) spend lots of time doing careful work, and 4) see it rejected basically thoughtlessly.

That experience is part of why I’ve kept my distance, despite working in a similar area and having several colleagues and research collaborators that I respect who are heavily involved.

I’m not able to adhere to the community norms well enough to make engagement worthwhile.

The appropriate way to understand my work was as a language for being precise about a class of likely solvable alignment problems.

It never claimed to be a solution or that the class of problems was a realistic model of every alignment problem. It was supposed to be a tool to help the community be precise, generate ideas in a common language, and compare/​contrast ideas reliably.

Instead, I got so many arguments and responses that were fundamentally incurious and never made a real attempt to understand what the purpose of my work was.

I don’t believe my experience is uncommon. In fact, I expect this is what at least 4 out of 5 professionally trained researchers trying to engage with LW experience.

OH:

My experience with CIRL was roughly:

  1. Be hopeful about Stuart and CHAI contributing to AI Alignment (including you)

  2. Instead Stuart starts talking about CIRL solving huge chunks of the alignment problem it clearly doesn’t solve, and in a huge amount of both academic and professional correspondence starts advocating for it as “the new framework in which we should develop AI”

  3. He doesn’t respond or engage with any critiques, and when I and others talk to him directly it really seems he does not understand the problem of fully updated deference.

  4. IMO nobody ever really responds to the LW community critique, which, to be clear, people really did make sure actually responds to things Stuart and/or others at CHAI have said (see e.g. “we show that a machine has a positive incentive to allow itself to be switched off if and only if it is uncertain about the human objective.” here, and various details of “Human Compatible” I don’t have on hand but could dig up)

But also, at the same time, the LW community engages with other CIRL work that doesn’t claim to solve the whole alignment problem, in a really huge amount of detail.

It really doesn’t seem like CIRL was ignored or dismissed! There are over 100 posts on CIRL on LessWrong, and at least 500 comments. As a paper and research approach, there is huge diversity and depth of engagement here.

Overall, I wish LW had engaged less with CIRL. I continue to not see much value in it as a framing of the problem, and in terms of professional careers, it’s consumed at least a single digit percentage of the best people in the field.

I am happy to go into my critiques in more detail. I have read the CIRL paper more than 5 times, the one workshop paper I am coauthor on is on Inverse Reinforcement Learning, and I’ve had over 30 hours of conversations with Rohin and others about it. I definitely have not dismissed CIRL lightly, and I think the same is true for many people on LW.

It is plausible to me that ultimately there was a deep miscommunication here, or largely the “blame” goes to Stuart not you.

I think the actual original paper is quite measured in what it says (though in an IMO frustratingly academic way that does not highlight potential misunderstandings or provide intuitive metaphors), but I also endorse practically all the critiques that have been made of CIRL as, at least the way Stuart treated it, a “solution” to the AI Alignment problem, which was, to be clear, the load-bearing claim for most actions people advocated in relation to CIRL, and why I saw people wanting to work on it academically.

It is still plausible miscommunication occurred and I am confused, but at least I spent 50+ hours engaging in-depth with this, going both through the proofs, writing dozens of comments, talking to Rohin Shah and Adam Gleave, and that is the overall impression I walked away with.

DHM:

A few comments.

  1. I’m not Stuart;

  2. Your comment indicates that you don’t understand the value of the idea, I’m sorry;

  3. I don’t think you understood Stuart’s communicative goals with the public.

I’d be more sympathetic if the response was something like “well, yes, goal uncertainty is important but clearly not enough” and then LW did lots of work on preference uncertainty and how to ensure it is well accounted for in AI system behavior.[4]

But the response was basically to use the framework to say “see, this work is a dead end, don’t do it”.

As a result, there’s shockingly little connection to research on optimal behavior under partial observability, a theoretically justified aspect of alignment.

But these responses aren’t good. E.g., Ryan Carey’s paper is just a long complicated way to introduce a new set of assumptions for the interaction. It wouldn’t get past peer review anywhere and shouldn’t.

It was always ok to say the assumptions are unrealistic.

I would never push back on that, neither would Stuart.

But you don’t make progress on theoretical problems by refusing to consider useful simplifications. In fact, an inability to consider incorrect but useful simplifications is a good way to do bad work.

And, as I’ve said elsewhere, CIRL is a very hard problem class! It’s in a very challenging complexity class even with the simplifications!

I guess it’s helpful to understand that I was collateral damage because you wanted to police Stuart’s public communication.

OH:

Look, man, CIRL had far far far more people working on it, and seriously working on it, than I think was justified based on just the natural promisingness of the idea.

CHAI attracted a great number of very smart people in the LW orbit, and then caused them to work on stuff that most of the time they in-retrospect thought was not a great use of their time.

Complaining about CIRL being dismissed by the LW community feels crazy. It’s one of the most engaged-with ideas in the whole space! There are so many research directions. This one had like 5+ full-time researchers working on it for many years!

What possible thing could the LW community have done to make you feel engaged with? Drop everything else? Publish 200 instead of 100 posts on the topic? Never say anything critical of the theory? Never display any slightest hint of misunderstanding when engaging with it?

Yes, many people aren’t sold on the idea, but many many people worked on it, and engaged with it seriously. I think it isn’t a particularly promising direction. Some people do think it’s a promising direction, though I think it’s been going down.

CIRL is probably in the top 20 research directions that LW has engaged with in total, including in-depth empirical and theoretical work. It clearly wasn’t dismissed fully.

I am also sure some people misunderstood it and critiqued it on wrong grounds. So have people in academia.

DHM:

I can only reference the interactions I had after publishing the paper. They were adversarial and largely reflected substantial misunderstanding.

I can not tell you how many times I tried to explain that CIRL was a language for analyzing properties of alignment solutions, not a solution itself. I’m pretty good at explaining my work. LW folks were noticeably worse at understanding if they ever got to that point.

OH:

I have critiques of that framing, but also, I think your coauthor, who is the much more prominent coauthor and like published a whole book about it, I do think did not generally stick to that framing, so reacting to it this way seems quite natural to me.

I haven’t seen you write or comment anywhere with clarification to this effect, so people not picking up on the divergence here isn’t that surprising, though still sad.

> I can not tell you how many times I tried to explain that CIRL was a language for analyzing properties of alignment solutions, not a solution itself.

Just to be clear, did you ever do this on LW? There should just be a comment thread you can link to where this happened, and would again allow us to be much more grounded.

If you never commented on LW, then IDK, that is sad, but also kind of a confusing ground on which to complain about the LW community.

Lawrence Chan:

Idk, I do feel a bit sad about this, because Dylan did explain this to me several times (including when I first met him ~2017) and I never wrote this up. And I’m aware that he’s explained this to several other people who are, if not part of the community, at least frequent LW.

The closest thing I recall being written about this was Rohin Shah talking about how CIRL is “math poetry” pointing at a desirable property we might want AI to have (in this case an elegant form of indirect normativity) in the same way that CEV is natural language poetry gesturing at another form of indirect normativity. He wrote about this in multiple places, but to my knowledge it was most clearly put in his “reward uncertainty” post in his value learning sequences:

“CIRL is not meant to be a blueprint for a value-aligned AI system. It is not the case that we could create a practical implementation of CIRL and then we would be done.

[...]

CIRL is supposed to bring conceptual clarity to what we could be trying to do in the first place with a human-AI system.”

https://www.lesswrong.com/s/4dHMdK5TLN6xcqtyc/p/ZiLLxaLB5CCofrzPp

It’s worth noting that Rohin _also_ links to a transcript of an FLI interview w/​Dylan where Dylan does make this point explicitly.

(Dylan:) “And so really what cooperative IRL, is it’s a definition of how a human and a robot system together can be rational in the context of fixed preferences in a fully observable world state.”

But AFAIK there was little discussion of this point on LW, and the fact that Rohin had to mention it explicitly in his value learning sequence is evidence that the misconception was widespread.

https://futureoflife.org/podcast/cooperative-inverse-reinforcement-learning-with-dylan-hadfield-menell/

Idk, I think there was a lot of good thinking inside of CHAI about CIRL, and people really were ahead of the curve in some aspects of conceptual thinking in 2017-2019. A lot of it never made it out, and this makes me a bit sad.

Also, while I think CHAI didn’t try very hard to communicate (outside of DHM and Rohin, but including myself), I think MIRI deserves part of the blame for their relatively uncharitable handling of Dylan’s CIRL + Off-Switch Game papers, which I think didn’t help with the communication difficulties. (I’m thinking both of in-person interactions as well as the way a lot of the CIRL stuff was reported on in LW/Arbital.)

That being said, I also think Dylan is absolutely correct that Stuart’s statements were somewhat heterodox in CHAI even when Dylan was there, and I think the gap has only grown wider since. See e.g. Stuart’s comments with Eliezer in Scott’s Assistance Games writeup

https://www.astralcodexten.com/p/chai-assistance-games-and-fully-updated

And then my reply in the comments, which I shared with the CHAI grads at the time and got broad agreement. Also, as I said, “another CHAI grad student did an informal survey and found two-thirds agree more with Eliezer’s position than Stuart’s in the debate.”, though as I said, I’ve also found that “Stuart’s actual beliefs are quite a bit more nuanced than what’s presented here” (and, I’d add today, unfortunately far more nuanced than what he often says in public even today).

https://www.astralcodexten.com/p/chai-assistance-games-and-fully-updated/comment/9549876[5]

Damon Sasi:

Thanks for writing all this up. Just to be clear, it seems like Oliver’s points remain mostly correct?

1) LW did engage quite a lot with this idea.
2) Stuart spoke much more about it, in a frame different from Dylan’s.
3) Dylan did not himself try talking on LW directly on it.

DHM:

Honestly, why would I talk about it on LW?

I was speaking about it in person with a *lot* of LW people. Oliver says he spent 50+ hours studying it. I’m sure I spent a similar amount explaining it to LW folks.

I know how hard it is, and I’ve watched how exhausting it can be to engage as someone who doesn’t adhere to all the beliefs of the community.

E.g., why would I post somewhere that has comments like this from key figures when academics engage?

Like I said, my understanding is that they perceived the idea or the associated messaging as claiming that CIRL solved alignment. Oliver says that’s ok because that’s what Stuart says in public books. He admits that this isn’t in the paper about the idea.

In academic communities, once the paper is written, it’s fixed. It doesn’t usually matter who wrote it except in some fields like Law.[6]

My guess is it’s a rationalization of too much deference to others in 2016/2017, combined with a very strong belief that the way to communicate this to the public is doom and gloom.

The doom and gloom is unavoidable given that the core beliefs of the group (that adv AI must embody perfect rationality) imply alignment is basically impossible to solve.

However, and they don’t talk about this as much now, they also felt a deep belief that adv AI must be built.

So they were pretty trapped and very jealous of people communicating something else.

IMO, the real takeaway from CIRL is that if someone proposes an alignment solution they should be able to discuss how it maintains appropriate goal uncertainty.

But goal uncertainty isn’t compatible with the adv AI being a perfectly rational utilitarian in their conception.

So, there just wasn’t anything productive to discuss. I’m not going to get anywhere arguing with pseudo-religious beliefs.

Yudkowsky basically said this to me in person. I also explained the idea to him when I went to present it at a workshop.

At the time I told him that under his assumptions the problem probably can’t be solved. I think this paper shows that pretty persuasively.

arxiv.org/abs/2102.03896

OH:

IDK, it’s just a weird critique to be like “LessWrong community, I have had this experience engaging with you, by which I mean, I had this experience not actually engaging with LessWrong at all, the experience was with other people off of LessWrong, in a different context, and I got a vibe from some of LessWrong that I would have similar experiences there”.

Like, to be clear, that’s fine and I think it’s totally valid to make likely inferences about what would happen to you on LW based on interactions somewhere else, but as someone who runs LessWrong and moderates the discussion actually happening on the site, this critique lands very differently for me. I know much less about what people do in private conversations. My sense is the variance in those contexts is huge, and I’ve seen things go both very well and very badly.

I do really know a lot about what happens in LessWrong comment sections, and I think if you had shown up, people would have engaged with you in a lot of depth, and quickly updated on what you said. And people were correctly responding to what Stuart is and was saying, and I think you just weren’t really a participant in the conversation.

The following 6 threads split off of different parts of the above conversation. They each open with a quote of what they’re replying to.

First Subthread (on engaging with DHM’s vs Russell’s framing of CIRL)

DHM:

I can not tell you how many times I tried to explain that CIRL was a language for analyzing properties of alignment solutions, not a solution itself. I’m pretty good at explaining my work. LW folks were noticeably worse at understanding if they ever got to that point.

OH:

I have critiques of that framing, but also, I think your coauthor, who is the much more prominent coauthor and like published a whole book about it, I do think did not generally stick to that framing, so reacting to it this way seems quite natural to me.

I haven’t seen you write or comment anywhere with clarification to this effect, so people not picking up on the divergence here isn’t that surprising, though still sad.

(1)

DHM:

It’s the straightforward interpretation I would expect anyone who has advanced beyond their first year of PhD to understand directly from the paper. It’s what we mean by framework.

Re: Stuart’s public communications, he was raising awareness about a problem you claim to care about, which was good for you because you’re all remarkably bad at public communications.[7]

OH:

But your coauthor has clearly meant a different thing by it, so I don’t understand?

Even if it was obvious (which, I don’t think it is), if I publish a paper in which I talk about an approach to nuclear deterrence, which I think is probably a bad idea to implement but has some interesting internal logic, but my coauthor goes around and keeps telling everyone that we solved nuclear deterrence and the US needs to adopt it right now.

Then I surely can’t be surprised if then people keep arguing with me that the approach my framework suggests for nuclear deterrence isn’t actually a good idea. And I surely can’t react with outrage if people infer that I do believe that the approach is good, given that my co-author, who has a platform 100x greater than me, keeps saying it does!

Like, of course you will have to clarify. How are people supposed to magically know?

DHM:

I didn’t react with outrage—I just kept doing my work. Only now, a decade later, do I mention it to support the point that it’s common for newcomers with a research background to feel unwelcome in your community.

OH:

Sure, that word was hyperbole, but you do clearly consider yourself aggrieved. But if I understand things correctly, you never commented about this on LW once?

Like, I just searched through LessWrong, and I can’t find any conversation you had with anyone on LW about this, so I am now actually just very confused what you are talking about.

There is of course a broader LW community with lots of private connections, but come on, complaining how people misunderstood you, without trying a single time to clarify it on the very platform you are critiquing does feel weird. If someone showed up to an academic conference and complained that nobody understood the blogpost they published, you clearly wouldn’t consider that a valid critique.

I might also be totally wrong here, but I haven’t been able to find any comment threads about this in my 5-10 minutes of searching. Our search isn’t amazing, so totally possible I missed something.

(2)

DHM:

It’s the straightforward interpretation I would expect anyone who has advanced beyond their first year of PhD to understand directly from the paper. It’s what we mean by framework.

Lawrence Chan:

I agree that this is a straightforward interpretation (and also my interpretation), but I will point out that many at CHAI did not understand this!

E.g. I think that Rachel’s writeup here seemed to analyze CIRL as a corrigibility solution: CIRL Corrigibility is Fragile

I will also point at many academic papers following up on your work, that treated CIRL more like an algorithm and less like an analysis framework.

Even at CHAI some papers treated it more like an algorithm. From a quick skim of CIRL-related pubs in CHAI from 2017-2019:

  • I think IRD, Off-Switch, Should Robots be Obedient, and later Assistive Bandit, Incomplete Contracting + AI Alignment, assistance vs reward learning, and (if you count it as CIRL) Silly Rules fall clearly on the “CIRL as framework” side.

  • I think that the generalized bellman update for CIRL is closer to algorithm, though maybe you count it as theory?

  • Active IRD and Divide-and-Conquer seem pretty much in the realm of “CIRL as algorithm”

But in general I think many people at CHAI did buy into Dylan’s CIRL-as-framework take.

Of course, the problem was much worse outside of CHAI; I think the rate at which peer reviewers understood this distinction was embarrassingly low:

  • The assistance vs reward learning paper was famously a framework paper that got treated as an algorithm paper and correspondingly trashed during reviews

  • IIRC the reviews on the AMAB paper we worked on had some decent amount of “there’s no algorithm here, you’re just using PPO with deep nns to brute force solve a POMDP, what’s the contribution?”

  • I wasn’t around for the CIRL paper but I’d be surprised if the reviewers didn’t care a lot more about CIRL admitting an efficient update rule that lets you solve underspecified POMDPs than about the framework being an interesting way to discuss AI Alignment in general.

I agree that substantial effort went into clarifying this to the reviewers—e.g. your multi-principal assistance game paper’s abstract literally starts “Assistance games have been **proposed as a model** for beneficial AI, wherein a robotic agent...”. I think the misinterpretation has more to do with the norms of ML reviewers than with a lack of effort on your part. But the misinterpretation existed nonetheless.

Second Thread (w/​ Lawrence Chan, on LW’s engagement with CIRL)

Damon Sasi:

Thanks for writing all this up. Just to be clear, it seems like Oliver’s points remain mostly correct?

  1. LW did engage quite a lot with this idea.

  2. Stuart spoke much more about it, in a frame different from Dylan’s.

  3. Dylan did not himself try talking on LW directly on it.

Lawrence Chan:

I agree with 2) and 3), less so with 1) — LW engaged a lot with this idea but not super well/​charitably. Eg my guess is the median commentator didn’t read Dylan’s paper, at least without substantial priming against it (Oli and a few other high engagement LWers did). And a lot of LW upvote/​comment dynamics are driven by median commenters (the review is notably less so, and I think suffers less from this).

But also, probably the convo could’ve been substantially improved with ~25 hours of work by someone like myself actually doing the work Dylan proposed and doing a CIRL analysis of eg IDA or RRM. One thing I’ve learned over the years is that communication is super duper hard, especially without concrete examples — you can say in 80% of your papers that it’s a framework that’s useful for analysis but if you don’t use it to analyze what the LWers do, most LWers won’t get it

I mean, this isn’t a LW specific complaint, I read ML papers all the time where the authors clearly didn’t read the citation (or more often, didn’t read past the abstract). And it’s not like saying in 80% of papers that your thing is a framework stops academics from responding to it as an algorithm either 🤷 it’s more of a place where I think LW fell short of its aspirations, even if it did so in ways that mirrored others

OH:

I am confused about 1). For example, while not directly on LW, Scott’s engagement with both Eliezer and Stuart feels like quite real and serious engagement.

Again, it’s not engaging with Dylan’s framing of things, but IDK, Stuart was a coauthor of the paper, is more broadly respected, and was making broader claims. It makes sense to engage with him more.

Do you maybe mean “LW didn’t engage with Dylan’s perspective on CIRL?” In that case, I don’t know how it could have.

It appears written down nowhere, except maybe in a summary of one podcast posted to LW but not directed at anyone on LW, and my guess is we both think the responses to CIRL directed at Stuart’s framing of the problem were largely on point (if critical), as you write yourself in your ACX comment.

Lawrence Chan:

Yeah, I meant, I wish we engaged with the better framing and not just the worst one 🤷

I’m mainly a bit sad from a personal perspective, because it feels like the sort of thing I would’ve proactively fixed in 2022-3 but I never tried arguing for it in 2019-20 when it was more relevant (and by the time I thought to draft something in 2022 it felt like the moment had passed)

Not saying it’s your/​LW’s fault to be clear, I agree that it seems like a brutal ask for either Dylan or the LW team to unilaterally fix this

OH:

Yeah, I don’t know how hard it would have been to fix this. My guess is a medium-sized LessWrong post would have produced engagement of that type, so as you say, like 20-30 hours of work. I do think people were primed by Stuart’s framing of the problem, and it would have been work to overcome that.

I also don’t know how much more sense Dylan’s framing of the problem makes. You seem more excited about it; I am curious, largely for postmortem reasons, about your perspective on it. If it is/was a very meaty thing that was under-engaged with, I would really like to know that, even if I have trouble seeing it going any other way, at least with simple changes.

Third Thread (on academic norms about authorship)

OH:

> In academic communities, once the paper is written, it’s fixed. It doesn’t usually matter who wrote it except in some fields like Law.

Man, this is just an obviously made up statement, and I am frustrated by you repeatedly making up random norms about what is true in academia in this conversation.

I am not an idiot, neither are other people in this conversation. Of course in academia it matters hugely what an author of a paper says about the paper in contexts other than the paper. I read critiques of academics who use people’s tweet threads about their paper about potential misinterpretations all the time. I read blog posts by AI labs that use what other labs wrote in their blog posts about what is going in their paper all the time.

You are just totally making up random facts about academic norms here. Academics are just normal people. Of course academics react differently to a paper if its most prominent author repeatedly argues that the paper means one thing, even if the paper itself doesn’t say that. This is not a weird LessWrong thing. It’s an “everywhere” thing.

DHM:

[Asks Grok]

Grok’s answer is clearly true.

I even primed it in your favor.

It is a basic academic norm that you evaluate the work without considering the author. It’s the entire reason behind blind peer review.

Again, basic training in professional research would teach you this.

Damon Sasi:

No, see, notice how Oliver said “what’s said about the paper” and you focused Grok on “who authors the paper?” Because I noticed that.

DHM:

He quoted me and said I made it up. I very much didn’t.

I don’t know how to prove that this is common knowledge and a norm. This was my attempt.

Damon Sasi:

Read his comment. He was clearly responding to “Once the paper’s written, it’s fixed.”

You focused Grok on the notion that the author is irrelevant. But Oliver’s point was about your notion that nothing else written or said about the paper should matter.

DHM:

How did I focus it on that? I took the statement he said was made up and did the following: “I don’t like this idea. I think it is made up. [direct quote]”. That was intended to stack the deck in his favor.

Damon Sasi:

Seems like this is even less likely to get productive from here. I’m going to step away from all this and get a cup of water and think about other things for a bit.

Some friendly advice from a stranger, maybe do the same?

OH (replying to the above DHM message):

“It doesn’t matter who wrote a paper in academia in all fields except law” is obviously a crazily wrong statement about academia.

Let’s throw it into more LLMs if you think that’s compelling:

It’s a thing academia sometimes aspires to, though it’s a kind of controversial issue. Authorship is of course useful, and asserting the whole academic community is author blind seems obviously disconnected from the reality of academic discourse

Fourth Thread (w/​ Tim Hua, on academic norms about authorship)

Tim Hua:

> It doesn’t usually matter who wrote [the paper]

This seems totally false

DHM:

In academic communities it is the goal to consider the idea separately from the author.

Anyways, he’s justifying a bad interpretation of my work because he listened to my co-author’s public communication instead of understanding the paper itself.

I’m not saying that I deserved attention from LW — I never posted on LW.

I’m saying that I don’t deserve aggressive negative attention based on a failure to understand the work.[8]

OH:

Who was commenting on just the paper ever anywhere?

People responded to “CIRL”. I don’t see any discussion where people are like “CIRL, specifically the CIRL as defined in Dylan’s paper, and not the one in all the other communication about it”.

Maybe that happened somewhere. It seems like a very understandable mistake to make, but just because someone critiques “CIRL” doesn’t mean they have to critique the one paper that was written on it that happens to be the one thing that you think is most defensible, if that’s not the thing they actually want to talk about.

DHM:

The arxiv paper you shared, for one.

Also in basically every first conversation I had with any LW researcher for like 5 years.

I’ve interacted with a *lot* of people in your community. Not because I chose to go to the blog. I actually chose not to. Just because they were at all the conferences and workshops I was at.

First Subthread

Tim Hua:

> In academic communities it is the goal to consider the idea separately from the author.

Sure, but that doesn’t mean that this is what actually happens.

[I’m not trying to address the other parts of the argument, just pointing out that this is a locally invalid point]

It’s like saying “in rationalist communities the goal is to solve x-risk” and using that to absolve any criticism.

Anyways, local validity of arguments is really really important. Imo it’s one of the key ways to prevent yourself from going insane.

https://www.lesswrong.com/posts/WQFioaudEH8R7fyhm/local-validity-as-a-key-to-sanity-and-civilization

DHM:

I don’t understand. I participate in academic conversations about research for my job. Are you saying I don’t know how I discuss work?

Like, my point is that my idea should be responded to based on the paper we contributed to the literature.

That’s how academic research works. You’re not supposed to need to know the authors to understand and evaluate the work.

Sure, there’s lots of dynamics in practice, but this ideal is pretty basic and largely adhered to in my experience, except fields like Law, as I said.

Second Subthread

Tim Hua:

>Are you saying I don’t know how I discuss work?

No I’m not trying to say that at all.

I’m saying that the sentence you said, “In academic communities, once the paper is written, it’s fixed. It doesn’t usually matter who wrote it,”

is empirically false.

“my idea should be responded to based on the paper we contributed to the literature.”

I totally agree that this should be the norm.

DHM:

Oh, well that’s very much what I meant. Sorry it wasn’t clear.

FWIW, I think it should have been clear from the context of a conversation where I’ve repeatedly said that the response to the idea shouldn’t have been about Stuart’s public comments but the paper we wrote.

Third Subthread

Tim Hua:

> You’re not supposed to need to know the authors to understand and evaluate the work.

I agree

> this ideal is pretty basic and largely adhered to in my experience

I do not believe this is true broadly. I know that it is definitely not true in economics

DHM:

I can only speak to my experience for something like this. I think it’s roughly true for CS. I think the dynamics of modern attention severely undermine this norm in practice fwiw.

Fourth Subthread

OH:

> That’s how academic research works. You’re not supposed to need to know the authors to understand and evaluate the work.

> Sure, there’s lots of dynamics in practice, but this ideal is pretty basic and largely adhered to in my experience, except fields like Law, as I said.

It is obviously extremely not adhered to in practice. I don’t know what academic spaces you hang out in, but I really don’t think you are representing what is going on in academia accurately at all.

I am happy to ask a bunch of people with lots of academia experience here. I would be very surprised if they agree with the statement “academia largely ignores authorship in evaluating or discussing academic work”.

DHM:

Well, there’s blind peer review as the dominant evaluation mechanism. So, despite all the problems with the process, it’s clear there are costly commitments to this goal.

But I don’t care what you think about whether my experience is generally true.

This has been pretty silly as an argument.

My claim was that I’ve found your community unwelcoming. You are reinforcing my belief in that claim.

OH:

Again, do please take more care with what statements you make about academia. Double-blind peer review is far from the dominant evaluation mechanism, and its prevalence at all is also very recent. It is most common for reviewers in most fields to see the author’s identity.

DHM:

I just ran an AI alignment track with a double blind peer review, so I don’t know what you think you’re accomplishing.

I don’t *care* about what the broader norm is. I wish that *your community* would give me the same respect I give you.

Which, to be clear, is that you judge my ideas based on the papers I write, not the general audience books I didn’t write.

(1)

OH:

> Well, there’s blind peer review as the dominant evaluation mechanism.

Look, I just responded to this local claim above, which is IMO clearly false.

Yes, I understand that you think there is virtue in ignoring authorship. This is not a universally accepted principle in academia, and also not one I believe in.

You’ve made 3-4 false statements about what academic norms are. I don’t really know why you are doing it, but we are in the middle of a long conversation about whether LW and associated communities don’t have appropriate respect for “academic norms” and in order to discuss that properly we clearly need to agree on what academic norms even are.

DHM:

All I can say is that every academic community I’ve engaged with has had double blind review as a process at many journals.

But, fine, you win. Academic norms are worthless. I’m just bitter. You’re actually a super friendly and welcoming community. This has been a joy

And, to be clear, all I said was “in my experience” — so you misinterpreted me again.

(2)

OH:

> I don’t *care* about what the broader norm is. I wish that *your community* would give me the same respect I give you.

> Which, to be clear, is that you judge my ideas based on the papers I write, not the general audience books I didn’t write.

I will judge your ideas based on all the context I have available to judge you! Of course I will not judge your ideas on the basis of books you didn’t write, but I will judge the ideas and arguments of your coauthors.

As does everyone else.

I will judge your ideas on the basis of tweet threads, on the basis of your books, on the basis of your reputation among peers, and on the basis of spoken conversations.

Those are all useful evidence of what your ideas are! Ideas do not live in some magical magisteria of academic papers that no other medium is allowed to touch.

DHM:

Oliver, you’re justifying why it’s ok that your community spent a bunch of time arguing against a version of my idea that I don’t endorse, never endorsed, and, as Lawrence documented, did clarify and explain several times in places that your community should have read.

And that isn’t even relevant to my original claim: that I found the reaction unwelcoming.

This is ridiculous.[9]

OH:

I don’t think I was responding to the paragraph about “your experience”?

At least I don’t see how that would connect. You made a universal statement about peer review and author-blindness in academia.

I am not here to make you feel welcome. You made some arguments in public I thought had holes in them, so I commented on that. I think that’s generally a good thing to do and how discourse proceeds. I think if you don’t want that, that’s fine, I will still comment for the sake of the public record but you don’t have to engage.

> as Lawrence documented, did clarify and explain several times in places that your community should have read.

I don’t think Lawrence commented that at all! There was a single podcast in which apparently you made a statement like that, but it wasn’t even clear that in the other CIRL discussion people were responding to you as opposed to Stuart, who clearly is using these ideas in different ways.

We can ask Lawrence whether he felt like people “should have read” the summary of the one podcast where you clarified this, and whether this even would have made sense to do given that it’s unclear whether you as only one author among many should have authority on what the truth about concepts is and how people use them and how they fit into arguments that other coauthors make.

DHM:

Oliver, the tweet you responded to was that I think your community regularly makes people like me feel unwelcome.

I take “I am not here to make you feel welcome” as admission that you agree with my initial critique.

I’m disengaging now.

OH:

Farewell. Thanks for the detail you did provide. I do think it’s useful context to help people judge things for themselves, and I might compile it into a more readable format.

(And of course, to be clear, I do not agree with your initial critique, though I do think there are things in the space that are true. You framing this as “admission” feels like a weird rhetorical move. Of course neither of us is here primarily to make the other feel welcome. Other things are much more important to us in this conversation.)

Fifth Subthread (w/​ tutor vals, on governance and rudeness)

This thread doesn’t get very far. It’s plausibly bad to include because it mostly shows rudeness, but something about that seemed informative.

DHM:

A few comments.

  1. I’m not Stuart

  2. Your comment indicates that you don’t understand the value of the idea, I’m sorry;

  3. I don’t think you understood Stuart’s communicative goals with the public.

OH:

Well, then it’s definitely not for lack of engagement or trying!

At this point my guess is if I still don’t get “the value of the idea” after spending 50+ hours on it, my guess is the problem is not with me.

I do agree I don’t understand Stuart’s engagement with the public. It’s been very confusing for me for a long time.

tutor vals:

Oh, knowing that a bunch of this conversation is predicated on examples like the usefulness of Stuart’s work does update my position to be less pro academia here, as it is an example of academic work that doesn’t seem to contribute to reducing AI x-risk much.

I think it’s fine for LW to have an AI safety community that cares about reducing AI x-risk and not about solving technical alignment problems, and thus to not engage on particular techniques if overall they don’t fit the picture of reducing risks (for example because inapplicable).

So I can appreciate that academia might bring a particular rigor to the topic while still having to politely reply “thanks for your work but this doesn’t address the thing I care about”

DHM:

We agree. We got here because I provided some advice about being more welcoming to people with different intellectual backgrounds (e.g., academics working on similar problems).

I provided an example that wasn’t me, but where I didn’t feel comfortable sharing details. Then Oliver jumped in and said he needed details. I couldn’t provide those, so I used myself and my work as an example. A few other people with similar experiences chimed in.

tv:

Ty for clarification, I do believe people get bad reception and that the example you can’t share more on would clarify that.

DHM:

You’re welcome to your opinion about the work, but I’m curious what you don’t like about it.

Can I answer any of your questions or critiques?

tv:

I am not aware of your personal work much so I looked through a few titles and abstracts of your papers recent and generally feel like they contribute.

Re specifically the 2016 CIRL paper, it’s probably around the average of contributions for its time (much other work from then would end up not generalizing to be useful in the DL prosaic AGI world). I don’t have a critique of doing it at that time, but I would not be enthusiastic, were I a grant maker, about funding more work in this area right now, because right now my model of AI x-risk mostly predicts we reduce it with better governance than with technical alignment solutions. More particularly though, I think CIRL-type stuff is of limited use because that’s not how current frontier AI models are trained and thus in practice it won’t be used to align AGI.

DHM:

Let me just say that it’s quite clear you don’t understand the idea or the contributions of the work.

I offered to engage, but you decided not to. I don’t think you’re a serious person.

Have a nice day! Good luck with your work

tv:

And a good day to you too.

Fwiw, I’m not a grant maker, and would indeed do the work of evaluating more seriously if I were.

DHM:

That’s fine, I guess. In my community, if you’re gonna denigrate someone’s work directly to their face, you’re expected to be able to provide at least one or two interesting questions/​comments that could serve as a touch point for future discussion.

This is one of those pesky professional/​academic research norms that you don’t value.[10]

OH:

(Seems like a bad norm and like you were kind of being a dick here. You asked them for their critiques!

I don’t buy it’s an academic norm to call people unserious in the way you are doing it, I think it’s just a kind of dick thing to do everywhere, and I would be happy to take bets on this with people who have experience in academia)

DHM:

That’s not genuine engagement with the work!

Obviously.

I will do tit-for-tat.

I’ll add that, outside of this context I would normally use this as an opportunity to try to explain why their response doesn’t make sense and engage productively. In fact, if it had just been respectful I would have.

OH:

It’s a totally respectful response! What is the issue with it? It just calmly explains their perspective, and you asked them for follow-up, and then suddenly you call them unserious and end the conversation.

Like, of course feel free to disengage, but I feel like you just randomly insulted a random person who offered their thoughts in a reasonable and thoughtful manner (of course expressing disagreement, but not in any particularly harsh tone).

My current best guess is you are frustrated by some of the rest of the thread, and tutor vals is kind of caught in the crossfire.

I think it’s reasonable for you to be annoyed at me, as I’ve made various complaints in more aggressive ways, but I feel like something weird is going on here in your response to tutor vals, and I wanted to point it out, but I’ll leave it be for now.

DHM:

I think the rudeness was more their initial post when they joined the thread.

I think the most recent one was an evaluation, not a question or a critique. It wasn’t what I offered to respond to.

Sixth Thread (w/​ Jan Kulveit, critiquing LessWrong)

OH:

Look, there are many critiques to make of the LessWrong community, but it’s obviously one of the communities best in the world at distinguishing “if X then Y” and “X and Y”.

The rest of the world really sucks at this! Most academic fields really suck at this. The media really sucks at this.

But on LessWrong, the decontextualizing culture in which “If X then Y” gets treated as a very different question from “X and Y” is one of the most common ways in which LessWrong gets criticized by parts of society that do not think this way.

I find it unlikely that LessWrong failed in this specific way. Maybe it was a low-attention post and it only attracted the attention of some small, less-selected set of people, but insofar as it had participation from the people who tend to most reflect LessWrong culture (Gwern, Eliezer, members of the Lightcone team, MIRI staff, etc.), I would be quite surprised if the conversation failed in the way you said.

Jan Kulveit:

I think “the people who tend to most reflect LessWrong culture, Gwern, Eliezer, members of the Lightcone team, MIRI staff, etc” often suck at this. To give you a specific example without the need to engage in “how to interpret evidence we do not see”:

https://lesswrong.com/posts/H5iGhDhQBtoDpCBZ2/announcing-the-alignment-of-complex-systems-research-group?commentId=frEufx3c6cRmhDjbh

Overall I’m happy to defend in this case:

  1. Eliezer fails to engage in “If X then Y” style of response

  2. The claims he made aged poorly

  3. The style is arrogant-dismissive (“Just to state the reigning orthodoxy among the Wise”)

It seems easy to imagine someone just disengaging if they got this style of response on a first post, so I find the original story plausible.

OH:

I think Eliezer is being somewhat of a dick here, but also isn’t making the error in question. I do also find a bunch of the framing of it annoying.

JK:

I think there are multiple problems, but also the one discussed (?).

The premise of the post in question is

“The standard single-interface approach assumes that the problems at each alignment interface are uncoupled (or at most weakly coupled). All other interfaces are bracketed out. …

Against this, we put significant probability mass on the alignment problems interacting strongly.” (this is X) Other parts are “Y”.

Conditional on X being true (the problems at different interfaces do in fact interact), you may argue the conclusions are wrong, and for example, the best thing to do in the space is what is now known as “model organisms”. Or, some technical demo which would convince world governments; or hardening labs against internal takeover, etc etc. I see some “not X” or “bad frame” but what I do not see in that response is serious engagement with “what if X”.

OH:

I find it a bit hard to tell because it was just a single comment. My current sense is Eliezer is making a claim here that makes sense within the framework of the post, and is not just denying an assumption of the model in an unclear way (though again, it’s reasonable to critique assumptions of models, as long as you are being clear that that is what you are doing).

Like, within the framework of the post it makes sense to try to distinguish what makes different boundaries between different systems different. Eliezer is saying “well, here is this one way this one boundary is very different from all the other ones”.

To be clear, he is saying it in a very dickish way, and I am not a fan of the framing he used, but the issue here feels different from the framing in this thread (if the critique had been “various people on LessWrong, including core contributors, not too infrequently overclaim that things are ‘obvious’ or ‘widely known to be false’ in a kind of social-slapdowny way” then I think that’s pointing towards a much more real thing (though it’s unclear how much LessWrong is worse than other places, though it’s a huge issue on LessWrong, even if it’s also a big issue in other places)).

Seventh Thread (w/​ Michael Cohen, on his negative experiences posting his research on LessWrong)

DHM:

It is hard to describe how frustrating it was to 1) get interested in alignment, 2) spend time digging through the weird jargon-y literature, 3) spend lots of time doing careful work, and 4) see it rejected basically thoughtlessly.

Michael Cohen:

My version of this was (and is) a complete lack of curiosity from the LW community about an agent I designed for which there is literally a proof that it would consider human extinction maximally bad. People seem desperate for reasons to consider this an unpromising direction.

The agent is intractable! There is an unrealistic realizability assumption! Poor data scaling laws!

And there is just a total lack of interest in whether those might be surmountable. And a failure to “update” after observing a result they would not have predicted possible.

(And I know they would not have predicted this kind of result to be possible, because it is *still* conventional LW wisdom that results of this kind are impossible).

OH:

Huh, do you have a link?

I feel like I’ve quite liked your contributions to LW over the years, and I feel like most others have too, and remember many good conversations (like this one: https://lesswrong.com/posts/CnruhwFGQBThvgJiX/formal-solution-to-the-inner-alignment-problem#xBaKLMe9CxmHHdYS6)

Would be curious in learning more if you have a link to when things went wrong.

Michael Cohen:

I appreciate your saying so. To be clear, I don’t feel I was personally treated rudely. I feel that an idea I developed didn’t garner much interest. (And like Dylan’s story, I suspect that was in part because accepting it would entail a lowering of p(doom).)

It feels a bit awkward complaining about people not getting interested in my work, which is why I haven’t done so before, but when there is literally a theorem about not causing human extinction, it made me think there were pretty big flaws in epistemic practice.

I guess I should link: https://michael-k-cohen.com/post/pessimism

Paper is linked in the post

OH:

Makes sense! My guess is in this case it was more related to the paper being very dense (as you warn yourself in your LW crosspost), and less related to lowering of p(doom). Almost all work that increased p(doom) at this level of density also doesn’t get much engagement. I might try digging into it and see whether I can give a presentation of it that is more approachable, and maybe then we will see more engagement with it! I feel like historically there has been good engagement with both yours and Hutter’s work on the site, and I’ve been a big fan of both of your work, and so feel optimistic that there is a way for LW to provide more value in that. (For reference, the LW crosspost: https://alignmentforum.org/posts/RzAmPDNciirWKdtc7/pessimism-about-unknown-unknowns-inspires-conservatism…)

Eighth Thread (w/​ Damon Sasi, on LW engaging with Stuart Russell’s writings and whether that negates DHM’s criticisms)

Damon Sasi:

*shrugs* I dunno man, I guess we just parse postmortem mentality differently or something?

If your primary complaint is that people *on LW* did not understand it properly, it does not make much sense to me to say that there was no point in *posting on LW* yourself about it.

Like *my* takeaway is that if I want a group of people on a forum to talk about my work accurately, I should:

  1. Make sure to put my version on their forum in a legible way that’s easy to reference.

  2. Make doubly sure to do that if my coauthor is saying something different.

Otherwise your complaint seems to repeatedly be “they were just too stupid/​lazy/​jealous/​etc” and this is a really bad explanation.

I get why it makes you feel better, but it doesn’t make you or your position more sympathetic, imo. It looks like you have an axe to grind.

DHM:

I didn’t say any of those things. In fact I’ve been very careful to not say those things.

I’m describing a decade of discussions where my primary takeaway was “this person doesn’t understand my idea.”

My understanding is the community collectively misunderstood the idea in basic ways.

Let me ask you this: do *you* understand the idea?

And, to be clear, I didn’t really want them to engage.

I was mostly indifferent because they weren’t my target audience. I would have been happy to discuss with them, but if I wanted to post on LW I would have.

I just wanted them to stop hunting me down to have long exhausting arguments where they tried to convince me of basic things I already believed like “CIRL is not a solution to alignment.”

It was a collective failure to understand basic aspects of the idea.

To be clear, I might have been interested in posting there. But once I understood the community norms it was clear that if I posted there I would get a response analogous to this:

Image

So I didn’t, and focused on my work. And practiced teaching LW people about the idea. I was pretty patient, IMO.

In person people often eventually came around, but it was very hard getting them out of an adversarial frame.

In comparison, I would have no trouble explaining the idea to other academics or other researchers with non-trivial training in AI. I could also explain the idea to non-technical people well.

LW folks had a noticeably harder time understanding the idea.

I think it’s because the community’s collective understanding of the idea was wrong and the community members practice a norm of deference to “the Wise.”

In summary, your response assumes I’m mad my work didn’t get engaged with.

In practice, I’m disappointed the quality of the engagement was so far below what I experience in my research community.

At the same time, the broader ML/​AI community was also engaging in a shallow way because my paper dared to cite Bostrom and take his, Eliezer’s, and Omohundro’s ideas seriously.

But they could, at least, understand the work from reading the paper.

Damon Sasi:

I think it’s fair enough to be disappointed and frustrated if people don’t engage at a quality level you’d prefer.

I think my read of the situation, just from the above discussions, is that it was engaged with pretty thoroughly, just not in the way you’d have preferred.

Specifically it seems like people who engaged with it engaged with it mostly through the lens of your co-author, who apparently did not present it in the same ways you did.

This seems like it caused some problems in how people who encountered your work perceived it.

You can frame that as a quality issue, and maybe you’re right to. I don’t know; I haven’t read the object level content/​interactions.

But various points in the above discussion point to frustration at people for not engaging with *your* take, or misunderstanding *your* ideas.

If I’m misreading you in implying stupid/​lazy/​jealous/​etc, I apologize. I appreciate you not using those words directly if not intended, in any case.

But your explanations continue to seem derogatory, rather than accepting that maybe there’s a perfectly reasonable explanation.

Namely, that your co-author’s frame of the ideas was the one that most people engaged with, and that your version of the ideas was misunderstood because, unless they spoke with you directly, they had less opportunity to even learn what it was.

If you disagree with that explanation, and think “deference” is a better one, all I can say is it seems notable to me that you say people who talk to you directly come around, and you never posted on Lesswrong to get the experimental data of what would happen if you tried.

To be clear, you don’t have any sort of “obligation” or anything to post on LW. I’m just pointing out that, from my perspective, your points about LW being “unwelcoming” or “adversarial” seem to have better explanations than the ones you gave.

DHM:

Well, I guess there is an implied point that I think there’s a better way to engage with work. I’ve made separate comments about the overall quality of some of the work on LW, which is very hit or miss in ways that are hard to tell without reading super carefully.

My underlying claim is that the intellectual environment and overall norms are a huge barrier to people who have had academic training actually participating. I admit that the argument with Oliver ended up getting pretty confusing and disconnected from that point.

And one of the barriers is things like, e.g., evaluating a paper about an idea based on the claims made in the paper. Two things are true:

1) I think this is a good norm and LW would be better if it was adopted more broadly;

2) I think not adhering to that norm keeps people with professional research training out of the community. If nothing else, it’s a big barrier.

I can also be clear that I wasn’t seeking out engagement with LW folks as much as they were coming to find me and talk to me. They would, e.g., come to my lab at CHAI and engage me in debates to convince me that CIRL wasn’t a solution.

This is a thing that I never believed and have never written anywhere. But somehow the community decided that’s what the work meant and it was important that I learn to truly understand the “Hard Problem of Fully Updated Deference” or something similar.

Which, to be clear, I do understand.

It’s the problem of getting a rational agent with essentially no predictive uncertainty of the world to defer to (human) oversight.

I extended a toy model they introduced to study the problem. Here’s the MIRI paper. https://​​intelligence.org/​​files/​​Corrigibility.pdf

My paper identified conditions under which goal uncertainty would cause the agent to prefer oversight. It was interpreted to mean that I am saying preference uncertainty solves alignment. https://​​arxiv.org/​​abs/​​1611.08219

Here’s the strongest claim we make: “We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.”

That simply does not state that this method solves alignment. And yet I had person after person come try to argue with me about that.

When I tried to explain my work, they often didn’t or couldn’t understand.

FWIW, I have no trouble explaining this to colleagues or students.

Was this the biggest deal in the world? No. I dealt with it just fine and the people who did engage were typically wonderful.

Is it an example of a broader pattern where people like me don’t feel welcome? Yes, absolutely.

If people had spent their time arguing with Stuart, that would be fine. Stuart is a busy person and I was a lowly graduate student. So they usually directed their frustrations at me.

This was made worse by basic issues with research literacy across the board.

There’s a skill to reading papers and understanding what they mean. It’s not too hard, but it’s not the same as reading a popular book.

Which, if you only read that, that’s fine.

But if you want to go argue with the paper’s author you should base your arguments on what the paper says. And when the author says something like, “the paper doesn’t say that, you haven’t understood it,” you roughly believe them.

These are basic skills. They are easy to learn. I’m sure LW could benefit from trying to find value in academic norms rather than rejecting them wholesale.

But, putting that aside, it is pretty clear that this type of behavior drives people like me away.

PS I also think the premise of your comments is a little frustrating, because you assume I am unfamiliar with the topic and posted this lightly.

You’re welcome to think I’m mistaken. But you should expect that the barrier to convincing me to drop a long-held belief is high.

Damon Sasi:

My point is that the “somehow” in “somehow the community decided” is, again, pretty straightforwardly that they read Stuart to be saying that’s what it meant.

Do I know that to be the case with everyone who you talked to? No. But it seems the parsimonious explanation.

I’m not assuming you’re unfamiliar with the topic, I’m just pointing out something that seems to be a consistently missing element in your explanations of what happened. Namely, the effect of Stuart’s communication on what people believed about the paper.

But either way, I really haven’t been trying to convince you to drop your belief. I was just pointing out why *I* haven’t found the explanations you provided convincing. You don’t have reason to care about that, but in case it’s helpful to know why, I hope I made it clear.

DHM:

I’m well aware of the dynamic that was at play. E.g., I wrote this yesterday.

I do understand why they interpreted it this way. Does that change your perspective? It should if you’re correct about your reasons imo.

> To be clear: the community failed to understand basic properties of how a mathematical framework is supposed to be used.
> I’d be disappointed if an advanced undergraduate made this mistake.
> If a PhD student did, I’d be concerned they don’t understand important fundamentals.
> But, because my co-author was saying stuff his tribe didn’t like, Oliver argued that it’s good that his community never understood the idea and never genuinely engage with it. Apparently, this is just normal and comparable to academic fields.
> (To be clear, it’s not.)

But it’s also not relevant to my argument.

Arguing obstinately with someone about their work based on something other than their work is really frustrating and is a good way to keep professionally trained researchers out of your community.

As, I should add, is this behavior. Where a mob of people demand unreasonable evidence any time they feel like the LW community is remotely threatened. It’s like an immune response.

Like, if this was just me, you might have a point.

But I started with a story of someone else who had a similar experience. At least two other people have chimed in on this thread with similar experiences.

This is clearly not just me. I’m just the person who felt like it was worth discussing.

Also, it’s your community. If you don’t care whether people like me want to participate, then you can feel free to disregard my critique — you shouldn’t care.

I’ll probably stick around in one way or another either way. But it affects, e.g., how much I cite LW or how I recommend that my students engage with it, if I recommend they engage with it at all.

I have no desire to subject students to something like this.

Damon Sasi:

I’ll be honest man, I see the way you continue to mischaracterize Oliver and I don’t really feel like there’s a huge loss if you choose not to participate.

Like, even secondary to “the community just didn’t like what was said,” which really is not the dynamic I’m pointing at.

Believe it or not, I care about good criticisms of the community. I get your skepticism, but if you insist that all you’re doing here is pointing to a valid criticism, while continually failing to understand my response… well, it’s at least a little ironic, you know?

DHM:

Then I’m confused. What are you trying to say? Or maybe you could explain Oliver’s point.

I said the community has barriers to people with different intellectual commitments participating and described what I perceived as a barrier with examples.

You seem to be saying they either aren’t barriers, or that it’s good those barriers are in place. Did I miss something?

Like, I’m aware of the “somehow” you reference here. Why does it change my point?

> Damon Sasi: My point is that the “somehow” in “somehow the community decided” is, again, pretty straightforwardly that they read Stuart to be saying that’s what it meant.
> Damon Sasi: Do I know that to be the case with everyone who you talked to? No. But it seems the parsimonious explanation.

I continue to be confused why this is a mischaracterization. What did Oliver mean? How does it differ from what I said?

> DHM: But, because my co-author was saying stuff his tribe didn’t like, Oliver argued that it’s good that his community never understood the idea and never genuinely engage with it. Apparently, this is just normal and comparable to academic fields.

Damon Sasi:

Ok, let me try again.

In marriage therapy, something that happens *pretty often* is people miscommunicate. It’s actually quite hard (see this conversation!) for two people, even without conflict, to be able to paraphrase each other’s points in a way each would agree with.

So when I see two people with a different experience of an event, or a different account of it entirely, my starting position is not “who is right and who is wrong,” it’s “how did this divergence occur?” Sometimes that can’t be known for sure, but there are some common trends.

Your starting position seemed, to me, to be “LW didn’t engage genuinely with our ideas, they consistently failed to take the paper on its own merits, and I didn’t feel welcome there.”

These are pretty unfortunate things, if true! How can I tell if they’re true or not, as someone who was not there and didn’t experience what you or the others did?

Well, I can check how much people engaged? “Genuinely” is a tough thing to judge, and I get why you might discount 20+ hours of work if you think it totally misses the point, but I think it clears the good-faith-attempt for engagement, at least, even if the person is wrong.

I can then check how well they understood the paper, and as you say, it seems pretty clear that many people misinterpreted it. But it also seems like many were not, in fact, addressing the paper directly; they were addressing Stuart’s explanation of the ideas in it.

Again, you can blame them for this if you want to. I get why it would be frustrating, if it caused you to have so many people come argue with you about it instead of him. But as a simple question of “why did this happen,” your position has not addressed what Stuart said.

You’ve admitted that yes, they were probably going off of what Stuart said, but you haven’t demonstrated why they were *wrong* if they did that.

Would sticking to just what the paper said have helped? Sure, probably. It certainly sounds like it would have made your life easier.

But that does not absolve Stuart from saying a different set of things, if he did, and it does not put 100% of the blame on them for responding to what Stuart said and taking it as an accurate summary of what was in the paper. Why assume Stuart would misrepresent his own work?

Now, maybe he didn’t! Maybe all the people who engaged based on what Stuart said were still just massively wrong in the way they interpreted it all.

But that hasn’t been demonstrated. All that you’ve said is that the original paper itself didn’t say the things they reacted to.

It seems materially important, in any case, that the takeaway for a lot of people was “Stuart said X and I think X is wrong and here’s a bunch of writing about why,” as opposed to your position, which is that they just… didn’t like that he said things they found heretical.

You see why that’s derogatory, yes? You may not have said “stupid/​lazy/​jealous” (though I could swear at some point you implied at least some of it), but you are in fact making a pretty derogatory judgement if you assert that they blindly knee-jerked rather than engaged honestly.

Like, why can’t it be the case that they both engaged honestly *and* they ended up wrong about what the paper claimed because they were taking Stuart’s words as representative? Communication is hard! Is your claim in fact that Stuart would never, and did never, misrepresent it?

If your answer is “in any case, they should have just gone to the source and read the paper directly, as all TRUE ACADEMICS would”...

Sure, maybe! That seems like a decent standard to hold forth.

But then your criticisms of them just knee-jerk dismissing things seems false. Your criticism of them not engaging genuinely also seems false. What’s left is that they failed to adhere to a rigorous academic standard by instead arguing with what a person said about their work...

...and I hope it’s clear why not everyone would find it the heavy-hitting, deep cut criticism you’ve been presenting it as? It doesn’t make them unwilling to engage with stuff. It doesn’t make them close minded. It arguably doesn’t even make them lazy! LW is not in fact academia.

Maybe it WOULD be great to have more academic norms there! Seems like constructive criticism to provide.

But meanwhile, there was another point you made about it being unwelcoming. And the fact is you just… never posted there?

So it *felt* like it would be unwelcoming to you, but they were not *actually* unwelcoming to you. You see why this criticism also doesn’t land?

Yes, yes, I know you have that Eliezer quote ready to whip out. Here’s why it’s not as effective as you seem to think it should be:[11]

*People on LW do not defer to Eliezer.*

They often *respect* him. They often *listen* to him.

They do not blindly accept what he says.

You can be skeptical if you want. It’s just how it is there. You can see plenty of evidence of him being disagreed with if you look around.

So yes, you MIGHT have gotten a post like that from Eliezer if you’d posted there. And if so, the expectation from many people would be that you respond to it if you want to, or ignore it if you don’t. That’s… how forums work? By and large?

It’s how Jan treats them (the guy who originally shared that comment). He’s still posting on LW, you’ll notice, despite having had that comment in response to his post.

Because again, Eliezer doesn’t run LW. He’s not blindly deferred to. He’s just a respected contributor.

Maybe that’s enough for you to find it unwelcoming. That’s your right. You don’t have to engage there if you don’t want to.

But “this is what makes it unwelcoming for people like me” just doesn’t feel like the deep cut if all of the above is the sum of your reasoning why.

Finally, to address your point about Oliver’s perspective...

I mean, I think I’d rather let him speak for himself. But the basic gist I got was “Yeah, lots of people did engage. It was pretty exciting for a while but didn’t pan out as worth more time investment.”

This is pretty damn different from “It’s good we never understood it and never deeply engaged with it.” I mean come on man, that’s the strawiest strawman you could possibly construct of what he said. Cartoon villains don’t even talk like that.

You’re totally allowed to think he’s wrong to think the idea wasn’t worth more engagement!

But to insist that it *wasn’t* engaged or that he thinks it’s better off misunderstood is a pretty big failure to take him or anything he says with even a smidgen of good-faith.

You talk a lot about good academic standards, and welcoming discourse norms, and so on. I have to say, I don’t see it on display in this thread.

Communication is hard. Even smart people miss each other sometimes. The blame game is always tempting, but rarely useful.

You could lob critiques at LW and point out its failures in constructive ways. People are generally open to this.

But it’s not an “immune response” when they ask you for high standards of evidence for those failures. That’s just… well, good academic rigor, in my view.

And in my experience, even when people have a genuine grievance, their stories for why it happened and how it happened are OFTEN wrong. Ask any mediator or group therapist for stories, and you’ll get a million of them.

*shrugs* We all do the best we can, in any case.

Anyway, that’s my take on it. It’s nearly 1:30AM here so I’m going to bed now. Thanks for asking for clarification again, and I hope this was a helpful elucidation of where I’m coming from and why, regardless of whether it shifts your views on anything.

DHM:

I appreciate your effort.

I do, in fact, understand your point. Communication *is* hard. Miscommunication is common. But, e.g., your thread has some misconceptions.

Let me give you an example from Ryan Carey’s workshop paper in 2018.

https://​​aies-conference.com/​​2018/​​contents/​​papers/​​main/​​AIES_2018_paper_84.pdf

Image

It references a 2017 paper, “should robots be disobedient?”, that I wrote with Smitha Milli. Here’s what they say in the introduction.

Image
Image

They say “we relax the assumption that the AI systems knows the reward up until parameter theta. Instead, in order to allow for model-mis-specification, ….”

Based on this, I would assume that the 2017 paper does not study or consider model misspecification.

However, we very obviously *do* consider this.

“We then analyze how performance degrades when the robot has a misspecified model of the features that the human cares about or the level of rationality of the human.”

Image

This is one of the examples that Oliver used above.

I understand Oliver to have said that it’s ok that work like this makes such basic mistakes because you can interpret the paper as responding to Stuart’s public comments.

And that’s not a good norm.

I’m not sure how to describe such a basic error. I’m not sure how else to say that it clearly is responding to the paper I wrote and not Stuart’s public comments.

Calling this a simple misunderstanding is quite charitable.

Ninth Thread (on DHM’s frustrating experiences with LWers at conferences)

DHM:

A few comments. 1) I’m not Stuart; 2) your comment indicates that you don’t understand the value of the idea, I’m sorry; 3) I don’t think you understood Stuart’s communicative goals with the public.

I’d be more sympathetic if the response was something like “well, yes, goal uncertainty is important but clearly not enough” and then LW did lots of work on preference uncertainty and how to ensure it is well accounted for in AI system behavior.

Leon Lang:

My attitude is that if you propose a theory that claims to be progress to a fundamental problem, then it is *on you* to demonstrate that progress, as the person most convinced in the world that it is promising.

DHM:

This is uncontroversial. Why do you think I disagree?

As far as demonstrating the value of the idea, I look forward to your in-depth comments on my thesis. I believe it provides good evidence that preference uncertainty is important for alignment.

https://​​www2.eecs.berkeley.edu/​​Pubs/​​TechRpts/​​2021/​​EECS-2021-207.html

Image

My current understanding:

Eliezer was very worried our work would be interpreted as saying preference uncertainty solves alignment (which we never claimed). As a result, the community engagement with the idea was to try to convince me that it wasn’t a solution.

In practice, I never said, nor did I believe, that it was a solution.

But, for 10 years, people from LW would seek me out to try to convince me of this. This happened at my lab, at workshops, at conferences — it was everywhere.

Looking at it now, I recognize that they completely failed to recognize that Stuart and I are different people and I was basically an avatar for Stuart in their minds.

I was more accessible, so I dealt with a *lot* of this.

Imagine this: you work on an idea: “X”.

And for ten years, a bunch of people come up to you and say “you know, X is wrong” based on the same misinterpretation “Y”.

And they seemed to put a lot of effort into saying “X is wrong because Y.” People came to my office to give talks about their misunderstandings, and I participated in these conversations in a friendly and kind way.

No one would believe me that Y wasn’t a part of the idea.

Or, at least, very few people would.

I now understand that they didn’t actually believe “Y”, instead they believed Eliezer believed it and I never had a chance to convince them.

Thread from the day before on Academia vs LessWrong, and Academia ≠ Science

This thread from the day before is inserted here for context on the next thread.

DHM:

I really can’t let this comparison between Lesswrong and academic science go.

One is an insular online community with a few scientific contributions.

The other is the engine of progress that produced the modern world.

These are not the same, obviously.

OH:

No, science produced the modern world. Academia has always been a small fraction of science, and LessWrong is pretty continuous with humanity’s broader scientific efforts.

I think modern academia is just not a great engine of progress.

Industrial research labs, researchers both at universities and outside of it doing their own thing, companies solving problems and then sharing their knowledge, are much more responsible for intellectual progress.

Post 1970s academia is not humanity’s engine of progress and IMO largely has stolen the valor of the institutions that are actually responsible.

You can see this yourself in AI! Academia has played little role in the last few years of progress. This turns out to not be very rare in other fields.

This doesn’t make LessWrong better, I think it has many flaws and is a small fish all things considered, but the correct norms and institutions for intellectual progress are much less overdetermined than you seem to be arguing for.

DHM:

This is a very fair response. I admit that I overstated initially.

What I will say is that those labs and much of modern science outside academia are still connected to academia and downstream of it in a very real way.

For example, those labs often recruit by intentionally styling themselves after academic labs. When I was in grad school, the common refrain was: “it’s just like academia, except you don’t have to write grants!”

Similarly, academic norms and institutions are heavily involved in, e.g., AI. Sure, the work is done at companies, but it often originates in academic settings and is published in academic forums.

The claim I will stand behind more strongly is this:

Professional research training and academic norms around research processes are very valuable overall and quite undervalued by the LW community.

kzSlider:

What are the norms you have in mind? If they’re conducive to progress I’m sure people would be pretty interested in adopting them :)

DHM:

Peer-reviewed conferences and an explicitly maintained canon of work that the community endorses as being representative are important.

Another one is using related work and having standards about related work to connect the literature and build a more solid foundation.

Another one is valuing incremental work. This is often looked down upon as being a waste of time.

In practice, careful, incremental work is a critical part of the process. If it’s the only thing you do, that’s a problem. But it’s also a problem if not enough of it happens.

E.g., LW would produce much higher quality work if there were more incentives for somewhat boring followups.

You can fault academia for a lot about citation and publication counts, but it does create incremental work that meets a minimum bar of quality.

I often feel like Lesswrong has far too many grandiose theories and not enough boring work to explore those theories to the point they can be investigated.

In general, I’m just entirely unsure what the standard is for an LW publication to be good. I’m not saying peer review is transparent to outsiders, but everyone knows it’s a thing and it means something. I think it’s a big problem in academia that the reviewing process has gotten so bad that we’re losing that as a check in the system.

What do you think?

kzSlider:

Thanks for this comment, this was really helpful! Agree with most of this

Peer-reviewed conferences (given that much of the AI work goes to regular ML conferences/​workshops) feel pretty expensive to put on and fly people in for, though

How do you feel about the Alignment Forum (an invite-only community of researchers) and the Best of LessWrong lists, which compile ~50 of the most important posts of the year?

These seem to fit the “maintained canon of representative and important work”?[12]

As I’m sure you know, many fields look down on CS for having a conference rather than journal culture, so it seems possibly okay to have something similar which solves the problems you identify without looking exactly like the typical conferences (and possibly online)

Having said that I do think conferences are generally a good idea and some of the safety fields have them now (MechInterp at ICML, SLT has regular workshops, ControlConf coming soon etc)

“Related work”—I absolutely agree with and I think the community should be better about that

I’m a little surprised to see your comment about incremental work as I feel like I see a lot of very incremental work on LW (lots of posts doing very minor changes to SAEs etc).

Perhaps you feel like there are some particular posts which should have had follow-ups but never did (e.g. Waluigi)?

Do you have other posts in mind that deserve followups that you could list here? Knowing that someone as respected as you would be interested in a follow-up might encourage people to take up that project :)

I agree that LW has many grandiose theories, I think this is a result of the community forming when models weren’t really good enough for most empirical work and attracting more conceptually minded people.

I’m not sure how much of a problem this is (philosophy and literature also have lots of theories which don’t seem well grounded, debatable how well these fields are doing though)

I think people would probably be interested in a post which lays out the argument for more empirical and less conceptual work (though I feel like much of the community has recently moved in this direction anyways)

I also feel your pain about the decline of peer review in academia. What do you think are the key parts of it worth saving?

E.g. LW has the ability for comments, which can make work better, and above I mentioned the “Best of year” lists, which function somewhat like a journal of the top material.

Is it the explicit scoring of peer review that you find helpful? Or the fact that it encourages more structure in papers? Or something else?

DHM:

There’s a lot here and I mostly agree with it.

I think the most valuable part of peer review is it forces the community to intentionally manage a collective body of work that is moderately coherent, standardized, and meets minimum evidentiary standards for ideas.

The process of reviewing also creates common knowledge about what makes work good or not and what the bar for publication at a recognized venue is.

E.g., I have no idea what the associated bar for LW is or how it is managed. I worry it’s “will Eliezer like it.”

Second Major Thread (on “the institutions of knowledge creation”)

In which LessWrong is criticized further, and then there’s a lot of spawning threads on the phrase “the institutions of knowledge creation”, whether that’s appropriate to describe academia, and how LessWrong and academia compare as such institutions.

This starts in response to this part of DHM’s opening comment:

Since I’m a glutton for punishment, here’s another long-held critique I have for Lesswronger alignment folks:

Your community seems intentionally designed to prevent people with different intellectual commitments and approaches from participating.

Esben Kran:

Unfortunately agree. I also heard the stories of mathematicians trying to engage with infrabayesianism but being completely dumbfounded by the lack of rigor. And the people engaging with LW framings of control theory and not receiving good-faith responses.

Caleb Parikh:

What are the less wrong framings of control theory?

OH:

(My guess is he just means Buck’s and Ryan’s framings of control theory)[13]

EK:

Nope, they’re great (though with the same problems as ML academia). I’m talking about actual control theory, adjusting dynamical engineered systems to reach set points. Alignment is the lw framing (should’ve just said that tbf).

DHM:

Can you say more about this? I could interpret what you’re saying a couple ways.

EK:

Basically, that serious engagement with the core arguments has previously been /​ is met with a “you clearly don’t get it” despite obvious flaws in the reasoning chain. Afair, the original story I heard was a discussion on lw about properties of the evolutionary algorithm.

DHM:

LW discussions of evolution are so bad. It’s misleading intuition that was never followed up on but steered a lot of people’s beliefs and research directions.

As it turns out, building alignment research on top of a high-school-level understanding of evolution is terrible.

EK:

Agreed. I’m thankful I entered rationality from Bayesian statistics and computational modelling separately from lw, and AI risk from Hawking.

With that said, lw is pretty damn cool.

DHM:

I’m also quite grateful that Stuart Russell was my pathway to the ideas and that I was already quite mature as a researcher when I started interacting with LW.

In another universe, I’m probably one of the most active LW people. I see the value, hence the critiques.

EK:

Word. When I decided to properly enter AI risk, it was definitely interesting to just read everything on LW and adjacent compared to entering an academic field. The signal to noise ratio was quite off but it was more personal and aware, for sure. Though I was always confused why no one wanted to publish in academia as well as LW.

DHM:

Well, that would have required admitting that there’s value in professional, institutionalized science. Which LW folks, Yudkowsky in particular, are constitutionally incapable of.

EK:

From my perspective, if you go outside the institutions of knowledge creation, the burden of proof is on your end. And I think most had the opposite perspective just a year or two ago.

DHM:

We agree there on both counts.

OH:

What kind of Orwellian language is calling academia “The institutions of knowledge production”.

Academia does not have a monopoly on knowledge production! It is not the case that academic norms should be assumed the default for knowledge work, and luckily they are not.

We just had a thread about this where you admitted you overstated your claim, and now you are here making an even stronger claim: https://​​x.com/​​ohabryka/​​status/​​1895531462642057371

Academia faces the same burden of proof about it being a good place for knowledge production as anywhere else, and it obviously frequently fails horrendously.

There are parts of scientific thought and perspective that do truly deserve to be taken as the default, but not most academic norms, which aren’t even consistent between fields or countries.

First Subthread (on how welcoming academia is vs LessWrong)

EK:

For context, I think of “academia” as the established forums for university output, such as NeurIPS, Nature, etc., not restricted to universities.

I think it’s unfortunate that AI safety ended up not being appealing to the 10k+ participants at NeurIPS before ~2023.

DHM:

I’ll add that LW actively contributed to that state of affairs. Lots of ways this happened, but a key one is that LW did not welcome that broader set of experienced eyes when, as my initial story indicates, they tried to participate.

OH:

Look, LW is obviously vastly vastly more welcoming to external contributions than almost any other research platform.

Like, yes, I think it could do better, but compared to the experience of trying to publish in academia as an outsider, it is absolute paradise.

The experience of trying to publish on AI Existential Risk at all in academia, especially before 2022 was absolute hell.

Yes, LW also has failed at various points, but your framing of “LW was particularly hostile here” keeps making me feel like something crazy is going on.

Structurally, it’s an open research forum! You can just show up and participate. All the comments are public so you can understand the culture and standards on your own terms.

This is approximately impossible for any academic field. Good luck publishing at all without spending multiple months learning the exact complicated norms and going to multiple conferences and probably you just don’t actually have any shot if you aren’t collaborating with someone who has spent years of their life getting into the field.

Yes, when you are on the inside, it starts being easier to move around, a bit. But mostly the experience I know that people with PhDs in one field have publishing in other fields is vastly vastly worse than publishing on LessWrong.

Your default outcome is to publish a paper to a journal, not hear anything for 6+ months and then get rejected privately with a terribly confused rejection reason. I honestly can’t really think of a much worse entry experience that has plausible deniability of maybe having a purpose if I tried to design one.

EK:

I agree that e.g. neuroscience is absolutely horrible at this, but ML academia is pretty forgiving: Feedback within a month, natural promotion based on the work’s quality, in-depth reviews, and so on. It’s contrived to say that this is the default outcome of academic research.

I think publishing on X-risk has been hard because the papers that have been written about it aren’t sent to the right venues, just haven’t been high quality enough, or haven’t respected the fields they enter (e.g. introducing LW notation for fields that have existed for 60 years without understanding what’s there already).[14]

Arun Jose:

I think technical alignment discussions began on LW, and I find far more commonly people in academia not respecting this field they enter (e.g. not doing even the most basic forms of lit review).

DHM:

I did a *lot* of lit review.

I’m confident I understood it well in 2014-2016. I stopped watching closely after that.

I was pretty respectful — my conversations were in person.

I often collaborated with and supervised people in the community. I basically met with anyone who asked.

That wasn’t sufficient, for reasons that had nothing to do with the quality of my ideas.

Arun Jose:

I wasn’t thinking of you fwiw, I was thinking of papers written by people who started working on alignment (or adjacent) topics in the last couple years.

DHM:

Np, sorry I misinterpreted.

Lots of people jumping on the bandwagon without doing their homework these days. No argument there.

Second Subthread (comparing PhD programs and MATS)

DHM:

Institutions of knowledge production is a literal, descriptive phrase. Universities are institutions. They are centered around knowledge production.

At no point did I say they are the only ones.

I do think LW, as something new and disruptive, has the burden of proof.

OH:

I mean, sure, by that definition companies are also institutions of knowledge production, but the diversity of companies is so large that “institution” is a misleading claim.

Also, the internet is now 20+ years old, and public forums have long established themselves as a valid place to produce knowledge. Most academic institutions (e.g. anonymous peer review) are only ~50-60 years old, so we really aren’t talking about a huge gap in age.

DHM:

I don’t have time to do the work, but if you wanted to survey the research output of public fora and compare them with output from academic labs, I’d be curious to see what you find.

Doesn’t seem like it would be a comparison to me. Similarly, it does seem like the burden of proof is on you to support that claim. It will be trivial for me to point to a large number of patents, research ideas, and companies that came out of academic labs.

And I do think, e.g., Bell Labs was an institution of knowledge generation in a way that LW isn’t. It is also connected to academia through the training pipeline in a way that LW fundamentally isn’t.

And, as a reminder, I think LW has promise and has made interesting contributions. I’d like it to mature as a research community.

At this point, I think that means recognizing the value of professional research practices that you learn in a PhD.

OH:

To be clear, my current take on PhDs is that they are often great if you can get a great advisor, and terrible if you can’t. This is partially because I do not give academic institutions a different burden of proof than other methods of knowledge production.

Apprenticeship systems are great. When academia does the things here that make sense, then I do think it can be a huge boon to someone’s understanding of the world.

The vast majority of PhDs do not have this property, and the vast majority of my friends who have started or completed PhDs regret it, especially if you bracket the career capital component and focus on the “learn how to be a good researcher” component.

The worst mistakes I’ve seen people make with academia is to trust the academic system to do things by default correctly. You actively have to fight for having a good experience during your PhD. If you do, it can be great, but most of the time, for most people, it won’t be.

DHM:

We basically agree here. You also forgot the demonstrated negative effects on mental health.

But the question isn’t “what fraction of people regret a PhD.” It’s how useful are the research skills that a PhD provides in doing good work. My claim is they are very useful.[15]

DHM:

I would be delighted if LW figured out a way to bring people into those skills without the associated, tangible, costs and risks of a PhD.

That has to start with valuing those skills though…

OH:

My current guess is that the MATS pipeline + joining a research team led by one of the MATS mentors is both valuing the skills that a PhD provides, and vastly outperforms a PhD as a way to get started with research work.

I don’t think it’s perfect and I think it has many issues, but I do just think it’s already a lot better.

(1)

DHM:

We agree that the MATS program is good. I’m hopeful for it and value it.

I think it’s comparable to an M.Eng program or an MS. It doesn’t and won’t produce the same level of research care and skills you get from a full PhD.

If you did MATS for 3-5 years with the same person, maybe it would.

But, IMO, the burden of proof would be on you there.

At the moment, I think MATS is a great way to prep to get the most out of a PhD. Indeed, that’s how it functions in my experience.

OH:

I think people learn vastly vastly below maximum efficiency during a PhD. I think a single MATS round is definitely too short, but I do think the skills people learn in a PhD can be picked up in more like 1 year rather than 4 years.

And I think the right place to pick up those skills is ideally to be at a place that produces great work already and learn from your peers. And in the absence of that, to just publish work (IMO best on LessWrong or in other places with faster feedback loops) that they think might have a shot at being great and engage in public discussion about it.

EK:

Agreed. Ideally, I think you should go for 100 ideas into 10 LW posts into 2 workshop papers into 1 conference paper or something like that as a ratio.

My impression is also that West Coast computer science degrees aren’t too scientific (i.e. trained in methodology etc. though still loads of knowledge generation) which could affect people’s impression.

In Denmark, the CS teaching I saw at Santa Cruz would be part of an SWE trade degree, not an undergraduate.

(2)

OH:

I agree it alone of course cannot replace all the things you learn during a 4+ year additional PhD program, but many people do go on directly to join research teams, and those research teams then provide better mentorship and guidance than PhDs would.

Many people are already good enough to do good work without a PhD. Yes, they will have gaps, but they are gaps that can be solved as they arise, not ones you should try to somehow pre-empt by spending 4 years in a depressing and often feedback-less environment.

DHM:

But that mentorship is usually from someone with a PhD, no?

Look, I’m not arguing LW is bad. I’m saying it systematically devalues the established skillset of professional research. And there is a difference in quality that is hard to see with the untrained eye.

You now seem to be arguing that a PhD isn’t worth the costs. For the vast majority of people it’s not — this is uncontroversial.

My claims: 1) there are genuine research skills that matter; 2) the standard way to get them is a PhD; and 3) LW generally doesn’t value those skills.

OH:

I don’t think most of the mentors I am excited about have PhDs, and if they do, I don’t think their PhD makes much of a difference.

Buck doesn’t have a PhD, Ryan doesn’t have a PhD, Chris Olah doesn’t have a PhD. I do think there is just a high background rate of PhDs, so many of the mentors do, but I at least don’t see any correlation with quality (and my guess is a mild anti-correlation, but my guess is it’s just noise).

I also don’t super buy there is much of anything consistent that PhDs teach besides “learn how to publish in academic journals”, which sometimes requires useful subskills, but most of the time doesn’t. The experiences of different people in PhDs varies so extremely widely that I don’t really buy there is a consistent “PhD skillset” that people learn during a PhD.

tutor vals:

Last part is the crux imo. That exceptional researchers don’t need PhDs is normal, but with respect to “how to get a productive ais community”, community norms have to empower more than the very exceptional.

I think this conv would do better pointing at specific skills/​mindsets that would be better to emphasize in the LW community than an indirect pointer “PhD skills”

The things I’m more pro for don’t particularly appear in academia, eg. caring about the impact of your work, discussing its theory of change publicly and getting feedback.

Third Subthread (on comparing industry and academia)

DHM:

Institutions of knowledge production is a literal, descriptive phrase. Universities are institutions. They are centered around knowledge production.

At no point did I say they are the only ones.

I do think LW, as something new and disruptive, has the burden of proof.

You’re overreading what is a pretty banal take: most research that gets done is done by professional researchers who were trained in a research methodology through a PhD program.

If you want to create a research community outside that, you should expect skepticism.

This is what I said in that thread. I consider it to be consistent with what I am saying here.

My walk-back was primarily about “academia,” since you correctly pointed to industrial research. I retreated to “people trained by academia.” I think that position holds up well.[16]

Can you give me an example of a large research lab that is setting the state of the art in a non-trivial science whose research team isn’t heavily dominated by trained PhDs?

There’s a few people who can do good work without that training. They are incredible and rare.

I think LW systematically devalues that training and talks many bright, impressionable, promising students out of getting it.

I think the low quality of much of the work produced by the community is a consequence of that priority.

OH:

Sure, let’s take software engineering. My guess is the leading companies in terms of developing software engineering knowledge and tools are Google, Facebook, maybe Mozilla.

I don’t think any of their leading software teams are dominated by PhD holders. Rust wasn’t developed by a PhD, and Guido van Rossum, who made Python, does not have a PhD.

Software engineering knowledge is real knowledge. It largely does not get developed by PhDs.

We can go through almost any industry here. Let’s look at construction. Want to bet whether the most successful and best construction techniques were developed by PhD-lead research labs?

Want to bet whether the biggest advances in food manufacturing were developed by PhD-lead research labs?

Rocketry is being built at SpaceX in rooms not dominated by PhDs either.

All of these industries have their own forums and conferences and they produce a huge amount of knowledge that is responsible for almost all of our wealth and prosperity, and academia tends to be a sideshow in most of them. To be clear, academia does provide value to many of these, but it clearly isn’t the central show.

DHM:

What’s your definition of knowledge production? Or science?

Mine is basically Thomas Kuhn’s from “The Structure of Scientific Revolutions.”

Many of your examples blur the line between product development, research, and science.

OH:

I was using it in a broad-ish way that includes any kind of knowledge accumulation (including trade-oriented, practice oriented, cultural).

The kind of thing that I think is largely responsible for the industrial revolution that you attributed to academia here: https://​​x.com/​​dhadfieldmenell/​​status/​​1895488960727756873

I think any other definition is very dangerous, since academic history often gets the causality of discovery wrong (with much discovery being made in practical circumstances, then cleaned up by a mixture of industry research and academic work for easier distribution and universal application), and so it’s very hard to disentangle knowledge production from product development and “research” as a category.

Day 2: Reflection on Previous Discussion

A new thread is started by DHM.

DHM:

> DHM: Since I’m a glutton for punishment, here’s another long-held critique I have for Lesswronger alignment folks:
> Your community seems intentionally designed to prevent people with different intellectual commitments and approaches from participating.
> Let me illustrate with a story.

Again, this was interesting.

Many of the behaviors I was thinking of in this tweet surfaced in the replies.

I also got to see this gem of a comment that exemplifies what I mean by unwelcoming.

Image

I have thoughts about why this is considered acceptable behavior on LW, but that’s for another day.

Instead, let me describe one of the key patterns that I noticed from the examples that were raised in the discussion.

The central issue was a complete failure to engage with other people’s work.

People with research maturity understand what genuine engagement looks like. So it’s pretty obvious when it doesn’t happen.

At that point it’s just a question of pain tolerance and motivation.

Why can’t LW engage? I think it’s because at least one assumption is essentially a religious conviction that advanced AI must be fully rational and have full information, including full information about its goal.

This just straightforwardly implies alignment can’t be solved.

That’s why Yud’s P(doom) is so high.

Several of my papers imply this. The first is the off-switch game, which you can watch me present at MIRI in 2016: https://​​youtu.be/​​t06IciZknDg?si=RtojG5u587CIZaQi

That’s when I learned that the assumption of perfect rationality for advanced AI was non-negotiable.

The paper on this topic that I’m most proud of is “Consequences of Misaligned AI,” where we show that unconstrained optimization of even slightly incomplete objectives effectively ~minimizes utility under very general conditions.

Now, the correct next step IMO is to start examining those assumptions.

The LW school of alignment research categorically refuses to. I think that’s why their theories usually don’t hold up.

I do credit @ESYudkowsky for being consistent in taking the logic to its conclusions.

I’m also grateful to Lawrence for finding some receipts.

Re: community norms, I was glad to learn that LW explicitly thinks it’s a good idea to judge a mathematical framework based on something other than the paper that describes it. It helps to explain why explanations that usually worked didn’t land with LW folks.

[OH: I will judge your ideas based on all the context I have available to judge you! Of course I will not judge your ideas on the basis of books you didn’t write, but I will judge the ideas and arguments of your coauthors.

As does everyone else.

I will judge your ideas on the basis of tweet threads, on the basis of your books, on the basis of your reputation among peers, and on the basis of spoken conversations.

Those are all useful evidence of what your ideas are! Ideas do not live in some magical magisteria of academic papers that no other medium is allowed to touch.]

Looking back, there really was no argument that would have worked. I spent a lot of time trying. It’s good to know that it was out of my control.
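For readers who want a concrete handle on the off-switch game argument DHM refers to above, here is a minimal numerical sketch. It is not code from the paper; the Gaussian prior and names like `value_act` and `value_defer` are illustrative assumptions. It only shows the qualitative claim: uncertainty about the human’s utility gives the robot a reason to accept oversight, and removing that uncertainty removes the reason.

```python
import numpy as np

rng = np.random.default_rng(0)

# The robot is uncertain about the human's utility U(a) for its proposed action.
# (Illustrative prior; the parameters are arbitrary for this sketch.)
u_samples = rng.normal(loc=0.3, scale=1.0, size=100_000)

# Option 1: act immediately. Expected payoff is E[U(a)].
value_act = u_samples.mean()

# Option 2: switch off. Payoff is 0 by definition.
value_off = 0.0

# Option 3: defer. Propose the action to a rational human overseer, who allows
# it only when U(a) > 0. Payoff in each sampled world is max(U(a), 0).
value_defer = np.maximum(u_samples, 0.0).mean()

print(f"act:   {value_act:+.3f}")
print(f"off:   {value_off:+.3f}")
print(f"defer: {value_defer:+.3f}")  # >= max(act, off): uncertainty favors oversight

# With no goal uncertainty the samples collapse to a point, E[max(U, 0)] equals
# max(E[U], 0), and the incentive to accept oversight disappears.
certain = np.full_like(u_samples, 0.3)
print(np.maximum(certain, 0.0).mean(), max(certain.mean(), 0.0))
```

The inequality E[max(U, 0)] ≥ max(E[U], 0) is just Jensen’s inequality for the convex function max(·, 0); it is why deferring weakly dominates whenever the robot is genuinely uncertain about its objective, and why the incentive vanishes once the agent is assumed to have full information about its goal.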

First Thread (w/​ Habryka & Critch, on treating writing as self-contained)

Andrew Critch:

There *are* LessWrong readers who can and will read a post as a self-contained series of statements that is or isn’t a logically or probabilistically valid argument, or informative report.

But as @ohabryka says, he doesn’t read that way.

But LW ≠ OH, even though he runs it.

OH:

I guess it is true that I don’t like to throw away lots of evidence when trying to figure out what a set of concepts mean, or how they ground out in reality.

When it comes to formalism, I also generally believe that it’s rare (though not impossible, especially in purer math contexts) for a proof or framework to fully stand on its own. The intuitions behind the proof tend to do more work than the proof itself, and engaging with the generating intuitions is very helpful in figuring out what the proof means, how to apply it, and where it might be wrong.

I would be surprised if almost anyone on LW doesn’t do the same thing. Who would intentionally ignore blogposts by the author of a paper about what a paper means when trying to understand a paper? I really don’t see anyone doing this. I feel like this is some kind of weird standard that I do think has some abstract appeal to it, but I don’t believe is something anyone actually adheres to.

And my guess is you know this both about me, and about others. I am not really sure what’s going on in your comment.

Like yes, local validity is of course a thing, I care a lot about it. But local validity doesn’t extend to only engaging with whole research directions by referring to the one academic paper written about it, when there are also hundreds of pages of secondary material available.

On its own, I don’t know what the point of CIRL is, or really any formal framework. I need some other contexts to understand why I should care about it. Something to motivate why it’s valuable and important. In this case secondary material provided that, and only in combination with that material did it give me and others an argument to engage with that hooked into things I cared about. I’ve found some of those arguments valuable (providing e.g. useful explanations of things like the problem of fully updated deference), and some of those arguments not that valuable.

This seems completely normal. I really don’t understand how I am supposed to engage with any kind of scholarship without it. I don’t think anyone else does either. Maybe I am truly missing some fundamental way of relating to the world, but I currently doubt it.

(1)

Andrew Critch:

Addressing your question on how you personally are “supposed” (your words) to engage in scholarship is a different task than my current goal of engaging Dylan on what’s bad for him about LessWrong.

But I’d be happy to spend 1:1 time on your question sometime if it’s genuine.

OH:

Makes sense. I interpreted your comment as implying a should (largely by Gricean implicature of the lack of an objection to Dylan’s should, which is not a perfect inference, but feels reasonable even upon rereading).

If there is none, I am curious about your thoughts on it some time in a 1:1 context. It is plausible to me I am lacking some skills here. Many years of ongoing content moderation make you hunger deeply for any scrap of context you can use to evaluate things more quickly and easily, possibly in dangerous ways.

(2)

OH:

I guess it is true that I don’t like to throw away lots of evidence when trying to figure out what a set of concepts mean, or how they ground out in reality.

When it comes to formalism, I also generally believe that it’s rare (though not impossible, especially in purer math contexts) for a proof or framework to fully stand on its own. The intuitions behind the proof tend to do more work than the proof itself, and engaging with the generating intuitions is very helpful in figuring out what the proof means, how to apply it, and where it might be wrong.

I would be surprised if almost anyone on LW doesn’t do the same thing. Who would intentionally ignore blogposts by the author of a paper about what a paper means when trying to understand a paper? I really don’t see anyone doing this. I feel like this is some kind of weird standard that I do think has some abstract appeal to it, but I don’t believe is something anyone actually adheres to.

Andrew Critch:

Blind peer review does this. I dunno if LessWrong “should” do this (aside from already allowing anonymous accounts), because it’s hard for me to tell what LW’s goals are. But not doing this has some major costs for the community’s ability to evaluate arguments on their merits 🤷

OH:

Yeah, my sense is anonymous accounts provide an escape valve that helps in the most dire cases, though it’s not perfect.

I personally am not an advocate for blind peer review. My sense is it’s been harmful for science in the last decade or two in which it has been gaining traction, though I do agree it has some things going for it.

I think external context is just really extremely useful for understanding ideas. For most of science’s history review has not been blinded, and I don’t think we’ve seen it help very much since it has been adopted.

I agree that it could help in theory with making evaluation processes pay more proper attention to local validity, which I do think is helpful, but also extremely hard in almost any context but the most pure math or CS-y environments. Anywhere where you need to e.g. gather experimental data, you need to form a prior on how likely the experimental data was fudged or forged or selected for, and for that you really need the broader social context.

I do find it attractive in the most pure math domains, but my sense is it’s also least necessary in those domains. If you really proved something important, almost nobody will care who you are, unless your proof is really very hard to understand and follow (in which case, again, the extended context, including social context, is largely what the academic community and really any other community uses to judge validity).

Second Thread (w/​ Jan Kulveit on different groups of thinkers on LessWrong)

Andrew Critch:

There *are* LessWrong readers who can and will read a post as a self-contained series of statements that is or isn’t a logically or probabilistically valid argument, or informative report.

But as @ohabryka says, he doesn’t read that way.

But LW ≠ OH, even though he runs it.

DHM:

I 100% believe this about individuals. I’m happy to engage with anyone who will seriously engage with my work in good faith. That includes folks from LW, and will continue to.

Q: What fraction of the community shares Oliver’s position? I honestly have no idea.

(1)

Andrew Critch:

Maybe 25%? Not 90%.

OH:

(I think it’s likely 95%+ because I am really not saying anything weird. I would be surprised if there is approximately anyone who would end up intentionally ignoring secondary material in trying to understand scholarly work.)

(2)

Jan Kulveit:

I think in your whole series of posts one problem is ‘who is even the LW alignment community/​ who is representing it’. There seem to be at least three different answers

(A) Simply, active users on LW: this is a quite broad group of people which is not insular, has quite diverse beliefs and amazing research output, and is often unhappy and vocally critical about “LW”. On the other hand, this group often endorses some of the epistemic norms, especially relative to Twitter etc.

(B) MIRI-esque core: this includes OH, former MIRI staff, various “old guard” users,… This group has way more coherent taste & beliefs. The site used to promote views of this group more heavily, but this is less the case (although there are new orthodoxies)

(C) Something I’d call “the LW mob”: people who often defer to the previous group in taste and identify with LW

I think your criticisms basically do not hold for A. (which makes some people defensive). They hold the most for C., but no one wants to defend this group as the main source of value or as representative. With B. I think there is a complicated dynamic where the people are smart and have good epistemics individually, but less so collectively. Making the “collective B.” update and learn often takes John Wentworth-level talent and written output. @ohabryka would typically represent B.

(I also suspect a motte-and-bailey sometimes goes on where part of the credit for the work of A. is allocated to B. and B.-norms.)

DHM:

This makes sense to me. Thank you for the context.

I’ve also found A to be pretty good. Several folks who seem to be A have engaged thoughtfully and respectfully. With appropriate pushback, and laudable curiosity/​openness.

That’s who I hope sets the tone going forward.

Andrew Critch:

What @jankulveit said.

Third Thread (w/​ Critch on LessWrong’s welcomingness)

Andrew Critch:

I also used to continually find myself combatting weird misinterpretations of CIRL to LessWrong fans and/​or MIRI fans. That’s partly why I decided to meet you (Dylan) and go to work at Berkeley back in ’17. That said, I will say that I think LessWrong is less intellectually unwelcoming now:

1) Habryka specifically — who you seem disappointed in here — doesn’t present as strong a presence as he used to in critiquing technical content (despite running the site), which I think is healthy, as the community doesn’t get too group-think-y around Habryka’s ideas of what matters. So, not seeing eye-to-eye with him on this thread or in general doesn’t mean no one on the site will like your ideas.

2) Speaking for the future, I have some hope that LessWrong is more likely than many other venues to be an early adopter of formal argument verifiers, and to actually respect the value of a formally verified argument, which will make it easier to publish logically valid content there and have it accepted by the community.

Also, something else I’ve found crucial to LessWrong posting: you *really* have to harden your skin to logical fallacies in the comments that no one will contest or flag, and that might even get many upvotes.

E.g., there is a predictable presence of readers who cannot or will not distinguish claims of logical necessity (A->B) or helpfulness (P(A|do(B))>P(A)) from claims of sufficiency (B->A). In particular, if you say “B might help make a better future with AI” or “A good future with AI requires B”, you will ~100% reliably get replies of the form “B is definitely not sufficient to make a good future with AI!” or similar. And then usually no one else will show up and be like “Yo, Dylan said P(A|do(B))>P(A), not B->A; seems like you missed the point?”

So you basically have to start every post being like “ALERT: THIS IS NOT SUFFICIENT TO SAVE THE WORLD” and then you can say your thing with 75% fewer comments of the form “ARE YOU SERIOUS HOW COULD THIS POSSIBLY SAVE THE WORLD?” or similar. This can be annoying, *but*, honestly, with the world being on the brink of potential destruction (I claim), maybe it’s actually fine for people to have to repeat that alert at least once in a while. So I find this pretty forgivable, even if slightly sad for a community that’s supposed to be really into logic and the like.

Anyway, while I feel your pain and disappointment, at the same time I want to say that I think it has gotten considerably better over the past ~2 years since GPT-4 came out, and with the addition of the Agree/​Disagree button as a separate function from the Good/​Bad karma button. And I think there’s like a 25% chance at some point they’ll be like “Hey, why not use AI to check arguments are complete and valid?” before almost any other website or blog.

So, while I’m clearly irritated by certain failings, I don’t feel like it’s time to give up entirely on LessWrong as a place where productive collective reasoning can occur, and I for one have not.
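[BP: A quick gloss on Critch’s notation above, added by me and not part of his tweet; the English readings are my own assumptions about his intent. The three claim types he distinguishes are roughly:]

$$A \rightarrow B \qquad \text{(necessity: a good outcome } A \text{ requires } B\text{)}$$

$$P(A \mid \operatorname{do}(B)) > P(A) \qquad \text{(helpfulness: doing } B \text{ raises the probability of } A\text{)}$$

$$B \rightarrow A \qquad \text{(sufficiency: } B \text{ alone guarantees } A\text{)}$$

[BP: The complaint is that replies often rebut the sufficiency claim when the author only made the necessity or helpfulness claim.]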

DHM:

Thank you for this.

I want to be clear that my initial post is the opposite of me giving up on Lesswrong. I wouldn’t give the feedback if I didn’t think it could be helpful.

I think it’s good for LW to understand that people like me have often felt systematically excluded and that our ideas often get very little genuine engagement.

I hope the trend you reference continues.

Andrew Critch:

Yep, I think @ohabryka actually cares, and maybe his communication style is off-putting for you, but I have learned over the years to basically never bet on Oliver failing. Personally I would encourage you @dhadfieldmenell to create an AI tool that engages with ideas (or helps users to engage with ideas) in the more fair and welcoming way you want, and then just show it to the LessWrong team as a demo-slash-feature-request. If you make a real effort to think of how to actually integrate your idea with http://LessWrong.com, I put like a 15% chance that the thing you make will become a feature somehow, an overlapping 35% chance that it will inspire the LW team to do something you like, a 35% chance that nothing comes of it at LessWrong but you make good use of it elsewhere, and (only) a 10%-15% chance that you end up regretting building that thing.

Basically this will help to more constructively demonstrate the communication style you (and probably I) would prefer, and probably also more quickly convey how it might be useful.

DHM:

I’ll help curate a dataset and provide guidance if there are people who would help do the actual work. I could commit 1h/​week for 4-6 weeks.

Day 3: DHM restates his evidence, Habryka argues he is repeatedly misrepresented & strawmanned

A new thread is started by DHM.

DHM:

> OH: Post the link! If it’s as you say it would be easy for people to form their own impression. Not sharing it IMO makes it likely there is some misrepresentation going on.

I initially thought Oliver’s claim here was that LW was actually welcoming and open.

After several examples, it’s clear what he meant was “actually, it’s good you felt unwelcome b/​c you deserved it.”

Which is nice to have in the open.

Just to review the evidence of the pattern:

1) My experience of an academic who reached out to me after having a strange experience trying to engage with the LW alignment community. Apparently, without specifics, we should assume he deserved the response he got.

2) My personal experience where it felt like LW intentionally misinterpreted a framework I proposed because they were worried about my co-author’s public comments. Oliver explicitly said he thinks this is a good norm for a research community.

I’ll note that I spent literal years where I had to talk to LW folks regularly, and ~every time they had exactly the same misconceptions about the work.

To be clear: the community failed to understand basic properties of how a mathematical framework is supposed to be used.

I’d be disappointed if an advanced undergraduate made this mistake.

If a PhD student did, I’d be concerned they don’t understand important fundamentals.

But, because my co-author was saying stuff his tribe didn’t like, Oliver argued that it’s good that his community never understood the idea and never genuinely engaged with it. Apparently, this is just normal and comparable to academic fields.

(To be clear, it’s not.)

3) @jankulveit jumped in to share his experience posting an idea on LW. The LW reaction matched the two experiences described above.

The top comment starts with “Just to state the reigning orthodoxy among the Wise….” Oliver seems to think this is basically fine.[17]

[Image]

I knew this would be exhausting. And it was. This sarcastic comment captures how I felt at the end.

LW isn’t my community, but I’ve met lots of good people who engage a lot there. I hope the community matures beyond behavior like this.

[Image]

First Thread (w/ Habryka pushing back on alleged misrepresentation and writing an overall summary, which DHM refuses to read)

OH:

> 1) My experience of an academic who reached out to me after having a strange experience trying to engage with the LW alignment community. Apparently, without specifics, we should assume he deserved the response he got.

No, that is not what I (or others) think. I do happen to think that, given the suspicious lack of any details or links, my guess is something is missing and something kind of misrepresentative is going on.

Probably overall things didn’t go perfectly well! But also, I have learned from many years of moderating internet discussion that it’s extremely rare that a summary like this, from someone clearly aggrieved, captures both sides fairly.

My best guess is your colleague didn’t “deserve the response he got”. Probably it would have been better for him to receive a different kind of feedback, but also, I really don’t know, it’s hard to judge from just your summary, and the prior on misrepresentation is high.

> 2) My personal experience where it felt like LW intentionally misinterpreted a framework I proposed because they were worried about my co-author’s public comments. Oliver explicitly said he thinks this is a good norm for a research community.

Look, man, of course I did not say, and would never say “It was good for LW to intentionally misinterpret your work because of X”. That is an intense strawman and kind of obvious, and the kind of strawman that has compelled me in this discussion to repeatedly try to correct you. I do think you can do better than this.

What actually happened is that you never left a single comment on LW. People on LessWrong engaged with the work of your coauthor, who made different statements about the work you both wrote. As far as I can tell, outside of your CHAI colleagues, people weren’t really engaging with your perspective on your work, because it wasn’t really written up anywhere where those people would have read it.

This probably caused you a bunch of confusion when you tried to defend different interpretations of your work in-person, but importantly you never actually participated in any LessWrong comment thread. My guess is it would have gone well.

I do think it was good for people to engage with Stuart’s statements and work on CIRL. He was much more influential, and also I think has a stronger track record of research and so it makes sense to engage with his work more.

DHM:

You simply fail to understand that his work was the paper. That’s the thing to discuss if you want to talk ideas. You wanted to discuss PR. That’s not a conversation about ideas.

A topic this important deserves much better. I suspect you know this at some level.

I chose not to post because I perceived the community as unwelcoming.

You are bizarrely trying to argue that I’m wrong for having that perception while basically proving my point.

The reception was predictable, as all the examples I and others have shown demonstrate. These were not one-offs, it’s a clear pattern.

OH:

We are obviously not on the same page about what normal academic or intellectual norms are, and are unlikely to figure it out in this thread. I do think we can both avoid making misrepresentations of what the other person has said.[18]

DHM:

Please tell me how to interpret this exchange, in which I said you should judge the idea based on the representation of it in the paper and decide its merits based on that.

You’re more than welcome to say you think Human Compatible was misleading. Idk, never read it.

> DHM: Which, to be clear, is that you judge my ideas based on the papers I write, not the general audience books I didn’t write.

> OH: I will judge your ideas based on all the context I have available to judge you! Of course I will not judge your ideas on the basis of books you didn’t write, but I will judge the ideas and arguments of your coauthors.

> As does everyone else.

> I will judge your ideas on the basis of tweet threads, on the basis of your books, on the basis of your reputation among peers, and on the basis of spoken conversations.

> Those are all useful evidence of what your ideas are! Ideas do not live in some magical magisteria of academic papers that no other medium is allowed to touch.

But the mathematical framework I proposed and published with Stuart is not that.

It’s not serious to judge the underlying idea based on the way it’s summarized for a general audience. That’s a bad norm. That’s what I understand you to be arguing for.

And you seem to be missing the fact that this inability to understand the work was basically only with LW folks.

Sure, academics didn’t pay as much attention as I would have liked, but across the board they understood the idea.

It’s not hard to explain to people with the appropriate training.

I’m not even talking about PhD here. If you’ve taken an intro AI course that discusses sequential decision making, that’s basically enough to learn it.

OH:

> But the mathematical framework I proposed and published with Stuart is not that.

> It’s not serious to judge the underlying idea based on the way it’s summarized for a general audience. That’s a bad norm. That’s what I understand you to be arguing for.

It depends on what aspect you are engaging with.

If someone releases a mathematical paper about a self driving algorithm, and then argues in an accompanying blog post we should deploy it to real cars, of course the reaction should evaluate the original paper in the context of how reasonable it would be to deploy that algorithm to real cars.

This then involves engaging with both the paper and the blog post. This seems normal and common and of course I’ve seen you do it a lot with safety blog posts and other engagements from labs.

DHM:

Is this the blog post you are referring to?

The Need for Scientific Rigor in AI Safety Research

And no, you’re wrong.

You evaluate the paper that describes the algorithm based on the claims it makes and how well those are supported by theory and evidence.

If the paper includes limitations (as it should), then it should be clear that it wouldn’t work directly on cars.

If it’s not clear from the paper, that’s grounds for rejection.

You should criticize the blog post for misapplying the ideas in the paper.

They are just different units of work.

If the same person wrote both, you’re also welcome to judge/​criticize that person for trying to have it both ways.

OH:

(I wasn’t referring to any specific blog post, I was invoking it as an analogy.

I do unfortunately feel like you are mostly just asserting conclusions, so I don’t super know how to respond. Maybe it would help if you tried to paraphrase what I am trying to say in a way that passes my ITT, or maybe it wouldn’t, I am not sure. I could do the same.)

DHM:

Maybe it would help if I tried to explain some of the relevant academic norms as I see them. I’m not making comparative claims about quality, just explaining how my field manages a collective body of knowledge.

Work is divided into individual units of papers. Each paper is supposed to meet some basic quality tests. E.g., does it reference and contrast with similar ideas, are the contributions clear, are they supported with evidence… etc. I think it does a fine job of this.

Once it’s published in an archival venue, it’s fixed, and people either build on it or interact with it how they see fit. It wouldn’t make sense to say something like “OpenAI’s DA doesn’t solve alignment” to engage with the work unless the paper makes that claim.

If OpenAI’s PR team writes a blog post that says ‘DA Solves Alignment!’ then you should criticize the people who wrote that blog post and not let that blog post prevent you from interacting with DA as a technical method.

This is what I demonstrated in that blog post when I realized that my Twitter thread could be misinterpreted as criticizing the paper. I accompanied it with a tweet thread where I apologized to the paper’s authors in public.

I guess that’s how I wish your community had managed the situation around CIRL.

OH:

This makes sense! Here is roughly my relationship to the things you said:

1. I think you overstate the degree to which academia has these norms.

I can kind of believe it reasonably often aspires to these norms, but de-facto academia does lots of judging things that an author has written somewhere else, and there is a strong culture of trying to prevent scientists from making statements that are too broad or strong to the media, and this is a common form of critique.

Academia is as much on Twitter and Bluesky as other people these days, and we both know how those conversations go. I also agree that that kind of critique cannot fully substitute for other ways of engaging with the work.

2. I read the CIRL paper a lot of times, and I do have some thoughts on how to engage with it on its own terms, but I do think the paper really doesn’t stand on its own in terms of its motivation and what its purpose is.

I believe its internal logic is fine (though I have 1-2 things I would like to dig into where I do think there might be flaws, though it’s unlikely overall), but when you say things like “CIRL should be used as a criteria of optimality” that is not something the CIRL paper says.

That is something you say in secondary material about how you think the work is best applied, and if I understand you correctly, that is the frame in which you would have liked people to engage with it.

I personally like that framing more, and indeed just last night, sparked by our conversation, I chatted with Daniel Filan, who helped me see an aspect of the off-switch game paper that I had missed, which made me understand what you meant by that a bit more, and which I found valuable. I was unable to get that from the CIRL paper (though I could have maybe gotten it from the off-switch game paper, which I have engaged a lot less with). Maybe this is a skill issue, but IDK, I have really engaged with it deeply.

3. I am glad people engaged with Stuart’s claims, which were made in accompanying textbooks and popular books and talks. I think that engagement was good and helped clarify a bunch of people’s perspectives and clear up misunderstandings.

In the context of those conversations “CIRL” was used as an approach and in some sense (though I wish I had a better word) “solution” for AI Alignment.

I think this was confusing, but it is what it is, and is largely what “CIRL” has referred to in the engagement I have seen on LW and presumably also in in-person contexts. I now have more of a sense of there being a different thing you tried to achieve with that paper. I do think it isn’t clear from the paper itself either, but it is clearer now from the other things you have said (and to me after I talked to Daniel). This is of course also in my mind a kind of validation that engaging with secondary material seems frequently necessary for judging a paper.

I don’t know yet what I think about this perspective on CIRL. I don’t expect to love it either, but I agree with you I haven’t seen much engagement with it.

----

Not sure exactly where this leaves us, but it’s a high-level summary of where I am at after chatting for a while.

I overall think you are interpreting at least a bunch of the conversation on LW as something it isn’t, and I think it’s confusing to say that people “didn’t engage with CIRL”, since they clearly did, but I also agree that there is a thing you wanted to do with the paper that has received little engagement and I currently think that is sad, and did probably involve many people making some kind of mistake.

My best guess is that if you had engaged directly on LW with your perspective, especially when Rohin and Lawrence and Daniel were around, it would have gone pretty well and you would have seen much more of the kind of engagement you wanted.

My general experience has been that intellectual discussion offline often just really sucks, especially in more adversarial contexts. Writing things really helps. Having things be in public spaces really helps.

I believe you that people were being frustrating to engage with in more private contexts. I don’t super know who you engaged with; if you want to DM me more details, I would like to dig a bit more and improve future things like that.

I do care (though I do overall take less responsibility for that) about how people behave off of LessWrong in in-person contexts, and want to improve that avenue of engagement with the extended LW community more.

I am also unhappy with various other things you said in this discussion, including various ways you paraphrased me, or summarized what happened on LW, and I think that was bad form, but I don’t think we need to litigate that, I think people can form their own impressions of that by reading our past threads.

DHM:

Oliver, I don’t know what else to say. I review papers regularly, it’s basically always double blind.

I evaluate work on its merits, as my blog post states.

In my experience, these are the norms and processes that 1) I interact with; and 2) produce good work.

I believe my experience is representative. Both of academia (at least CS) and of academics engaging with LW.

You don’t. That’s all this is.

(Btw, because your response started with a statement about academia broadly instead of the very specific example of a norm I think is useful and took time to explain to you, I didn’t and won’t read it.)

OH:

Wow, man, you really do seem like a dick.

I spent over an hour writing an essay, trying to distill a huge amount of conversation we had into a perspective that I hope would give you more traction to engage with, admitting various places you were right, and overall hoping to leave things in a place where some future engagement might be possible, really trying hard to respect your perspective.

And you just insult me, tell me you didn’t read it, and engage in various social slapdown motions.

I guess my engagement with your tweets will be limited to just correcting the public record where you say wrong things. I had hoped I could engage, but man, you make it really hard. Just take the 3 minutes to read what I wrote with substantial personal effort.

DHM:

My first tweet said I wasn’t interested in debating how general these norms are. And you responded with a comment that said you thought they aren’t adhered to broadly.

I never made a claim about academia. I made claims about what I do as part of my day to day job.

> DHM: Maybe it would help if I tried to explain some of the relevant academic norms as I see them. I’m not making comparative claims about quality, just explaining how my field manages a collective body of knowledge.

> OH: This makes sense! Here is roughly my relationship to the things you said:
> 1. I think you overstate the degree to which academia has these norms.

(1)

DHM:

When you start by discussing something I explicitly said I don’t want to discuss, I’m justified in not responding.

OH:

Where did you say that? Am I being insane? I really don’t see any place where you said anything I could interpret that way.

Please, just call in some third party and let them read over stuff. I feel so confused about what you are trying to do here.

(2)

OH:

Your first screenshot is obviously setting up a statement about what prevalent academic norms are. I am so confused about what is going on here.

I am happy to run this by any reasonable third party and ask them whether this makes sense. I don’t think I am being dense in misreading things here, I really think it’s very unambiguous that both the validity of your point relied on this being generally true in at least CS academia, and that you are not just talking about your own experience.

DHM:

It isn’t. It is saying what norms *I* adhere to in *my* community.

I’m saying that, when *I* review papers it is double blind. When *I* submit papers it is double blind. When *I* evaluate work I evaluate it on its merits.

I’m providing these as an example of how I think it is appropriate to engage with work and what I think a good norm for engaging with work is.

Whether or not that’s broadly true in academia is not relevant.

This is not some abstract debate. I’m literally talking about my day job.

OH:

> It isn’t. It is saying what norms *I* adhere to in *my* community.
> I’m saying that, when *I* review papers it is double blind. When *I* submit papers it is double blind. When *I* evaluate work I evaluate it on its merits.

You are referring to “your field”, clearly an entity much larger than just your personal experiences!

Clearly, if I showed you statistics about the prevalence of double-blind review in CS that contradicted your anecdata, that would be a valid objection to what you are saying.

I really don’t think there is any ambiguity here. Like, OK, it’s fine, you don’t want to talk about what is prevalent in academia. I will drop that point, but you sure did make some claims about what is prevalent in academia.

DHM:

To name a few off the top of my head:

ICML: double-blind; NeurIPS: double-blind; AAAI: double-blind; ICLR: double-blind; RSS: double-blind; IJCAI: double-blind; RLC: double-blind; AISTATS: double-blind; FAccT: double-blind.

Basically every major venue in my field is double-blind.

I’m saying that basically everywhere I publish is double-blind as evidence that I think it’s reasonable to ask that my work be evaluated on its own merits.

You’ve said it’s unreasonable. I know it’s not, because I’m part of a research community where we follow this norm.

You’ve said I have no reason to expect you to judge my work on its merits. You’ve used this to justify your community not engaging with my work and collectively misunderstanding it.

I’m saying I’m part of a community where I can and do expect this. It is very nice.

And if LW can’t engage in that way, I see very little value in spending effort on it.

It’s why I chose not to engage during my PhD. I thought it might be helpful for you to get some information that you don’t typically get about how your community is perceived.

OH:

You might totally be right about prevalence at the top of CS for double-blind review in particular! You are much more familiar here.

What I was saying is that you were clearly making statements that did not just rely on your personal experience, but you somehow still felt justified in dismissing my complete critique because you claim to have explicitly said somewhere that you are not interested in talking about what academic norms or standards are.

Like, IDK, you might totally be right about academic norms in your domain of CS, but I am reacting to the part where me expressing disagreement with that somehow therefore made my writing unworthy of engagement and as if I had violated some boundary of yours, when as far as I can tell you had not expressed anything like that before (and again, I am happy to run our conversation here by third parties and ask them whether any such boundary was set).

DHM:

Of course I’m right about my domain of CS.

My “domain of CS” is what I meant by “my field” in the tweet that you claimed was obviously about comparing with academia broadly.

> DHM: Maybe it would help if I tried to explain some of the relevant academic norms as I see them. I’m not making comparative claims about quality, just explaining how my field manages a collective body of knowledge.

In fact, I intentionally restricted my tweet to be about an area where I can speak from direct personal experience in order to prevent exactly the response you had.

OH:

I don’t really understand this thread.

I was responding to the part where you said you weren’t going to read my really quite effortful essay trying to establish common ground and give a summary of our whole conversation including all your previous statements because apparently you implied somewhere you didn’t want me to say anything about academic norms, and I still don’t really know what’s going on.

I mean it’s fine, I’ll probably post the summary as a top-level tweet or something because it seems helpful for people to get oriented to what is going on. I think it would be nice for you to read it too. It’s not like amazing, but it was trying to establish common ground.

(And to be clear, I never had and still don’t have any strong opinions about double blind review in CS. It seems quite prevalent. I never doubted that or disagreed with it. I used it as an example of how you did clearly make a claim that was about a whole academic field that could in-principle be subject to falsification.

The critique I actually made was talking about how academics use Twitter, which just seems like a totally reasonable critique. IDK what’s up with your reaction to that being so strong that you refuse to read the rest of my summary)

DHM:

I’m exhausted.

I said I was focused on my specific field. I gave examples from my personal experience. When your response starts by challenging me on something I intentionally didn’t claim, it doesn’t make me hopeful to find value in the rest.

A lot of what you’ve done in this thread has been gaslighting me about a decade of experience.

I said I think the community is unwelcoming. It’s not an uncommon opinion for people like me. You’re very motivated to convince me I’m wrong and you don’t understand my community.

You could have just responded to the first tweet saying something like “I’m sorry you feel that way, that’s a bummer, what could we do to make it better from your perspective.”

Instead you accused me of misrepresenting the situation.

OH:

Because you have! Repeatedly! That is what this current thread is about! You summarized me as “Oliver explicitly said he thinks [intentionally misunderstanding a research proposal for PR reasons] is a good norm for a research community.”

This is not what I said. It’s obviously not what I said. You have also done this in other threads.

I am interested in engaging with you, but your pattern of repeatedly starting a new thread that opens by completely misrepresenting everyone you disagreed with in the previous thread is extremely frustrating. I will keep showing up and correcting the record, but man it is exhausting and I don’t want it.

DHM:

This is the comment I referred to with my tweet. I stand by it.

You were not judging the mathematical framework on its merits or stated claims. That’s a bad way to behave in a research community.

> > DHM: Which, to be clear, is that you judge my ideas based on the papers I write, not the general audience books I didn’t write.

> OH: I will judge your ideas based on all the context I have available to judge you! Of course I will not judge your ideas on the basis of books you didn’t write, but I will judge the ideas and arguments of your coauthors.
> As does everyone else.
> I will judge your ideas on the basis of tweet threads, on the basis of your books, on the basis of your reputation among peers, and on the basis of spoken conversations.
> Those are all useful evidence of what your ideas are! Ideas do not live in some magical magisteria of academic papers that no other medium is allowed to touch.

“I will judge your ideas based on all the context I have available to judge you.”

It’s clear you don’t believe in judging papers based on the stated claims they make and the associated evidence they provide.

You’re also welcome to judge me overall as a researcher. That’s something we do too. But that’s a related but different question.

Second Thread (w/​ Herbie Bradley, on peer review)

DHM:

I initially thought Oliver’s claim here was that LW was actually welcoming and open.

After several examples, it’s clear what he meant was “actually, it’s good you felt unwelcome b/​c you deserved it.”

Which is nice to have in the open.

> OH: Post the link! If it’s as you say it would be easy for people to form their own impression. Not sharing it IMO makes it likely there is some misrepresentation going on.

Herbie Bradley:

interesting discussion

I’m going to give the extra contrarian take here: I’ve always thought LW’s “publication” quality norms and founding assumptions about AI were bad and wrong, which is why I’ve never engaged there despite being a lurker for over a decade.

But I’ve also increasingly come to dislike academia, and don’t take peer review very seriously any more; LW at least has the virtue of being filled with smarter and more well informed people than I’ve met in universities

Reasonable discussion on AI these days therefore seems to take place exclusively in private Slacks, Discords, and Google Doc comments...

DHM:

I’m curious what drives your opinion of academia and what’s changed over time.

IMO, peer review is flawed but valuable for a few reasons. If nothing else, having a clear set of criteria that papers are supposed to meet is quite valuable.

[Image]

Herbie Bradley:

as with many, I had multiple poor experiences with peer review, to the point where I stopped advising people to spend time on submitting to conferences. even before ChatGPT reviews, the review system in AI degraded under exponentially growing load—though I don’t think it’s much better in other fields

I’m not too fussed about peer review ultimately because I expect that AI can function better than the average reviewer within a year or two

separately though, academia has many cultural issues (some of which you point out around it not taking AGI seriously). my biggest cultural disagreement is simply that I think researchers should have clear theories of impact in mind for their work rather than just doing what feels like random search over interesting research ideas, in most cases

DHM:

I think the only difference is I am fussed about peer review.

I think these are all fair critiques. I think we can improve and I hope we do. But the system really has crumbled. It’s sad.

What I hope we keep is the norms around maintaining an evolving collective standard of accepted work and getting anonymous critical feedback. I think these are important signals in a modern research ecosystem.

Day 4: DHM says he is glad to have had this conversation in public

DHM writes a new (short) thread.

DHM:

It’s reasonable to wonder why I participated in an endless and repetitive argument with LWers the last few days.

For me, I’ve had many discussions like it. This is just a drop in the bucket.

I wanted to have one in public so there was more common knowledge of what it is like. I imagine it was somewhat exhausting to watch as well.

(1)

Lawrence Chan:

Ah, that makes sense—I was confused why you spent so much time on this.

(2)

@rina_gyro:

yeah...felt pretty dismaying & i don’t really know what to make of it in the end

DHM:

Same!

FWIW, I don’t think it has to be this way and I’ve had lots of good/​productive discussions with LW people. Both over the last 10 years and in the last few days.

(3)

Esben Kran:

Thanks, love the public discussions on here (and on LW for that matter) - always insightful.

DHM:

Thanks for participating!

[BP: Not sure whether to include this subthread on judging Eliezer.]

  1. ^

    See the tag Inverse Reinforcement Learning for all the posts on LW on the broader subject, and see this original paper by Hadfield-Menell, Stuart Russell, and others for more on the idea.

  2. ^

    There was actually an even earlier day of discussion the day before, which I haven’t collated (Day −1: A, B, C). I didn’t include it because the CIRL example is much stronger, as the author can speak to their own experiences; because this post is already 27,000 words long; and because the author said they optimized it to be spicy, yet it turned out the example didn’t really check out, which is a bad combination.

  3. ^

    Lawrence Chan:

    After reading this thread and its responses, I’m reminded that Twitter seems even more intentionally designed to prevent people with different intellectual commitments from having a good conversation :(

  4. ^

    OH:

    I don’t think preference uncertainty is a helpful framing, so that would be a false thing to say from my perspective. I don’t think “goal uncertainty” is a particularly measurable quantity in modern ML systems, nor something that would be particularly good to optimize for, and I think it is in most contexts a confused concept.

  5. ^

    tutor vals:

    Thank you for all these clarifications. I have pretty long had a mildly negative impression of CHAI based on it being “Stuart’s lab” and Stuart’s communication, but this makes me believe I should look more into it if I wanted a better assessment

    Lawrence Chan:

    Glad you found the context helpful. Worth noting that most of my fellow PhDs have graduated/​left, I don’t know the people who are still there very well anymore:

    • Dylan Hadfield-Menell → MIT

    • Rohin Shah, Michael Dennis, Scott Emmons, Erik Jenner → Google DeepMind

    • Adam Gleave → founded FAR

    • Johannes Treutlein → Anthropic

    • Sam Toyer → OpenAI

    • Daniel Filan → MATS

    • Smitha Milli → Meta

    • Me → METR

    Off the top of my head, Rachel Freedman, Micah Carroll, and Cassidy Laidlaw are the only people I’m familiar with who are still there.

    [BP: Note that I cleaned up the above list to correct one mistake, typos, and include full names. My source of truth is the CHAI People page.]

    Daniel Filan:

    There’s something kinda interesting about how CHAI strongly skews GDM-wards relative to the field. Maybe just because GDM is more academic?

    See also Anca going there

  6. ^

    Isabel J:

    I agree with some of your critiques about the dynamics on LW having room for improvement, but I’m very skeptical of this claim about academic authorship. Empirically authorship matters for getting through peer review, and I think it still matters after.

    https://​​pmc.ncbi.nlm.nih.gov/​​articles/​​PMC9564227/​​

  7. ^

    OH:

    I am not grateful for people raising awareness for things I care about by saying false and misleading things. That much should hopefully be obvious.

    I am very grateful for much of Stuart’s communications!

    But I think his treatment of CIRL both hurt the credibility of people’s concerns about AI risk and was among his least effective work in terms of reach. I feel reasonably confident calling this one a miss (though not like overwhelmingly confident, Stuart is a smart man, and maybe there was a deeper reason there).

  8. ^

    Damon Sasi:

    Where were people being aggressive toward you?

  9. ^

    OH:

    I do not doubt your experience, but clearly the actual topic of discourse is “was the reaction actually uncalibrated or bad?”, not “did Dylan find the reaction unwelcoming”. I agree the latter seems unambiguously correct. I think the former is unclear and I don’t currently buy it

  10. ^

    tv:

    I’m sorry my earlier reply was denigrating. I would like to offer clarification that the fact that your work doesn’t evaluate well according to my interests doesn’t mean you didn’t succeed according to yours, or that I wish you didn’t do the work you did (it’s the contrary here).

    Me replying to this thread is generally slightly off topic as I’m not part of the LW “alignment community”, only part of the “AI risk reducing” community, and I don’t wish my lack of interest in technical work here to reflect badly on the community you wanted this to reach.

    DHM:

    Noted.

  11. ^

    Lawrence Chan:

    I agree with your overall point, but it’s not just Eliezer who was like this! Nate Soares at the very least, but there were many commentators on LW who were aggressive toward non-orthodox ideas they disagreed with (esp prosaic AI alignment stuff).

    LW culture is way more blunt than academia; it’s definitely a different environment, and the karma system + relative obscurity keeps the quality pretty high. I think it’s a lot better than most other internet forums (e.g. this one), but many people do find the bluntness offputting.

    Also, as an aside, I appreciate your long tweet thread here; thanks for writing it!

  12. ^

    DHM:

    I was unaware of that process. I see what people in my circles talk about. When I have time I go read the papers. Many of the papers I end up reading are simply low-quality and have few things that build on them.

    This is best understood as a summary of my experience. But I have had a lot of experience and seen a lot of grand theories be a big deal and then fall out of favor.

  13. ^

    Buck Shlegeris:

    I would prefer people didn’t use the word “control theory” to refer to AI control, though obviously we brought this upon ourselves by using the word “control”...

    OH:

    Oops, and indeed the above was referring to control theory, not AI control. I think this was just me being dumb, the prior really should be on the other one.

  14. ^

    OH:

    I think ML academia is definitely vastly better than most! (in substantial part by shedding and innovating on academic norms)

    But we are talking about academia as a class here, and importantly, ML never had a space for most AI Alignment work, and still doesn’t. There is no natural place to publish, for example, control work in ML academia. There is no natural place to discuss potential paths to using AI coordination technology to assist takeoff navigation. There is no place to discuss ways of augmenting human intelligence.

    So given that, what is the supposed option here? There are fields that nominally are about those things, but good luck publishing in them. Doing so is an absolutely horrendous experience.

    I think LW is also not perfect here, but obviously vastly better than anything else for talking about these topics, including on the very dimensions that Dylan is critiquing here.

  15. ^

    tutor vals:

    An aside, but LW participation is probably associated with higher-variance outcomes in mental health, with many (with respect to community size) very bad outcomes. It’s complex to analyse, but just saying it’s not a slam dunk that it’s better to be in the LW community than to do a PhD.

    OH:

    Yep, definitely agree. Being an independent researcher publishing on LW is probably not much better for you than doing a PhD. Both are associated with pretty bad mental health outcomes.

  16. ^

    OH:

    I mean, that is also a definitely inaccurate take. Most researchers are not PhD trained!

    Most knowledge production does not happen by people who hold PhDs, why would you think that?

    When companies develop knowledge, PhDs are only a very small fraction of their workforce.

    It seems you have deeply misunderstood my claim about what knowledge production looks like. I am not talking about industry labs that publish in academia. I am talking about the whole engine of industrial and intellectual progress largely independent of journals, working at trade shows, making patents, sharing ideas, improving things year after year.

    This kind of work is not done by PhDs.

  17. ^

    Jan Kulveit:

    To prevent possible confusion: 1. It wasn’t the top comment before this exchange on X—the Alignment Forum karma before I posted the link was just 1, meaning the comment was downvoted by experienced users by a nontrivial amount.
    2. The karma of the comment has gone up now—my guess is mostly for tribal reasons.

    DHM:

    Oh, thanks for clarifying.

    I just went to look after seeing your post.

  18. ^

    OH:

    Going back to the object level, statements in textbooks and published books and academic talks are clearly normal intellectual contributions that can be engaged with on their own terms.

    I am confident I will be able to find some record of you, and approximately every single other academic, engaging with those materials when trying to understand or critique an idea or set of ideas.

    It’s not a weird thing to do. I don’t understand what kind of norm you are invoking here.

    Yes, in as much as someone explicitly asks “please engage with just this specific paper and ignore other stuff because it is misleading” ignoring that would be weird, but I have never before now heard anyone say that about the CIRL work.
