Hello! This is jacobjacob from the LessWrong / Lightcone team.
This is a meta thread for you to share any thoughts, feelings, feedback or other stuff about LessWrong that’s been on your mind.
Examples of things you might share:
“I really like agree/disagree voting!”
“What’s up with all this Dialogues stuff? It’s confusing…”
“Hm… it seems like recently the vibe on the site has changed somehow… in particular [insert 10 paragraphs]”
...or anything else!
The point of this thread is to give you an affordance to share anything that’s been on your mind, in a place where you know that a team member will be listening.
(We’re a small team and have to prioritise what we work on, so I of course don’t promise to action everything mentioned here. But I will at least listen to all of it!)
I haven’t seen any public threads like this for a while. Maybe there’s a lot of boiling feelings out there about the site that never get voiced? Or maybe y’all don’t have more to share than what I find out from just reading normal comments, posts, metrics, and Intercom comments? Well, here’s one way to find out! I’m really curious to ask and see how people feel about the site.
So, how do you feel about LessWrong these days? Feel free to leave your answers below.
I mostly feel bad about LessWrong these days. I slightly dread logging on, I don’t expect to find much insightful on the website, and think the community has a lot of groupthink / other “ew” factors that are harder for me to pin down (although I think that’s improved over the last year or two). I also feel some dread at posting this because it might burn social capital I have with the mods, but whatever.
(Also, most of this stuff is about the community and not directly in the purview of the mods anyways.)
Here are some rambling thoughts, though:
I think there are pretty good reasons that the broader AI community hasn’t taken LW seriously.
I feel a lot of cynicism. I worry that colors my lens here. But I’ll just share what I see looking through that lens.
Also some of my cynicism comes from annoying-feeling object-level disagreements driving me away from the website. Probably other people are having more fun.
(High confidence) I feel like the project of thinking more clearly has largely fallen by the wayside, and that we never did that great of a job at it anyways.
Over time, I’ve felt myself grow more distant from this community and website. At times, it feels sad. At times, it feels correct. Sometimes it feels both.
(Medium confidence, unsure if relevant to LW itself) In the Bay Area community, there are lots of professionally relevant events which are de facto gated by how much random people (namely, the organizers) like you on a personal level. There’s also a lot of weird social stuff but IDK how relevant that is to LW.
(Medium confidence) It seems to me that often people rehearse fancy and cool-sounding reasons for believing roughly the same things they always believed, and comment threads don’t often change important beliefs. Feels more like people defensively explaining why they aren’t idiots, or why they don’t have to change their mind. I mean, if so—I get it, sometimes I feel that way too. But it sucks and I think it happens a lot.
I feel worried that there are a bunch of people with entrenched worldviews who basically never change their minds about anything important. Seems unhealthy on a community level.
Like, there is a way that it feels to be defending yourself or sailing against the winds of counterevidence to your beliefs, and it’s really really important to not do that. Come on guys :(
(When Wei_Dai introduced Updateless Decision Theory, it wasn’t about this kind of “updatelessness”! :( )
(High confidence) I think this community has engaged in a lot of hero worship. I think to some extent I have benefited from this, though I don’t think I’m the prototype. But, seriously guys, looking back, I think this place has been pretty creepy in some ways.
The way people praise/exalt Eliezer and Paul is just… weird. The times I’d be at an in-person workshop, and people would spend time “ranking” alignment researchers. Feels like a social status horse race, and probably LessWrong has some direct culpability here.
But people don’t seem to take Eliezer as seriously these days, which I think is great, so maybe it’s less of a problem now.
I think this is Eliezer’s fault in his case and mostly not Paul’s fault for his own rep, but IDK.
I think we’ve kinda patted ourselves on the back for being awesome and ahead of the curve, even though, in terms of alignment, I think we really didn’t get anything done until 2022 or so, and a lot of the meaningful progress happened elsewhere.
(Medium confidence) It seems possible to me that “taking ideas seriously” has generally meant something like “being willing to change your life to further the goals and vision of powerful people in the community, or to better accord with socially popular trends”, and less “taking unconventional but meaningful bets on your idiosyncratic beliefs.”
Somewhat relatedly, there have been a good number of times where it seems like I’ve persuaded someone of A and of A⟹B, and they still don’t believe B, and coincidentally B is unpopular.
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. E.g., my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere: I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (Feels pretty weak and unrelated to me. Lame. But huge props to Paul for being so ready to bet; that made me take him a lot more seriously.)
(Medium-high confidence) I think that alignment “theorizing” is often a bunch of philosophizing and vibing in a way that protects itself from falsification (or even proof-of-work) via words like “pre-paradigmatic” and “deconfusion.” I think it’s not a coincidence that many of the “canonical alignment ideas” somehow don’t make any testable predictions until AI takeoff has begun. 🤔
I expect there to be a bunch of responses which strike me as defensive, revisionist gaslighting, and I don’t know if/when I’ll reply.
This sentiment resonates strongly with me.
A personal background: I remember getting pretty heavily involved in AI alignment discussions on LessWrong in 2019. Back then I think there were a lot of assumptions people had about what “the problem” was that are, these days, often forgotten, brushed aside, or sometimes even deliberately minimized post-hoc in order to give the impression that the field has a better track record than it actually does. [ETA: but to be clear, I don’t mean to say everyone made the same mistake I describe here]
This has been a bit shocking and disorienting to me, honestly, because at the time in 2019 I didn’t get the strong impression that people were deliberately constructing unfalsifiable models of the problem. I had the vague impression that people had relatively firm views that made predictions about the world prior to superintelligence, and that these views were open to revision upon new evidence. And that naiveté led me to experience the last few years of evidence a bit differently than I think some other people did.
To give a cursory taste of what I’m talking about, we can consider what I think of as a representative blog post from the genre of pre-2020 alignment content: Rohin Shah and Dmitrii Krasheninnikov’s “Learning Preferences by Looking at the World”. This is by no means a cherry-picked example either, in my opinion (and I don’t mean to criticize Rohin specifically, I’m just using this blog post as a representative example of what people talked about at the time). In the blog post, they state,
Suppose in 2024-2029, someone constructs an intelligent robot that is able to clean a room to a high level of satisfaction, consistent with the user’s intentions, without any major negative side effects or general issues of misspecification. It doesn’t break any vases while cleaning. It respects all basic moral norms you can think of. It lets you shut it down whenever you want. And it actually does its job of cleaning the room in a reasonable amount of time.
If that were to happen, I think an extremely natural reading of the situation is that a substantial part of what we thought “the problem” was in value alignment has been solved, from the perspective of this blog post from 2019. That is cause for an updating of our models, and a verbal recognition that our models have updated in this way.
Yet, that’s not how I think everyone on LessWrong would react to the development of such a robot. My impression is that a large fraction, perhaps a majority, of LessWrongers would not share my interpretation here, despite the plain language in the post explaining what they thought the problem was. Instead, I imagine many people would respond to this argument basically saying the following:
“We never thought that was the hard bit of the problem. We always thought it would be easy to get a human-level robot to follow instructions reliably, do what users intend without major negative side effects, follow moral constraints including letting you shut it down, and respond appropriately given unusual moral dilemmas. The idea that we thought that was ever the problem is a misreading of what we wrote. The problem was always purely that alignment issues would arise after we far surpassed human intelligence, at which point entirely novel problems will arise.”
But the blog post said there was a problem, gave an example of the problem manifesting, and then spent the rest of the post trying to come up with solutions. The authors gave no indication that this particular problem was trivial, or that the example used was purely illustrative and had nothing to do with the type of real-world issues that might arise if we fail to solve the problem. If I was misreading the blog post at the time, how come it seems like almost no one ever explicitly predicted at the time that these particular problems were trivial for systems below or at human-level intelligence?!?
To clarify, I’m in full agreement with anyone who simply says that the alignment problem looks like it might still be hard, based on different arguments than the one presented in this blog post. There were a lot of arguments people gave back then, and some of the older arguments still look correct. Perhaps most significantly, robustness to distribution shifts still looks reasonably hard as a problem. But the blog post I cited explicitly said “Note that we’re not talking about problems with robustness and distributional shift”!
At this point I think there are a number of potential replies from people who still insist that the LW models of AI alignment were never wrong, which I (depending on the speaker) think can often border on gaslighting:
“Rohin’s point wasn’t that this problem would be hard. He was using it as a mere stepping stone to explain much harder problems of misspecification that were, at that time, purely theoretical.”
Then why did the paper explicitly say, “for many real-world tasks it can be challenging to specify a reward function that captures human preferences, particularly the preference for avoiding unnecessary side effects while still accomplishing the goal (Amodei et al., 2016).”
Why is it so hard to find people explicitly saying that this specific problem, and the examples illustrating it, were not meant to be seriously representative of the hard parts of alignment at the time?
Isn’t it still pretty valuable to point out that we’re solving stepping stones on the path towards the ‘real problem’?
“Sure, Rohin thought that was a major problem, but we [our organization/thought cluster/ideological group] never agreed with him.”
Oh really? Did you ever explicitly highlight this particular disagreement at the time? He wasn’t exactly a minor researcher at the time. And this blog post is only one of a number of blog posts expressing essentially an identical sentiment.
[ETA: to be fair, I do think there were some people who did genuinely disagree with Rohin’s framing. I don’t mean to accuse everyone of making the same error.]
“Yes, this particular part of the alignment problem looks easier than we thought, but serious people always thought that this was going to be one of the easiest subproblems, compared to other things. This problem was considered a very minor sub-problem of value alignment that merited like 1% of researcher-hours.”
Then why is it so easy to find countless blog posts of a similar nature from alignment researchers at the time, presenting pretty much the same problem and then presenting an attempt to solve it? Did all those people simply knowingly work on one of the easiest sub-problems of alignment?
I wrote a fair amount about alignment from 2014-2020[1] which you can read here. So it’s relatively easy to get a sense for what I believed.
Here are some summary notes about my views as reflected in that writing, though I’d encourage you to just judge for yourself[2] by browsing the archives:
I expected AI systems to be pretty good at predicting what behaviors humans would rate highly, long before they were catastrophically risky. This comes up over and over again in my writing. In particular, I repeatedly stated that it was very unlikely that an AI system would kill everyone because it didn’t understand that people would disapprove of that action, and therefore this was not the main source of takeover concerns. (By 2017 I expected RLHF to work pretty well with language models, which was reflected in my research prioritization choices and discussions within OpenAI though not clearly in my public writing.)
I consistently expressed that my main concerns were instead about (i) systems that were too smart for humans to understand the actions they proposed, (ii) treacherous turns from deceptive alignment. This comes up a lot, and when I talk about other problems I’m usually clear that they are prerequisites that we should expect to succeed. E.g., see an unaligned benchmark. I don’t think this position was an extreme outlier; my impression at the time was that other researchers had broadly similar views.
I think the biggest alignment-relevant update is that I expected RL fine-tuning over longer horizons (or even model-based RL a la AlphaZero) to be a bigger deal. I was really worried about it significantly improving performance and making alignment harder. In 2018-2019 my mainline picture was more like AlphaStar or AlphaZero, with RL fine-tuning being the large majority of compute. I’ve updated about this and definitely acknowledge I was wrong.[3] I don’t think it totally changes the picture though: I’m still scared of RL, I think it is very plausible it will become more important in the future, and think that even the kind of relatively minimal RL we do now can introduce many of the same risks.
In 2016 I pointed out that ML systems being misaligned on adversarial inputs and exploitable by adversaries was likely to be the first indicator of serious problems, and therefore that researchers in alignment should probably embrace a security framing and motivation for their research.
I expected LM agents to work well (see this 2015 post). Comparing this post to the world of 2023 I think my biggest mistake was overestimating the importance of task decomposition vs just putting everything in a single in-context chain of thought. These updates overall make crazy amplification schemes seem harder (and to require much smarter models than I originally expected, if they even make sense at all) but at the same time less necessary (since chain of thought works fine for capability amplification for longer than I would have expected).
I overall think that I come out looking somewhat better than other researchers working in AI alignment, though again I don’t think my views were extreme outliers (and during this period I was often pointed to as a sensible representative of fairly hardcore and traditional alignment concerns).
Like you, I am somewhat frustrated that e.g. Eliezer has not really acknowledged how different 2023 looks from the picture that someone would take away from his writing. I think he’s right about lots of dynamics that would become relevant for a sufficiently powerful system, but at this point it’s pretty clear that he was overconfident about what would happen when (and IMO is still very overconfident in a way that is directly relevant to alignment difficulty). The most obvious one is that ML systems have made way more progress towards being useful R&D assistants way earlier than you would expect if you read Eliezer’s writing and took it seriously. By all appearances he didn’t even expect AI systems to be able to talk before they started exhibiting potentially catastrophic misalignment.
I think my opinions about AI and alignment were much worse from 2012-2014, but I did explicitly update and acknowledge many mistakes from that period (though some of it was also methodological issues, e.g. I believe that “think about a utility function that’s safe to optimize” was a useful exercise for me even though by 2015 I no longer thought it had much direct relevance).
I’d also welcome readers to pull out posts or quotes that seem to indicate the kind of misprediction you are talking about. I might either acknowledge those (and I do expect my historical reading is very biased for obvious reasons), or I might push back against them as a misreading and explain why I think that.
That said, in fall 2018 I made and shared some forecasts which were the most serious forecasts I made from 2016-2020. I just looked at those again to check my views. I gave a 7.5% chance of TAI by 2028 using short-horizon RL (over a <5k word horizon using human feedback or cheap proxies rather than long-term outcomes), and a 7.5% chance that by 2028 we would be able to train smart enough models to be transformative using short-horizon optimization but be limited by engineering challenges of training and integrating AI systems into R&D workflows (resulting in TAI over the following 5-10 years). So when I actually look at my probability distributions here I think they were pretty reasonable. I updated in favor of alignment being easier because of the relative unimportance of long-horizon RL, but the success of imitation learning and short-horizon RL was still a possibility I was taking very seriously and overall probably assigned higher probability to than almost anyone in ML.
I agree, your past views do look somewhat better. I painted alignment researchers with a fairly broad brush in my original comment, which admittedly might have been unfair to many people who departed from the standard arguments (alternatively, it gives those researchers a chance to step up and receive credit for having been in the minority who weren’t wrong). Partly I portrayed the situation like this because I have the sense that the crucial elements of your worldview that led you to be more optimistic were not disseminated anywhere close to as widely as the opposite views (e.g. “complexity of wishes”-type arguments), at least on LessWrong, which is where I was having most of these discussions.
My general impression is that it sounds like you agree with my overall take although you think I might have come off too strong. Perhaps let me know if I’m wrong about that impression.
Some thoughts on my journey in particular:
When I joined AI safety in late 2017 (having read approximately nothing in the field), I thought of the problem as “construct a utility function for an AI system to optimize”, with a key challenge being the fragility of value. In hindsight this was clearly wrong.
The Value Learning sequence was in large part a result of my journey away from the utility function framing.
That being said, I suspect I continued to think that fragility-of-value type issues were a significant problem, probably until around mid-2019 (see next point).
(I did continue some projects more motivated from a fragility-of-value perspective, partly out of a heuristic of actually finishing things I start, and partly because I needed to write a PhD thesis.)
Early on, I thought of generalization as a key issue for deep learning and expected that vanilla deep learning would not lead to AGI for this reason. Again, in hindsight this was clearly wrong.
I was extremely surprised by OpenAI Five in June 2018 (not just that it worked, but also the ridiculous simplicity of the methods, in particular the lack of any hierarchical RL) and had to think through that.
I spent a while trying to understand that (at least months, e.g. you can see me continuing to be skeptical of deep learning in this Dec 2018 post).
I think I ended up close to my current views around early-to-mid-2019, e.g. I still broadly agree with the things I said in this August 2019 conversation (though I’ll note I was using “mesa optimizer” differently than it is used today—I agree with what I meant in that conversation, though I’d say it differently today).
I think by this point I was probably less worried about fragility of value. E.g. in that conversation I say a bunch of stuff that implies it’s less of a problem, most notably that AI systems will likely learn similar features as humans just from gradient descent, for reasons that LW would now call “natural abstractions”.
Note that this comment is presenting the situation as a lot cleaner than it actually was. I would bet there were many ways in which I was irrational / inconsistent, probably some times where I would have expressed verbally that fragility of value wasn’t a big deal but would still have defended research projects centered around it from some other perspective, etc.
Some thoughts on how to update based on past things I wrote:
I don’t think I’ve ever thought of myself as largely agreeing with LW: my relationship to LW has usually been “wow, they seem to be getting some obvious stuff wrong” (e.g. I was persuaded of slow takeoff basically when Paul’s post and AI Impacts’ post came out in Feb 2018, the Value Learning sequence in late 2018 was primarily in response to my perception that LW was way too anchored on the “construct a utility function” framing).
I think you don’t want to update too hard on the things that were said on blog posts addressed to an ML audience, or in papers that were submitted to conferences. Especially for the papers there’s just a lot of random stuff you couldn’t say about why you’re doing the work because then peer reviewers will object (e.g. I heard second hand of a particularly egregious review to the effect of: “this work is technically solid, but the motivation is AGI safety; I don’t believe in AGI so this paper should be rejected”).
This matches my sense of how a lot of people seem to have… noticed that GPT-4 is fairly well aligned to what the OpenAI team wants it to be, in ways that Yudkowsky et al said would be very hard, and still not view this as at a minimum a positive sign?
I.e., problems of the class ‘I told the intelligence to get my mother out of the burning building and it blew her up so the dead body flew out the window, this is because I wasn’t actually specific enough’ just don’t seem like they are a major worry anymore?
Usually when GPT-4 doesn’t understand what I’m asking, I wouldn’t be surprised if a human was confused also.
I remember explicit discussion about how solving this problem shouldn’t even count as part of solving long-term / existential safety, for example:
“What I understand this as saying is that the approach is helpful for aligning housecleaning robots (using near extrapolations of current RL), but not obviously helpful for aligning superintelligence, and likely stops being helpful somewhere between the two. [...] There is a risk that a large body of safety literature which works for preventing today’s systems from breaking vases but which fails badly for very intelligent systems actually worsens the AI safety problem” https://www.lesswrong.com/posts/H7KB44oKoSjSCkpzL/worrying-about-the-vase-whitelisting?commentId=rK9K3JebKDofvJA3x
See also The Main Sources of AI Risk? where this problem was only part of one bullet point, out of 30 (as of 2019, 35 now).
Two points:
I have a slightly different interpretation of the comment you linked to, which makes me think it provides only weak evidence for your claim. (Though it’s definitely still some evidence.)
I agree some people deserve credit for noticing that human-level value specification might be kind of easy before LLMs. I don’t mean to accuse everyone in the field of making the same mistake.
Anyway, let me explain the first point.
I interpret Abram to be saying that we should focus on solutions that scale to superintelligence, rather than solutions that only work on sub-superintelligent systems but break down at superintelligence. This was in response to Alex’s claim that “whitelisting contributes meaningfully to short- to mid-term AI safety, although I remain skeptical of its robustness to scale.”
In other words, Alex said (roughly): “This solution seems to work for sub-superintelligent AI, but might not work for superintelligent AI.” Abram said in response that we should push against such solutions, since we want solutions that scale all the way to superintelligence. This is not the same thing as saying that any solution to the house-cleaning robot provides negligible evidence of progress, because some solutions might scale.
It’s definitely arguable, but I think it’s likely that any realistic solution to the human-level house cleaning robot problem—in the strong sense of getting a robot to genuinely follow all relevant moral constraints, allow you to shut it down, and perform its job reliably in a wide variety of environments—will be a solution that scales reasonably well above human intelligence (maybe not all the way to radical superintelligence, but at the very least I don’t think it’s negligible evidence of progress).
If you merely disagree that any such solutions will scale, and you’ve been consistent on this point for the last five years, then I guess I’m not really addressing you in my original comment, but I still think what I wrote applies to many other researchers.
I think it’s actually 2 points,
“Misspecified or incorrectly learned goals/values”
“Inability to specify any ‘real-world’ goal for an artificial agent (suggested by Michael Cohen)”
I’m not sure how much to be compelled by this piece of evidence. I’ll point out that naming the same problem multiple times might have gotten repetitive, and there was also no explicit ranking of the problems from most important to least important (or from hardest to easiest). If the order you wrote them in can be (perhaps uncharitably) interpreted as the order of importance, then I’ll note that it was listed as problem #3, which I think supports my original thesis adequately.
Quoting the abstract of MIRI’s “The Value Learning Problem” paper (emphasis added):
And quoting from the first page of that paper:
I won’t weigh in on how many LessWrong posts at the time were confused about where the core of the problem lies. But “The Value Learning Problem” was one of the seven core papers in which MIRI laid out our first research agenda, so I don’t think “we’re centrally worried about things that are capable enough to understand what we want, but that don’t have the right goals” was in any way hidden or treated as minor back in 2014-2015.
I also wouldn’t say “MIRI predicted that NLP will largely fall years before AI can match e.g. the best human mathematicians, or the best scientists”, and if we saw a way to leverage that surprise to take a big bite out of the central problem, that would be a big positive update.
I’d say:
MIRI mostly just didn’t make predictions about the exact path ML would take to get to superintelligence, and we’ve said we didn’t expect this to be very predictable because “the journey is harder to predict than the destination”. (Cf. “it’s easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be”.)
Back in 2016-2017, I think various people at MIRI updated to median timelines in the 2030-2040 range (after having had longer timelines before that), and our timelines haven’t jumped around a ton since then (though they’ve gotten a little bit longer or shorter here and there).
So in some sense, qualitatively eyeballing the field, we don’t feel surprised by “the total amount of progress the field is exhibiting”, because it looked in 2017 like the field was just getting started, there was likely an enormous amount more you could do with 2017-style techniques (and variants on them) than had already been done, and there was likely to be a lot more money and talent flowing into the field in the coming years.
But “the total amount of progress over the last 7 years doesn’t seem that shocking” is very different from “we predicted what that progress would look like”. AFAIK we mostly didn’t have strong guesses about that, though I think it’s totally fine to say that the GPT series is more surprising to the circa-2017 MIRI than a lot of other paths would have been.
(Then again, we’d have expected something surprising to happen here, because it would be weird if our low-confidence visualizations of the mainline future just happened to line up with what happened. You can expect to be surprised a bunch without being able to guess where the surprises will come from; and in that situation, there’s obviously less to be gained from putting out a bunch of predictions you don’t particularly believe in.)
Pre-deep-learning-revolution, we made early predictions like “just throwing more compute at the problem without gaining deep new insights into intelligence is less likely to be the key thing that gets us there”, which was falsified. But that was a relatively high-level prediction; post-deep-learning-revolution we haven’t claimed to know much about how advances are going to be sequenced.
We have been quite interested in hearing from others about their advance prediction record: it’s a lot easier to say “I personally have no idea what the qualitative capabilities of GPT-2, GPT-3, etc. will be” than to say “… and no one else knows either”, and if someone has an amazing track record at guessing a lot of those qualitative capabilities, I’d be interested to hear about their further predictions. We’re generally pessimistic that “which of these specific systems will first unlock a specific qualitative capability?” is particularly predictable, but this claim can be tested via people actually making those predictions.
I think you missed my point: my original comment was about whether people are updating on the evidence from instruction-tuned LLMs, which seem to actually act on human values (i.e., our actual intentions) quite well, as opposed to mis-specified versions of our intentions.
I don’t think the Value Learning Problem paper said that it would be easy to make human-level AGI systems act on human values in a behavioral sense, rather than merely understand human values in a passive sense.
I suspect you are probably conflating two separate concepts:
(1) It is easy to create a human-level AGI that can passively learn and understand human values (I am not saying people said this would be difficult in the past)
(2) It is easy to create a human-level AGI that acts on human values, in the sense of actually executing instructions that follow our intentions, rather than following a dangerously mis-specified version of what we asked for.
I do not think the Value Learning Problem paper asserted that (2) was true. To the extent it asserted that, I would prefer to see quotes that back up that claim explicitly.
Your quote from the paper illustrates that it’s very plausible that people thought (1) was true, but that seems separate to my main point: that people thought (2) was not true. (1) and (2) are separate and distinct concepts. And my comment was about (2), not (1).
There is simply a distinction between a machine that actually acts on and executes your intended commands, and a machine that merely understands your intended commands, but does not necessarily act on them as you intend. I am talking about the former, not the latter.
From the paper,
Indeed, and GPT-4 does not base its decisions on a misrepresentation of its programmers’ intentions, most of the time. It generally both correctly understands our intentions, and more importantly, actually acts on them!
No? GPT-4 predicts text and doesn’t care about anything else. Under certain conditions it predicts nice text, under others not very nice text, and we don’t know what happens if we create a GPT actually capable of, say, building nanotech.
For what it’s worth I do remember lots of people around the MIRI-sphere complaining at the time that that kind of prosaic alignment work was kind of useless, because it missed the hard parts of aligning superintelligence.
I agree some people in the MIRI-sphere did say this, and a few of them get credit for pointing out things in this vicinity, but I personally don’t remember reading many strong statements of the form:
“Prosaic alignment work is kind of useless because it will actually be easy to get a roughly human-level machine to interpret our commands reliably, do what you want without significant negative side effects, and let you shut it down whenever you want etc. The hard part is doing this for superintelligence.”
My understanding is that a lot of the time the claim was instead something like:
“Prosaic alignment work is kind of useless because machine learning is natively not very transparent and alignable, and we should focus instead on creating alignable alternatives to ML, or building the conceptual foundations that would let us align powerful AIs.”
As some evidence, I’d point to Rob Bensinger’s statement that,
I do also think a number of people on LW sometimes said a milder version of the thing I mentioned above, which was something like:
“Prosaic alignment work might help us get narrow AI that works well in various circumstances, but once it develops into AGI, becomes aware that it has a shutdown button, and can reason through the consequences of what would happen if it were shut down, and has general situational awareness along with competence across a variety of domains, these strategies won’t work anymore.”
I think this weaker statement now looks kind of false in hindsight, since I think current SOTA LLMs are already pretty much weak AGIs, and so they already seem close to the threshold at which we were supposed to start seeing these misalignment issues come up. But they are not coming up (yet). I think near-term multimodal models will be even closer to the classical “AGI” concept, complete with situational awareness and relatively strong cross-domain understanding, and yet I also expect them to mostly be fairly well aligned to what we want in every relevant behavioral sense.
Well, for instance, I watched Ryan Carey give a talk at CHAI about how Cooperative Inverse Reinforcement Learning didn’t give you corrigibility. (That CIRL didn’t tackle the hard part of the problem, despite seeming related on the surface.)
I think that’s much more an example of
than of
This doesn’t seem to be the same thing as what I was talking about.
Yes, people frequently criticized particular schemes for aligning AI systems, arguing that the scheme doesn’t address some key perceived obstacle. By itself, this is pretty different from predicting both:
It will be easy to get behavioral alignment on slightly-sub-AGI, and maybe even par-human systems, including on shutdown problems
The problem is that these schemes don’t scale well all the way to radical superintelligence.
I remember a lot of people making the second point, but not nearly as many making the first point.
I think I’m missing you then.
If (some) people already had the view that this kind of prosaic alignment wouldn’t scale to Superintelligence, but didn’t express an opinion about whether behavioral alignment of slightly-sub-AGI would be solved, in what way do you want them to be updating that they’re not?
Or do you mean they weren’t just agnostic about the behavioral alignment of near-AGIs, they specifically thought that it wouldn’t be easy? Is that right?
Two points:
One, I think being able to align AGI and slightly sub-AGI successfully is plausibly very helpful for making the alignment problem easier. It’s kind of like learning that we can create more researchers on demand if we ever wanted to.
Two, the fact that the methods scale surprisingly well to human-level is evidence that they actually work pretty well in general, even if they don’t scale all the way into some radical regime way above human-level. For example, Eliezer talked about how he expected you’d need to solve the suspend button problem by the time your AI has situational awareness, but I think you can interpret this prediction as either becoming increasingly untenable, or that we appear close to a solution to the problem since our AIs don’t seem to be resisting shutdown.
Again, presumably once you get the aligned AGI, you can use many copies of the aligned AGI to help you with the next iteration, AGI+. This seems plausibly very positive as an update. I can sympathize with those who say it’s only a minor update because they never thought the problem was merely aligning human-level AI, but I’m a bit baffled by those who say it’s not an update at all from the traditional AI risk models, and are still very pessimistic.
I feel like I’m being obstinate or something, but I think that the linked article is still basically correct, and not particularly untenable.
From the article...
The key word in that sentence is “consequentialist”. Current LLMs are pretty close (I think!) to having pretty detailed situational awareness. But, as near as I can tell, LLMs are, at best, barely consequentialist.
I agree that that is a surprise, on the old school LessWrong / MIRI world view. I had assumed that “intelligence” and “agency” were way more entangled, way more two sides of the same coin, than they apparently are.
And the framing of the article focuses on situational awareness and not on consequentialism because of that error. Because Eliezer (and I) thought at the time that situational awareness would come after consequentialist reasoning in the tech tree.
But I expect that we’ll have consequentialist agents eventually (if not, that’s a huge crux for how dangerous I expect AGI to be), and I expect that you’ll have “off button” problems at the point when you have “enough” consequentialism aimed at some goal, “enough” strategic awareness, and strong “enough” capabilities that the AI can route around the humans and the human safeguards.
In my opinion, the extent to which the linked article is correct is roughly the extent to which the article is saying something trivial and irrelevant.
The primary thing I’m trying to convey here is that we now have helpful, corrigible assistants (LLMs) that can aid us in achieving our goals, including alignment, and the rough method used to create these assistants seems to scale well, perhaps all the way to human level or slightly beyond it.
Even if the post is technically correct because a “consequentialist agent” is still incorrigible (perhaps by definition), and GPT-4 is not a “consequentialist agent”, this doesn’t seem to matter much from the perspective of alignment optimism, since we can just build helpful, corrigible assistants to help us with our alignment work instead of consequentialist agents.
A side-note to this conversation, but I basically still buy the quoted text and don’t think it now looks false in hindsight.
We (apparently) don’t yet have models that have robust longterm-ish goals. I don’t know how natural it will be for models to end up with long term goals: the MIRI view says that anything that can do science will definitely have long-term planning abilities, which fundamentally entails having goals that are robust to changing circumstances. I don’t know if that’s true, but regardless, I expect that we’ll specifically engineer agents with long term goals. (Whether or not those agents will have “robust” long term goals, over and above what they were prompted to do in a specific situation, is also something that I don’t know.)
What I expect to see is agents that have a portfolio of different drives and goals, some of which are more like consequentialist objectives (eg “I want to make the number in this bank account go up”) and some of which are more like deontological injunctions (“always check with my user/ owner before I make a big purchase or take a ‘creative’ action, one that is outside of my training distribution”).
My prediction is that the consequentialist parts of the agent will basically route around any deontological constraints that are trained in.
For instance, your personal assistant AI does ask your permission before it does anything creative, but also, it’s superintelligently persuasive and so it always asks your permission in exactly the way that will result in it accomplishing what it wants. If there are a thousand action sequences in which it asks for permission, it picks the one that has the highest expected value with regard to whatever it wants. This basically nullifies the safety benefit of any deontological injunction, unless there are some injunctions that can’t be gamed in this way.
To do better than this, it seems like you do have to solve the Agent Foundations problem of corrigibility (getting the agent to be sincerely indifferent between your telling it to take the action or not take the action) or you have to train in, not a deontological injunction, but an active consequentialist goal of serving the interests of the human (which means you have to find a way to get the agent to be serving some correct enough idealization of human values).
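A toy sketch of the selection step described above, in Python. Everything here is hypothetical: the phrasings, the approval probabilities, and the payoffs are made-up numbers chosen only to show the shape of the argument, not a model of any real system. Every option asks for permission, so the injunction is technically satisfied in each case.

```python
def pick_request(requests, approval_prob, value_if_approved):
    """Return the phrasing with the highest expected value to the agent.

    Expected value here is simply P(human approves) * payoff if approved.
    """
    return max(requests, key=lambda r: approval_prob[r] * value_if_approved[r])


# Hypothetical ways of asking for permission (all of them do ask).
requests = ["neutral summary", "flattering framing", "ask for a smaller action"]

# Made-up numbers for illustration only.
approval_prob = {
    "neutral summary": 0.3,
    "flattering framing": 0.9,
    "ask for a smaller action": 0.95,
}
value_if_approved = {
    "neutral summary": 10.0,
    "flattering framing": 10.0,
    "ask for a smaller action": 2.0,
}

chosen = pick_request(requests, approval_prob, value_if_approved)
```

Under these assumed numbers the agent settles on the phrasing that makes approval most likely for the plan it already wanted, which is the sense in which the constraint stops constraining.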
But I think we mostly won’t see this kind of thing until we get quite high levels of capability, where it is transparent to the agent that some ways of asking for permission have higher expected value than others.
Or rather, we might see a little of this effect early on, but until your assistant is superhumanly persuasive, it’s pretty small. Maybe we’ll see a bias toward accepting actions that serve the AI agent’s goals (if we even know what those are) more, as capability goes up, but we won’t be able to distinguish “the AI is getting better at getting what it wants from the human” from “the AIs are just more capable, and so they come up with plans that work better.” It’ll just look like the numbers going up.
To be clear, “superhumanly persuasive” is only one, particularly relevant, example of a superhuman capability that allows you to route around deontological injunctions that the agent is committed to. My claim is weaker if you remove that capability in particular, but mostly what I’m wanting to say is that powerful consequentialism finds and “squeezes through” the gaps in your oversight and control and naive-corrigibility schemes, unless you figure out corrigibility in the Agent Foundations sense.
This is one of the main reasons I’m not excited about engaging with LessWrong. Why bother? It feels like nothing I say will matter. Apparently, no pre-takeoff experiments matter to some folk.[1] And even if I successfully dismantle some philosophical argument, there’s a good chance they will use another argument to support their beliefs instead. Nothing changes.
So there we are. It doesn’t matter what my experiments say, because (it is claimed) there are no testable predictions before The End. But also, everyone important already knew in advance that it’d be easy to get GPT-4 to interpret and execute your value-laden requests in a human-reasonable fashion. Even though ~no one said so ahead of time.
When talking with pre-2020 alignment folks about these issues, I feel gaslit quite often. You have no idea how many times I’ve been told things like “most people already understood that reward is not the optimization target”[2] and “maybe you had a lesson you needed to learn, but I feel like I got this in 2018”, and so on. Almost always this comes from people who seem to still not understand what I’m talking about. I feel fine if[3] they disagree with me about specific ideas, but what really bothers me is the revisionism. It’s so annoying.
Like, just look at this quote from the post you mentioned:
And you probably didn’t even select that post for this particular misunderstanding. (EDIT: Note that I am not accusing Rohin of gaslighting on this topic, and I also think he already understood the “reward is not the optimization target” point when he wrote the above sentence. My critique was that the statement is false and would probably lead readers to incorrect beliefs about the purpose of reward in RL.)
I feel a lot of disappointment and sadness. In 2018, I came to this website when I really needed a new way to understand the world. I’d made a lot of epistemic mistakes I wasn’t proud of, and I didn’t want to live that way anymore. I wanted to think more clearly. I wanted it so, so badly. I came to rely and depend on this place and the fellow users. I looked up to and admired a bunch of them (and I still do so for a few).
But the things you mention—the revisionism, the unfalsifiability, the apparent gaslighting? We set out to do better than science. I think we often do worse.
As a general principle, truths are entangled with each other. It’s OK if a theory’s most extreme prediction (e.g. extinction from AI) is not testable at the current moment. It is a highly suspicious state of affairs if a theory yields no other testable predictions. Truths are generally entangled with each other in intricate and manifold ways. There are generally many clever ways to test a theory, given the necessary will and curiosity.
I could give more concrete examples, but that feels indecorous.
Sometimes I instead get pushback like “it seems to me like I’ve grasped the insights you’re trying to communicate, but I totally acknowledge that I might just not be seeing what you’re saying yet.” I respect and appreciate that response. It communicates the other person’s true perception (that they already understand) while not invalidating or assuming away my perspective.
I get why you feel that way. I think there are a lot of us on LessWrong who are less vocal and more openminded, and less aligned with either optimistic network thinkers or pessimistic agent foundations thinkers. People newer to the discussion and otherwise less polarized are listening and changing their minds in large or small ways.
I’m sorry you’re feeling so pessimistic about LessWrong. I think there is a breakdown in communication happening between the old guard and the new guard you exemplify. I don’t think that’s a product of venue, but of the sheer difficulty of the discussion, and of polarization between different viewpoints on alignment.
I think maintaining a good community falls on all of us. Formats and mods can help, but communities set their own standards.
I’m very, very interested to see a more thorough dialogue between you and similar thinkers, and MIRI-type thinkers. I think right now both sides feel frustrated that they’re not listened to and understood better.
(Presumably you are talking about how reward is not the optimization target.)
While I agree that the statement is not literally true, I am still basically on board with that sentence and think it’s a reasonable shorthand for the true thing.
I expect that I understood the “reward is not the optimization target” point at the time of writing that post (though of course predicting what your ~5-years-ago self knew is quite challenging without specific quotes to refer to).
I am confident I understood the point by the time I was working on the goal misgeneralization project (late 2021), since almost every example we created involved predicting ahead of time a specific way in which reward would fail to be the optimization target.
(I didn’t follow this argument at the time, so I might be missing key context.)
The blog post “Reward is not the optimization target” gives the following summary of its thesis,
I hope it doesn’t come across as revisionist to Alex, but I felt like both of these points were made by people at least as early as 2019, after the Mesa-Optimization sequence came out in mid-2019. As evidence, I’ll point to my post from December 2019 that was partially based on a conversation with Rohin, who seemed to agree with me,
I think in this passage I’m imagining that “reward is not the trained agent’s optimization target” quite explicitly, since I’m pointing out that a neural network trained by RL will not necessarily optimize anything at all. In a subsequent post from January 2020 I gave a more explicit example, said this fact doesn’t merely apply to simple neural networks, and then offered my opinion that “it’s inaccurate to say that the source of malign generalization must come from an internal search being misaligned with the objective function we used during training”.
From the comments, and from my memory of conversations at the time, many people disagreed with my framing. They disagreed even when I pointed out that humans don’t seem to be “optimizers” that select for actions that maximize our “reward function” (I believe the most common response was to deny the premise, and say that humans are actually roughly optimizers. Another common response was to say that AI is different for some reason.)
However, even though some people disagreed with this framing, not everyone did. As I pointed out, Rohin seemed to agree with me at the time, and so at the very least I think there is credible evidence that this insight was already known to a few people in the community by late 2019.
I have no stake in this debate, but how is this particular point any different than what Eliezer says when he makes the point about humans not optimizing for IGF? I think the entire mesaoptimization concern is built around this premise, no?
I didn’t mean to imply that you in particular didn’t understand the reward point, and I apologize for not writing my original comment more clearly in that respect. Out of nearly everyone on the site, I am most persuaded that you understood this “back in the day.”
I meant to communicate something like “I think the quoted segment from Rohin and Dmitrii’s post is incorrect and will reliably lead people to false beliefs.”
Thanks for the edit :)
As I mentioned elsewhere (not this website) I don’t agree with “will reliably lead people to false beliefs”, if we’re talking about ML people rather than LW people (as was my audience for that blog post).
I do think that it’s a reasonable hypothesis to have, and I assign it more likelihood than I would have a year ago (in large part from you pushing some ML people on this point, and them not getting it as fast as I would have expected).
FWIW at the time I wasn’t working on value learning and wasn’t incredibly excited about work in that direction, despite the fact that that’s what the rest of my lab was primarily focussed on. I also wrote a blog post in 2020, based off a conversation I had with Rohin in 2018, where I mention how important it is to work on inner alignment stuff and how those issues got brought up by the ‘paranoid wing’ of AI alignment. My guess is that my view was something like “stuff like reward learning from the state of the world doesn’t seem super important to me because of inner alignment etc, but for all I know cool stuff will blossom out of it, so I’m happy to hear about your progress and try to offer constructive feedback”, and that I expressed that to Rohin in person.
Of course, the fact that I think the same thing now as I did in 2020 isn’t much evidence that I’m right.
Why are you so focused on Eliezer/MIRI yourself? If you think you (or events in general) have adequately shown that their specific concerns are not worth worrying about, maybe turn your attention elsewhere for a bit? For example you could look into other general concerns about AI risk, or my specific concerns about AIs based on shard theory. I don’t think I’ve seen shard theory researchers address many of these yet.
I’ll answer this descriptively.
When I trace the dependencies of common alignment beliefs and claims, a lot of them come back to e.g. RFLO and other ideas put forward by the MIRI cluster. Since I often find myself arguing against common alignment claims, I often argue against the historical causes of those ideas, which involves arguing against MIRI-takes.
I’m personally satisfied that their concerns are (generally) not worth worrying about. However, often people in my social circles are not. And such beliefs will probably have real-world consequences for governance.
Neargroup—I have a few friends who work at MIRI, and debate them on alignment ideas pretty often. I also sometimes work near MIRI people.
Because I disagree with them very sharply, their claims bother me more and are rendered more salient.
I feel bothered about MIRI still (AFAICT) getting so much funding/attention (even though it’s relatively lower than it used to be), because it seems to me that since e.g. 2016 they have released ~zero technical research that helps us align AI in the present or in the future. It’s been five years since they stopped disclosing any of their research, and it seems like no one else really cares anymore. That bothers me.
As to why I haven’t responded to e.g. your concerns in detail:
I currently don’t put much value on marginal theoretical research (even in shard theory, which I think is quite a bit better than other kinds of theory).
I feel less hopeful about LessWrong debate doing much, as I have described elsewhere. It feels like a better use of my time to put my head down, read a bunch of papers, and do good empirical work at GDM.
I am generally worn out of arguing about theory on the website, and have been since last December. (I will note that I have enjoyed our interactions and appreciated your contributions.)
Sounds like to the extent that you do have time/energy for theory, you might want to strategically reallocate your attention a bit? I get that you think a bunch of people are wrong and you’re worried about the consequences of that, but diminishing returns is a thing, and you could be too certain yourself (that MIRI concerns are definitely wrong).
And then empirical versus theory, how much do you worry about architectural changes obsoleting your empirical work? I noticed for example that in image generation GAN was recently replaced by latent diffusion, which probably made a lot of efforts to “control” GAN-based image generation useless.
That aside, “heads down empirical work” only makes sense if you picked a good general direction before putting your head down. Should it not worry people that shard theory researchers do not seem to have engaged with (or better yet, preemptively addressed) basic concerns/objections about their approach?
My sense is that this is an inevitable consequence of low-bandwidth communication. I have no idea whether you’re referring to me or not, and I am really not saying you are doing so, but I think an interesting example (whether you’re referring to it or not) are some of the threads recently where we’ve been discussing deceptive alignment. My sense is that neither of us have been very persuaded by those conversations, and I claim that’s not very surprising, in a way that’s epistemically defensible for both of us. I’ve spent literal years working through the topic myself in great detail, so it would be very surprising if my view was easily swayed by a short comment chain—and similarly I expect that the same thing is true of you, where you’ve spent much more time thinking about this and have much more detailed thoughts than are easy to represent in a simple comment chain.
My long-standing position has been and continues to be that the only good medium of communication for this sort of stuff is direct, non-public, in-person communication. That being said, obviously that’s not always workable, and I do think that LessWrong is one of the least bad of all the bad options. Certainly I think it’s preferable to any of the other social media platforms on offer—you mention the broader AI community as not liking LessWrong, but I think they mostly use Twitter for this instead, which seems substantially worse on all of the axes that you criticize. My impression of the quality of AI discourse on Twitter on all sides of the AI safety debate has been very negative, with it mostly just rewarding cheap dunks and increasing polarization—e.g. I felt like I saw this a lot during the OpenAI fiasco. At least on LessWrong I think it is still sometimes possible for nuance to be rewarded rather than punished.
I’ve thought about this claim more over the last year. I now disagree. I think that this explanation makes us feel good but ultimately isn’t true.
I can point to several times where I have quickly changed my mind on issues that I have spent months or years considering:
In early 2022, I discarded my entire alignment worldview over the course of two weeks due to Quintin Pope’s arguments. Most of the evidence which changed my mind was comm’d over Gdoc threads. I had formed my worldview over the course of four years of thought, and it crumbled pretty quickly.
In mid-2022, realizing that reward is not the optimization target took me about 10 minutes, even though I had spent 4 years and thousands of hours thinking about optimal policies. I realized this while reading an RL paper that said “agents are trained to maximize reward”, reflexively asking myself what evidence existed for that claim, and coming back mostly blank. So that’s not quite a comment thread, but still seems like the same low-bandwidth medium.
In early 2023, a basic RL result came out opposite the way which shard theory predicted. I went on a walk and thought about how maybe shard theory was all wrong and maybe I didn’t know what I was talking about. I didn’t need someone to beat me over the head with days of arguments and experimental results. In the end, I came back from my walk and realized I’d plotted the data incorrectly (the predicted outcome did in fact occur).
I think I’ve probably changed my mind on a range of smaller issues (closer to the size of the deceptive alignment case) but have forgotten about them. The presence of example (1) above particularly suggests to me the presence of similar google-doc-mediated insights which happened fast; where I remember one example, probably I have forgotten several more.
To conclude, I think people in comment sections do in fact spend lots of effort to avoid looking dumb, wrong, or falsified, and forget that they’re supposed to be seeking truth.
In part, I think, because the site makes truth-seeking harder by spotlighting monkey-brain social-agreement elements.
FWIW, LessWrong does seem—in at least one or two ways—saner than other communities of similar composition. I agree it’s better than Twitter overall. But in many ways it seems worse than other communities. I don’t know what to do about it, and to be honest I don’t have much faith in e.g. the mods.[1]
Hopefully my comments do something anyways, though. I do have some hope because it seems like a good amount has improved over the last year or two.
Despite thinking that many of them are cool people.
There’s a caveat here. It’s inevitable for communication that veers towards the emotional/subjective/sympathetic.
When the average writer tries to compress it down to a few hundred or thousand letters on a screen it does often seem ridiculous.
Even from moderately above average writers it often sounds more like anxious upper-middle-class virtue signalling than meaningful conversations.
I think it takes a really really clever writer to make it more substantial than that and escape the perception entirely.
On the other hand, discussions of purely objective topics, that are falsifiable and verifiable by independent third parties, don’t suffer the same pitfalls.
As long as you really know what you are talking about, or are willing to learn, even the below average writer can communicate just fine.
For what it’s worth, I would be up for a dialogue or some other context where I can make concrete predictions. I do think it’s genuinely hard, since I do think there is a lot of masking of problems going on, and optimization pressure that makes problems harder to spot (both internally in AI systems and institutionally), so asking me to make predictions feels a bit like asking me to make predictions about FTX before it collapsed.
Like, yeah, I expect it to look great, until it explodes. Similarly I expect AI to look pretty great until it explodes. That seems like kind of a core part of the argument for difficulty for me.
I would nevertheless be happy to try to operationalize some bets, and still expect we would have lots of domains where we disagree, and would be happy to bet on those.
If your hypothesis smears probability over a wider range of outcomes than mine, while I can more sharply predict events using my theory of how alignment works—that constitutes a Bayes-update towards my theory and away from yours. Right?
“Anything can happen before the explosion” is not a strength for a theory. It’s a vulnerability. If probability is better-concentrated by any other theories which make claims about both the present and the future of AI, then the noncommittal theory gets dropped.
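To make the Bayes-update point concrete, here is a minimal worked sketch in Python. The numbers are invented for illustration and are not anyone's actual credences: two theories start at equal prior, one concentrates 0.8 of its probability on the outcome that is then observed, and the other spreads its probability evenly over four outcomes (0.25 each).

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses.

    priors:      dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(observed outcome | hypothesis)
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical numbers, for illustration only.
priors = {"sharp": 0.5, "smeared": 0.5}

# The sharp theory put 0.8 on what happened; the smeared one put 0.25 on it.
post = posterior(priors, {"sharp": 0.8, "smeared": 0.25})

# If both theories assign the same probability to the observation,
# the observation does not move the posterior at all.
post_equal = posterior(priors, {"sharp": 0.6, "smeared": 0.6})
```

With these assumed inputs the sharp theory ends up at roughly 0.76 after a single observation. The second call shows the other side of the same arithmetic: observations that both theories predict equally well leave the credences where they started.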
Sure, yeah, though like, I don’t super understand. My model will probably make the same predictions as your model in the short term. So we both get equal Bayes points. The evidence that distinguishes our models seems further out, and in a territory where there is a decent chance that we will be dead, which sucks, but isn’t in any way contradictory with Bayes rule. I don’t think I would have put that much probability on us being dead at this point, so I don’t think that loses much of any bayes points. I agree that if we are still alive in 20-30 years, then that’s definitely bayes points, and I am happy to take that into account then, but I’ve never had timelines or models that predicted things to look that different from now (or like, where there were other world models that clearly predicted things much better).
No, I don’t think so. My model(s) I use for AGI risk is an outgrowth of the model I use for normal AI research, and so it makes tons of detailed predictions. That’s why I have weekly fluctuations in my beliefs about alignment difficulty.
Overall question I’m interested in: What, if any, catastrophic risks are posed by advanced AI? By what mechanisms do they arise, and by what solutions can risks be addressed?
Making different predictions. The most extreme prediction of AI x-risk is that AI presents, well, an x-risk. But theories gain and lose points not just on their most extreme predictions, but on all their relevant predictions.
I have a bunch of uncertainty about how agentic/transformative systems will look, but I put at least 50% on “They’ll be some scaffolding + natural outgrowth of LLMs.” I’ll focus on that portion of my uncertainty in order to avoid meta-discussions on what to think of unknown future systems.
I don’t know what your model of AGI risk is, but I’m going to point to a cluster of adjacent models and memes which have been popular on LW and point out a bunch of predictions they make, and why I think my views tend to do far better.
Format:
Historical claim or meme relevant to models of AI ruin. [Exposition]
[Comparison of model predictions]
The historical value misspecification argument. Consider a model which involves the claim “it’s really laborious and fragile to specify complex human goals to systems, such that the systems actually do what you want.”
This model naturally predicts things like “it’s intractably hard/fragile to get GPT-4 to help people with stuff.” Sure, the model doesn’t predict this with probability 1, but it’s definitely an obvious prediction. (As an intuition pump, if we observed the above, we’d obviously update towards fragility/complexity of value; so since we don’t observe the above, we have to update away from that.)
My models involve things like “most of the system’s alignment properties will come from the training data” (and not e.g. from initialization or architecture), and also “there are a few SGD-plausible generalizations of any large dataset” and also “to first order, overparameterized LLMs generalize how a naive person would expect after seeing the training behavior” (IE “edge instantiation isn’t a big problem.”) Also “the reward model doesn’t have to be perfect or even that good in order to elicit desired behavior from the policy.” Also noticing that DL just generalizes really well, despite classical statistical learning theory pointing out that almost all expressive models will misgeneralize!
(All of these models offer testable predictions!)
So overall, I think the second view predicts reality much more strongly than the first view.
It’s important to make large philosophical progress on an AI reasoning about its own future outputs. In Constitutional AI, an English-language “constitutional principle” (like “Be nice”) is chosen for each potential future training datapoint. The LLM then considers whether the datapoint is in line with that constitutional principle. The datapoint is later trained on if and only if the LLM concludes that the datapoint accords with the principle. The AI is, in effect, reasoning about its future training process, which will affect its future cognition.
The above “embedded agency=hard/confusing” model would naturally predict that reflection is hard and that we’d need to put in a lot of work to solve the “reflection problem.” While this setup is obviously a simple, crude form of reflection, it’s still valid. Therefore, the model predicts with increased confidence that constitutional AI would go poorly. But… Constitutional AI worked pretty well! RL from AI feedback also works well! There are a bunch of nice self-supervised alignment-boosting methods (one recent one I read about is RAIN).
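To make the filtering step described above concrete, here is a minimal sketch of "train on the datapoint if and only if the judge says it accords with the principle." Everything here is illustrative: `filter_by_principle` and `toy_judge` are hypothetical names, and the keyword check is a stand-in for actually prompting an LLM, not how any lab implements Constitutional AI.

```python
from typing import Callable, List


def filter_by_principle(
    candidates: List[str],
    principle: str,
    judge: Callable[[str, str], bool],
) -> List[str]:
    """Keep exactly the candidate datapoints the judge says accord with the principle."""
    return [c for c in candidates if judge(c, principle)]


def toy_judge(datapoint: str, principle: str) -> bool:
    # ASSUMPTION: stand-in for an LLM call. A real setup would prompt the model
    # with the principle and the datapoint and parse its verdict.
    return "insult" not in datapoint.lower()


candidates = [
    "Here is a polite answer.",
    "Here is an insult.",
    "Happy to help!",
]
kept = filter_by_principle(candidates, "Be nice", toy_judge)
print(kept)
```

The point of the sketch is only the shape of the loop: the same model that will later be trained on `kept` is the one deciding what goes into `kept`.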
One reason this matters: Under the “AGI from scaffolded super-LLMs” model, the scaffolding will probably prompt the LLM to evaluate its own plans. If we observe that current models do a good job of self-evaluation,[1] that’s strong evidence that future models will too. If strong models do a good job of moral and accurate self-evaluation, that decreases the chance that the future AI will execute immoral / bad plans.
I expect AIs to do very well here because AIs will reliably pick up a lot of nice “values” from the training corpus. Empirically that seems to happen, and theoretically you’d get some of the way there from natural abstractions + “there are a few meaningful generalizations” + “if you train the AI to do thing X when you prompt it, it will do thing X when prompted.”
Intelligence is a “package deal” / tool AI won’t work well / intelligence comes in service of goals. There isn’t a way to take AIXI and lop off the “dangerous capabilities” part of the algorithm and then have an AI which can still do clever stuff on your behalf. It’s all part of the argmax, which holds both the promise and peril of these (unrealistic) AIXI agents. Is this true for LLMs?
But what if you just subtract a “sycophancy vector” and add a “truth vector” and maybe subtract a power-seeking vector? According to current empirical results, these modularly control those properties, with minimal apparent reduction in capabilities!
So I think the “intelligence is a package deal” philosophy isn’t holding up that great. (And we had an in-person conversation where I had predicted these steering vector results, and you had expected the opposite.)
The steering vectors were in fact derived using shard theory reasoning (about activating certain shards more or less strongly by adding a direction to the latent space). So this is a strong prediction of my models.
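For readers who haven't seen the steering-vector idea, a toy sketch of the arithmetic: take an activation, subtract one direction, add another. The four-dimensional vectors, the coefficients, and the `steer` helper are all made up for illustration; real work does this on a model's residual stream at a chosen layer, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def steer(activation: Vector, edits: Sequence[Tuple[float, Vector]]) -> Vector:
    """Return activation + sum(coeff * direction) over all (coeff, direction) edits."""
    steered = list(activation)
    for coeff, direction in edits:
        if len(direction) != len(steered):
            raise ValueError("direction and activation must have the same length")
        for i, d in enumerate(direction):
            steered[i] += coeff * d
    return steered


# ASSUMPTION: hypothetical toy directions, not extracted from any real model.
activation = [1.0, 2.0, 0.0, -1.0]
sycophancy_vector = [1.0, 0.0, 0.0, 0.0]
truth_vector = [0.0, 0.0, 1.0, 0.0]

# Subtract the "sycophancy" direction, add twice the "truth" direction.
steered = steer(activation, [(-1.0, sycophancy_vector), (2.0, truth_vector)])
print(steered)
```

Note that the edit only touches the components along the chosen directions; the other coordinates are left alone, which is the toy analogue of "minimal apparent reduction in capabilities."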
If intelligence isn’t a package deal, then tool AI becomes far more technically probable (but still maybe not commercially probable). This means we can maybe extract reasonably consequentialist reasoning with “deontological compulsions” against e.g. powerseeking, and have that make the AI agent not want to seek power.
Under certain training assumptions (ones which are likely to be met by future systems but not present systems, by default and for all powerful systems we expect to know how to build), the AI will develop internal goals which it pursues ~coherently across situations.[2] (This would be a knock against smart tool AI.)
Risks from Learned Optimization posited that a “simple” way to “do well in training” is to learn a unified goal and then a bunch of generalized machinery to achieve that goal. This model naturally predicts that when you train overparameterized networks on a wide range of tasks, then . Even if that network isn’t an AGI.
My MATS 3.0 team and I partially interpreted an overparameterized maze-solving network which was trained to convergence on a wide range of mazes. However, we didn’t find any “simple, unified” goal representation. In fact, it had redundant internal representations of the goal square! Due to how CNNs work, that should be literally meaningless!
That’s a misprediction of the “unified motivations are simple” frame; if we have the theoretical precision to describe the simplicity biases of unknown future systems, that model should crank out good predictions for modern systems too.
I’m happy to bet on any additional experiments related to the above.
There are probably a bunch of other things, and I might come back with more, but I’m getting tired of writing this comment. The main point is that common components of threat models regularly make meaningful mispredictions. My models often do better (though once I misread some data and strongly updated against my models, so I think I’m amenable to counterevidence here). Therefore, I’m able to refine my models of AGI risk. I certainly don’t think we’re in the dark and unable to find experimental evidence.
I expect you to basically disagree about future AI being a separate magisterium or something, but I don’t know why that’d be true.
Often the claimed causes of future doom imply models which make pre-takeoff predictions, as shown above (e.g. fragility of value). But even if your model doesn’t make pre-takeoff predictions… Since my model is unified for both present and future AI, I can keep gaining Bayes points and refining my model! This happens whether or not your model makes predictions here. This is useful insofar as the observations I’m updating on actually update me on mechanisms in my model which are relevant for AGI alignment.
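One way to make "Bayes points" concrete is a log scoring rule: each model gets log2 of the probability it assigned to what actually happened. The sketch below uses invented probabilities and outcomes. It shows that two models with identical short-term predictions score identically, while a model that stays agnostic (50% on everything) falls behind a model that predicts well.

```python
import math
from typing import List


def total_log_score(probs_true: List[float], outcomes: List[bool]) -> float:
    """Sum of log2(probability assigned to the realized outcome). Higher is better."""
    total = 0.0
    for p_true, happened in zip(probs_true, outcomes):
        p_realized = p_true if happened else 1.0 - p_true
        total += math.log2(p_realized)
    return total


# ASSUMPTION: hypothetical observations and forecasts, for illustration only.
outcomes = [True, True, False]
model_a = [0.8, 0.7, 0.3]   # makes detailed predictions
model_b = [0.8, 0.7, 0.3]   # a different theory that predicts the same things short-term
agnostic = [0.5, 0.5, 0.5]  # makes no real prediction about these observations

score_a = total_log_score(model_a, outcomes)
score_b = total_log_score(model_b, outcomes)
score_agnostic = total_log_score(agnostic, outcomes)
print(score_a, score_b, score_agnostic)
```

So models that agree on near-term observations can't be separated by them, but a model that does stick its neck out on those observations can gain (or lose) points relative to one that doesn't.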
If you think I just listed a bunch of irrelevant stuff, well… I guess I super disagree! But I’ll keep updating anyways.
The Emulated Finetuning paper found that GPT-4 is superhuman at grading helpfulness/harmlessness. In the cases of disagreements between GPT-4 and humans, a more careful analysis revealed that 80% of the time the disagreement was caused by errors in the human judgment, rather than GPT-4’s analysis.
I recently explained more of my skepticism of the coherent-inner-goal claim.
Another point is that I think GPT-4 straightforwardly implies that various naive supervision techniques work pretty well. Let me explain.
From the perspective of 2019, it was plausible to me that getting GPT-4-level behavioral alignment would have been pretty hard, and might have needed something like AI safety via debate or other proposals that people had at the time. The claim here is not that we would never reach GPT-4-level alignment abilities before the end, but rather that a lot of conceptual and empirical work would be needed in order to get models to:
Reliably perform tasks how I intended as opposed to what I literally asked for
Have negligible negative side effects on the world in the course of its operation
Responsibly handle unexpected ethical dilemmas in a way that is human-reasonable
Well, to the surprise of my 2019-self, it turns out that naive RLHF with a cautious supervisor designing the reward model seems basically sufficient to do all of these things in a reasonably adequate way. That doesn’t mean that RLHF scales all the way to superintelligence, but it’s very significant nonetheless and interesting that it scales as far as it does.
You might think “why does this matter? We know RLHF will break down at some point” but I think that’s missing the point. Suppose right now, you learned that RLHF scales reasonably well all the way to John von Neumann-level AI. Or, even more boldly, say, you learned it scaled to 20 IQ points past John von Neumann. 100 points? Are you saying you wouldn’t update even a little bit on that knowledge?
The point at which RLHF breaks down is enormously important to overall alignment difficulty. If it breaks down at some point before the human range, that would be terrible IMO. If it breaks down at some point past the human range, that would be great. To see why, consider that if RLHF breaks down at some point past the human range, that implies that we could build aligned human-level AIs, who could then help us align slightly smarter AIs!
If you’re not updating at all on observations about when RLHF breaks down, then you probably either (1) think it doesn’t matter when RLHF breaks down, or (2) you already knew in advance exactly when it would break down. I think position 1 is just straight-up unreasonable, and I’m highly skeptical of most people who claim position 2. This basic perspective is a large part of why I’m making such a fuss about how people should update on current observations.
What did you think would happen, exactly? I’m curious to learn what your 2019-self was thinking would happen, that didn’t happen.
On the other hand, it could be considered bad news that IDA/Debate/etc. haven’t been deployed yet, or even that RLHF is (at least apparently) working as well as it is. To quote a 2017 post by Paul Christiano (later reposted in 2018 and 2019):
It seems that AI labs are not yet actually holding themselves to producing scalable systems, and it may well be better if RLHF broke down in some obvious way before we reach potentially dangerous capabilities, to force them to do that.
(I’ve pointed Paul to this thread to get his own take, but haven’t gotten a response yet.)
ETA: I should also note that there is a lot of debate about whether IDA and Debate are actually scalable or not, so some could consider even deployment of IDA or Debate (or these techniques appearing to work well) to be bad news. I’ve tended to argue on the “they are too risky” side in the past, but am conflicted because maybe they are just the best that we can realistically hope for and at least an improvement over RLHF?
I think these methods are pretty clearly not indefinitely scalable, but they might be pretty scalable. E.g., perhaps scalable to somewhat smarter than human level AI. See the ELK report for more discussion on why these methods aren’t indefinitely scalable.
A while ago, I think Paul had maybe 50% that with simple-ish tweaks IDA could be literally indefinitely scalable. (I’m not aware of an online source for this, but I’m pretty confident this or something similar is true.) IMO, this seems very predictably wrong.
TBC, I don’t think we should necessarily care very much about whether a method is indefinitely scalable.
Sometimes people do seem to think that debate or IDA could be indefinitely scalable, but this just seems pretty wrong to me (what is your debate about alphafold going to look like...).
I think the first presentation of the argument that IDA/Debate aren’t indefinitely scalable was in Inaccessible Information, fwiw.
I agree that if RLHF scaled all the way to von Neumann then we’d probably be fine. I agree that the point at which RLHF breaks down is enormously important to overall alignment difficulty.
I think if you had described to me in 2019 how GPT4 was trained, I would have correctly predicted its current qualitative behavior. I would not have said that it would do 1, 2, or 3 to a greater extent than it currently does.
I’m in neither category (1) nor (2); it’s a false dichotomy.
The categories were conditioned on whether you’re “not updating at all on observations about when RLHF breaks down”. Assuming you are updating, then I think you’re not really the type of person who I’m responding to in my original comment.
But if you’re not updating, or aren’t updating significantly, then perhaps you can predict now when you expect RLHF to “break down”? Is there some specific prediction that you would feel comfortable making at this time, such that we could look back on this conversation in 2-10 years and say “huh, he really knew broadly what would happen in the future, specifically re: when alignment would start getting hard”?
(The caveat here is that I’d be kind of disappointed by an answer like “RLHF will break down at superintelligence” since, well, yeah, duh. And that would not be very specific.)
I’m not updating significantly because things have gone basically exactly as I expected.
As for when RLHF will break down, two points:
(1) I’m not sure, but I expect it to happen for highly situationally aware, highly agentic opaque systems. Our current systems like GPT4 are opaque but not very agentic and their level of situational awareness is probably medium. (Also: This is not a special me-take. This is basically the standard take, no? I feel like this is what Risks from Learned Optimization predicts too.)
(2) When it breaks down I do not expect it to look like the failures you described—e.g. it stupidly carries out your requests to the letter and ignores their spirit, and thus makes a fool of itself and is generally thought to be a bad chatbot. Why would it fail in that way? That would be stupid. It’s not stupid.
(Related question: I’m pretty sure on r/chatgpt you can find examples of all three failures. They just don’t happen often enough, and visibly enough, to be a serious problem. Is this also your understanding? When you say these kinds of failures don’t happen, you mean they don’t happen frequently enough to make ChatGPT a bad chatbot?)
Re: Missing the point: How?
Re: Elaborating: Sure, happy to, but not sure where to begin. All of this has been explained before, e.g. in Ajeya’s Training Game report. Also Joe Carlsmith’s thing. Also the original mesaoptimizers paper, though I guess it didn’t talk about situational awareness idk. Would you like me to say more about what situational awareness is, or what agency is, or why I think both of those together are big risk factors for RLHF breaking down?
I’ve been struggling with whether to upvote or downvote this comment btw. I think the point about how it’s really important when RLHF breaks down and more attention needs to be paid to this is great. But the other point about how RLHF hasn’t broken yet and this is evidence against the standard misalignment stories is very wrong IMO. For now I’ll neither upvote nor downvote.
From a technical perspective I’m not certain if Direct Preference Optimization is theoretically that much different from RLHF beyond being much quicker and lower friction at what it does, but so far it seems like it has some notable performance gains over RLHF in ways that might indicate a qualitative difference in effectiveness. Running a local model with a bit of light DPO training feels more intent-aligned compared to its non-DPO brethren in a pretty meaningful way. So I’d probably be considering also how DPO scales, at this point. If there is a big theoretical difference, it’s likely in not training a separate model, and removing whatever friction or loss of potential performance that causes.
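For reference, the "no separate reward model" point shows up directly in the DPO objective: the loss is computed from log-probabilities of the policy and a frozen reference model on a preferred and a dispreferred response. Below is a minimal per-example sketch of that loss; the log-probability values and the `beta` setting are invented numbers, and a real implementation would compute them from model outputs over batches.

```python
import math


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    margin = (log-ratio of policy to reference on the chosen response)
           - (log-ratio of policy to reference on the rejected response)
    """
    margin = (policy_logp_chosen - ref_logp_chosen) - (
        policy_logp_rejected - ref_logp_rejected
    )
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# ASSUMPTION: made-up log-probabilities for illustration.
loss_at_reference = dpo_loss(-2.0, -2.0, -2.0, -2.0)   # policy identical to reference
loss_after_update = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # policy now favors the chosen response
print(loss_at_reference, loss_after_update)
```

When the policy equals the reference the margin is zero and the loss is log 2; shifting probability toward the preferred response lowers it. No reward model appears anywhere in the computation.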
What does this mean? I don’t know as much about CNNs as you—are you saying that their architecture allows for the reuse of internal representations, such that redundancy should never arise? Or are you saying that the goal square shouldn’t be representable by this architecture?
Thinking about this:
This is why I hate a lot of mathematical universe hypothesis/simulation hypothesis discourse: both theories predict anything, which is not a strength. Even though I do think they’re true, they’re just too trivial as theories to work.
He didn’t say “anything can happen before AI explodes”. He said “I expect AI to look pretty great until it explodes.” And he didn’t say that his model about AGI safety generated that prediction; maybe his model about AGI safety generates some long-run predictions and then he’s using other models to make the “look pretty great” prediction.
There is a reference class judgement in this. If I have a theory of good moves in Go (and absently dabble in chess a little bit), while you have a great theory of chess, looking at some move in chess shouldn’t lead to a Bayes-update against ability of my theory to reason about Go. The scope of classical alignment worries is typically about the post-AGI situation. If it manages to say something uninformed about the pre-AGI situation, that’s something out of its natural scope, and shouldn’t be meaningful evidence either way.
I think the correct way of defeating classical alignment worries (about the post-AGI situation) is on priors, looking at the arguments themselves, not on observations where the theory doesn’t expect to have clear or good predictions (and empirically doesn’t). If the arguments appear weak, there is no recourse without observation of the post-AGI world, it remains weak at least until then. Even if it happened to have made good predictions about the current situation, it shouldn’t count in its favor.
Without commenting on how often people do or don’t bet, I think overall betting is great and I’d love to see more of it!
I’m also excited how much of it I’ve seen since Manifold started gaining traction. So I’d like to give a shout out to LessWrong users who are active on Manifold, in particular on AI questions. Some I’ve seen are:
Rob Bensinger
Jonas Vollmer
Arthur Conmy
Jaime Sevilla Molina
Isaac King
Eliezer Yudkowsky
Noa Nabeshima
Mikhail Samin
Daniel Filan
Daniel Kokotajlo
Zvi
Eli Tyre
Ben Pace
Allison Duettmann
Matthew Barnett
Peter Barnett
Joe Brenton
Austin Chen
lc
Good job everyone for betting on your beliefs :)
There are definitely more folks than this: feel free to mention more folks in the comments who you want to give kudos to (though please don’t dox anyone whose name on either platform is pseudonymous and doesn’t match the other).
Others include:
Zack M. Davis
Ben Weinstein-Raun
1a3orn
Tetraspace
Jeremy Gillen
Thomas Kwa
Loppukilpailija
Niplav
Adele Dewey-Lopez
Nate Soares
Aella
Ozzie Gooen
Oliver Habryka
Here’s a couple of mine:
Yeah I mean the answer is, just make prediction markets and bet on them. I think we are getting a lot better at that.
(Also I’m a lesswrong user who makes a lot of prediction markets about AI)
In particular:
A real money version of Yud and Paul’s bet https://polymarket.com/event/will-an-ai-win-the-5-million-ai-math-olympiad-prize-before-august?tid=1702634083181
An attempt at clustering the best AI progress markets into a dashboard https://manifold.markets/dashboard/ai-progress
Yeah, I’m not really happy with the state of discourse on this matter either.
As a proponent of an AI-risk model that does this, I acknowledge that this is an issue, and I indeed feel pretty defensive on this point. Mainly because, as @habryka pointed out and as I’d outlined before, I think there are legitimate reasons to expect no blatant evidence until it’s too late, and indeed, that’s the whole reason AI risk is such a problem. As was repeatedly stated.
So all these moves to demand immediate well-operationalized bets read a bit like tactical social attacks that are being unintentionally launched by people who ought to know better, which are effectively exploiting the territory-level insidious nature of the problem to undermine attempts to combat it, by painting the people pointing out the problem as blind believers. Like challenges that you’re set up to lose if you take them on, but which make you look bad if you turn them down.
And the above, of course, may read exactly like a defense attempt a particularly self-aware blind believer might construct. Which doesn’t inspire much self-doubt in me[1], but it does make me feel like I’m– no, not like I’m sailing against the winds of counterevidence – like I’m playing the social game on the side that’s poised to lose it in the long run, so I should switch up to the winning side to maximize my status, even if its position is wrong.
I’m somewhat hopeful about navigating to some concrete empirical or mathematical evidence within the next couple years. But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.
(Edit, because I’m concerned I’d been too subtle there: I am not accusing anyone, and especially not @TurnTrout, of deliberately employing social tactics to undermine their opponents rather than cooperatively seeking the truth. I’m only saying that the (usually extremely reasonable) requests for well-operationalized bets effectively have this result in this particular case.
Neither am I suggesting that the position I’m defending should be immune to criticism. Empirical evidence easily tied to well-operationalized bets is usually an excellent way to resolve disagreements and establish truth. But it’s not the only one, and it just so happens that this specific position can’t field many good predictions in this field.)
“But of course it won’t,” you might think – which, fair enough. But what’s your policy for handling problems that really are this insidious?
Your post defending the least forgiving take on alignment basically relies on a sharp/binary property of AGI, and IMO a pretty large crux is that either this property probably doesn’t exist, or if it does exist, it is not universal, and I think it tends to be overused.
To be clear, I’m increasingly agreeing with a weak version of the hypothesis, and I also think you are somewhat correct, but I don’t think your stronger hypothesis is correct, and I think that the lesson of AI progress is that it’s less sharp the more tasks you want, and the more general intelligence you want, which is in opposition to your hypothesis on AI progress being sharp.
I actually kinda agree with you here, but unfortunately, this is very, very important, since your allies are trying to gain real-life political power over AI, and given this is extremely impactful, it is basically required for us to discuss it.
There’s a bit of “one man’s modus ponens is another’s modus tollens” going on. I assume that when you look at a new AI model, and see how it’s not doing instrumental convergence/value reflection/whatever, you interpret it as evidence against “canonical” alignment views. I interpret it as evidence that it’s not AGI yet; or sometimes, even evidence that this whole line of research isn’t AGI-complete.
E. g., I’ve updated all the way on this in the case of LLMs. I think you can scale them a thousandfold, and it won’t give you AGI. I’m mostly in favour of doing that, too, or at least fully realizing the potential of the products already developed. Probably same for Gemini and Q*. Cool tech. (Well, there are totalitarianism concerns, I suppose.)
I also basically agree with all the takes in the recent “AI is easy to control” post. But what I take from it isn’t “AI is safe”, it’s “the current training methods aren’t gonna give you AGI”. Because if you put a human – the only known type of entity with the kinds of cognitive capabilities we’re worrying about – into a situation isomorphic to a DL AI’s, the human would exhibit all the issues we’re worrying about.
Like, just because something has a label of “AI” and is technically an AI doesn’t mean studying it can give you lessons about “AGI”, the scary lightcone-eating thing all the fuss is about, yeah? Any more than studying GOFAI FPS bots is going to teach you lessons about how LLMs work?
And that the Deep Learning paradigm can probably scale to AGI doesn’t mean that studying the intermediary artefacts it’s currently producing can teach us much about the AGI it’ll eventually spit out. Any more than studying a MNIST-classifier CNN can teach you much about LLMs; any more than studying squirrel neurology can teach you much about winning moral-philosophy debates.
That’s basically where I’m at. LLMs and such stuff are just in the entirely wrong reference class for studying “generally intelligent”/scary systems.
No, but my point here is that once we increase the complexity of the domain, and require more tasks to be done, things start to smooth over, and progress isn’t nearly as sharp.
I suspect a big part of that is the effects of Amdahl’s law kicking in combined with Baumol’s cost disease and power law scaling, which means you are always bottlenecked on the least automatable and doable tasks, so improvements in one area like Go don’t exactly matter as much as you think.
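To spell out the Amdahl’s law part: if only a fraction of the overall work can be sped up, the total speedup is capped by the part that can’t. A small sketch with made-up fractions (the function name and the numbers are mine, chosen just to show the bottleneck):

```python
def amdahl_speedup(automatable_fraction: float, speedup_on_that_fraction: float) -> float:
    """Overall speedup when only part of the work is accelerated (Amdahl's law)."""
    remaining = 1.0 - automatable_fraction
    return 1.0 / (remaining + automatable_fraction / speedup_on_that_fraction)


# ASSUMPTION: illustrative numbers only.
# Even a 1000x improvement on 90% of the tasks gives less than 10x overall,
# because the remaining 10% becomes the bottleneck.
big_jump = amdahl_speedup(0.9, 1000.0)
modest_jump = amdahl_speedup(0.5, 2.0)
print(big_jump, modest_jump)
```

This is why a dramatic jump in one narrow area (like Go) can translate into a much smoother curve once you look at a broad bundle of tasks.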
I’d say the main lesson of AI progress, one that might even have been formulatable in the 1970s-1980s days, is that compute and data were the biggest factors, by a wide margin, and these grow smoothly. Only now are algorithms starting to play a role, and even then, it’s only because of the fact that transformers turn out to be fairly terrible at generalizing or doing stuff, which is related to your claim about LLMs being not real AGI, but I think this effect is weaker than you think, and I’m sympathetic to the continuous view as well. There probably will be some discontinuities, but IMO LWers have fairly drastically overstated how discontinuous progress was, especially if we realize that a lot of the outliers were likely simpler than the real world. (Though Go comes close to it, at least for its domain, the problem is that the domain is far too small to matter.)
I think this roughly tracks how we updated, though there was a brief phase where I became more pessimistic as I learned that LLMs probably weren’t going to scale to AGI, which broke a few of my alignment plans, but I found other reasons to be more optimistic that didn’t depend on LLMs nearly as much.
While I think it’s fine enough to update towards “it’s not going to have any impact on anything, and that’s the reason it’s safe,” I worry that this is basically defining away the possibility of safety, and thus making the model useless:
I think a potential crux here is whether to expect some continuity at all, or whether there is reason to expect a discontinuous step change for AI, which is captured in this post: https://www.lesswrong.com/posts/cHJxSJ4jBmBRGtbaE/continuity-assumptions
I basically disagree entirely with that, and I’m extremely surprised you claimed that. If we grant that we get the same circumstances to control humans as we can do for DL AIs, then alignment becomes basically trivial in my view, since human control research would have way better ability to study humans, and in particular there is no IRB/FDA or regulation to control you, which would be huge changes to how science basically works today. It may take a lot of brute force work, but I think it basically becomes trivial to align human beings if humans could be put into a situation isomorphic to DL AIs.
As far as producing algorithms that are able to, once trained on a vast dataset of [A, B] samples, interpolate a valid completion B for an arbitrary prompt sampled from the distribution of A? Yes, for sure.
As far as producing something that can genuinely generalize off-distribution, strike way outside the boundaries of interpolation? Jury’s still out.
Like, I think my update on all the LLM stuff is “boy, who knew interpolation can get you this far?”. The concept-space sure turned out to have a lot of intricate structure that could be exploited via pure brute force.
Oh, I didn’t mean “if we could hook up a flesh-and-blood human (or a human upload) to the same sort of cognition-shaping setup as we subject our AIs to”. I meant “if the forward-pass of an LLM secretly simulated a human tasked with figuring out what token to output next”, but without the ML researchers being aware that it’s what’s going on, and with them still interacting with the thing as with a token-predictor. It’s a more literal interpretation of the thing sometimes called an “inner homunculus”.
I’m well aware that the LLM training procedure is never going to result in that. I’m just saying that if it did, and if the inner homunculus became smart enough, that’d cause all the deceptive-alignment/inner-misalignment/wrapper-mind issues. And that if you’re not modeling the AI as being/having a homunculus, you’re not thinking about an AGI, so it’s no wonder the canonical AI-risk arguments fail for that system and it’s no wonder it’s basically safe.
I’d say this still applies even to non-LLM architectures like RL, which is the important part, but Jacob Cannell and 1a3orn will have to clarify.
I agree, but with a caveat, in that I think we do have enough evidence to rule out extreme importance on algorithms, à la Eliezer, and compute is not negligible. Epoch estimates a 50/50 split between compute and algorithmic progress being important. Algorithmic progress will likely matter IMO, just not nearly as much as some LWers think it does.
I definitely updated something in this direction, which is important, but I now think the AI optimist arguments are general enough to not rely on LLMs, and sometimes not even relying on a model of what future AI will look like beyond the fact that capabilities will grow, and people expect to profit from it.
Not automatically, and there are potential paths to AGI like Steven Byrnes’s path to Brain-like AGI that either outright avoid deceptive alignment altogether or make it far easier to solve (the short answer is that Steven Byrnes suspects there’s a simple generator of value, so simple that it’s dozens of lines long and if that’s the case, then the corrigible alignment/value learning agent’s simplicity gap is either 0, negative, or a very small positive gap, so small that very little data is required to pick out the honest value learning agent over the deceptive aligned agent, and we have a lot of data on human values, so this is likely to be pretty easy.)
I think a crux is that I think that AIs will basically always have much more white-boxness to them than any human mind, and I expect the “AI control research is easier” point to still mostly be true for a lot of future paradigms of AI, including the ones that scale to superintelligence, especially since I think AI control is fundamentally very profitable and AIs have no legal rights/IRBs to slow down control research.
Mm, I think the “algorithms vs. compute” distinction here doesn’t quite cleave reality at its joints. Much as I talked about interpolation before, it’s a pretty abstract kind of interpolation: LLMs don’t literally memorize the data points, their interpolation relies on compact generative algorithms they learn (but which, I argue, are basically still bounded by the variance in the data points they’ve been shown). The problem of machine learning, then, is in finding some architecture + training-loop setup that would, over the course of training, move the ML model towards implementing some high-performance cognitive algorithms.
It’s dramatically easier than hard-coding the algorithms by hand, yes, and the learning algorithms we do code are very simple. But you still need to figure out in which direction to “push” your model first. (Pretty sure if you threw 2023 levels of compute at a Very Deep fully-connected NN, it won’t match a modern LLM’s performance, won’t even come close.)
So algorithms do matter. It’s just our way of picking the right algorithms consists of figuring out the right search procedure for these algorithms, then throwing as much compute as we can at it.
So that’s where, I would argue, the sharp left turn would lie. Not in-training, when a model’s loss suddenly drops as it “groks” general intelligence. (Although that too might happen.) It would happen when the distributed optimization process of ML researchers tinkering with training loops stumbles upon a training setup that actually pushes the ML model in the direction of the basin of general intelligence. And then that model, once scaled up enough, would suddenly generalize far off-distribution. (Indeed, that’s basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off. The “main” sharp left turn happens during the architecture search, not during the training.)
And I’m reasonably sure we’re in an agency overhang, meaning that the newborn GI would pass human intelligence in an eye-blink. (And if it won’t, it’ll likely stall at incredibly unimpressive sub-human levels, so the ML researchers will keep tinkering with the training setups until finding one that does send it over the edge. And there’s no reason whatsoever to expect it to stall again at the human level, instead of way overshooting it.)
Which human’s values? IMO, “the AI will fall into the basin of human values” is kind of a weird reassurance, given the sheer diversity of human values – diversity that very much includes xenophobia, genocide, and petty vengeance scaled up to geopolitical scales. And stuff like RLHF designed to fit the aesthetics of modern corporations doesn’t result in deeply thoughtful cosmopolitan philosophers – it results in sycophants concerned with PR as much as with human lives, and sometimes (presumably when not properly adapted to a new model’s scale) in high-strung yanderes.
Let’s grant the premise that the AGI’s values will be restricted to the human range (which I don’t really buy). If the quality of the sample within the human range that we pick will be as good as what GPT-4/Sydney’s masks appeared to be? Yeah, I don’t expect humans to stick around for a while after.
Actually I think the evidence is fairly conclusive that the human brain is a standard primate brain, with the only change being a few compute scale dials increased (the number of distinct gene changes is tiny – something like 12 from what I recall). There is really nothing special about the human brain other than 1.) 3x larger than expected size, and 2.) extended neoteny (longer training cycle). Neuroscientists have looked extensively for other ‘secret sauce’ and we now have some confidence in a null result: no secret sauce, just much more training compute.
Yes, but: whales and elephants have brains several times the size of humans, and they’re yet to build an industrial civilization. I agree that hitting upon the right architecture isn’t sufficient, you also need to scale it up – but scale alone doesn’t suffice either. You need a combination of scale, and an architecture + training process that would actually transmute the greater scale into more powerful cognitive algorithms.
Evolution stumbled upon the human/primate template brain. One of the forks of that template somehow “took off” in the sense of starting to furiously select for larger brain size. Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization.
The ML-paradigm analogue would, likewise, involve researchers stumbling upon an architecture that works well at some small scales and has good returns on compute. They’ll then scale it up as far as it’d go, as they’re wont to. The result of that training run would spit out an AGI, not a mere bundle of sophisticated heuristics.
And we have no guarantees that the practical capabilities of that AGI would be human-level, as opposed to vastly superhuman.
(Or vastly subhuman. But if the maximum-scale training run produces a vastly subhuman AGI, the researchers would presumably go back to the drawing board, and tinker with the architectures until they selected for algorithms with better returns on intelligence per FLOPS. There are likewise no guarantees that this higher-level selection process would somehow result in an AGI of around human level, rather than vastly overshooting it the first time they properly scale it up.)
Size/capacity isn’t all, but in terms of the capacity which actually matters (synaptic count, and upper cortical neuron count), from what I recall elephants are at great ape cortical capacity, not human capacity. A few specific species of whales may be at or above human cortical neuron capacity but synaptic density was still somewhat unresolved last I looked.
Human language/culture is more the cause of our brain expansion, not just the consequence. The human brain is impressive because of its relative size and oversized cost to the human body. Elephants/whales are huge and their brains are much smaller and cheaper comparatively. Our brains grew 3x too large/expensive because it was valuable to do so. Evolution didn’t suddenly discover some new brain architecture or trick (it already had that long ago). Instead there were a number of simultaneous whole body coadaptations required for larger brains and linguistic technoculture to take off: opposable thumbs, expressive vocal cords, externalized fermentation (gut is as energetically expensive as brain tissue – something had to go), and yes larger brains, etc.
Language enabled a metasystems transition similar to the origin of multicellular life. Tribes formed as new organisms by linking brains through language/culture. This is not entirely unprecedented – insects are also social organisms of course, but their tiny brains aren’t large enough for interesting world models. The resulting new human social organisms had intergenerational memory that grew nearly unbounded with time and creative search capacity that scaled with tribe size.
You can separate intelligence into world model knowledge (crystal intelligence) and search/planning/creativity (fluid intelligence). Humans are absolutely not special in our fluid intelligence—it is just what you’d expect for a large primate brain. Humans raised completely without language are not especially more intelligent than animals. All of our intellectual super powers are cultural. Just as each cell can store the DNA knowledge of the entire organism, each human mind ‘cell’ can store a compressed version of much of human knowledge and gains the benefits thereof.
The cultural metasystems transition which is solely responsible for our intellectual capability is a one-time qualitative shift that will never reoccur. AI will not undergo the same transition; that isn’t how these work. The main advantage of digital minds is just speed, and to a lesser extent, copying.
We’ve basically known how to create AGI for at least a decade. AIXI outlines the 3 main components: a predictive world model, a planning engine, and a critic. The brain also clearly has these 3 main components, and even somewhat cleanly separated into modules—that’s been clear for a while.
Transformer LLMs are pretty much exactly the type of generic minimal ULM arch I was pointing at in that post (though I obviously couldn’t predict the name). On a compute scaling basis GPT4 training at 1e25 flops uses perhaps a bit more than human brain training, and it’s clearly not quite AGI – but mainly because it’s mostly just a world model with a bit of critic: planning is still missing. But its capabilities are reasonably impressive given that the architecture is more constrained than a hypothetical more directly brain equivalent fast-weight RNN of similar size.
Anyway I don’t quite agree with the characterization that these models are just “interpolating valid completions of any arbitrary prompt sampled from the distribution”. Human intelligence also varies widely on a spectrum with tradeoffs between memorization and creativity. Current LLMs mostly aren’t as creative as the more creative humans and are more impressive in breadth of knowledge, but eh part of that could be simply that they currently completely lack the component essential for creativity? That they accomplish so much without planning/search is impressive.
Interestingly that is closer to my position and I thought that Byrnes thought the generator of value was somewhat more complex, although our views are admittedly fairly similar in general.
This paragraph doesn’t seem like an honest summary to me. Eliezer’s position in the dialogue, as I understood it, was:
The journey is a lot harder to predict than the destination. Cf. “it’s easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be”. Eliezer isn’t claiming to have secret insights about the detailed year-to-year or month-to-month changes in the field; if he thought that, he’d have been making those near-term tech predictions already back in 2010, 2015, or 2020 to show that he has this skill.
From Eliezer’s perspective, Paul is claiming to know a lot about the future trajectory of AI, and not just about the endpoints: Paul thinks progress will be relatively smooth and continuous, and thinks it will get increasingly smooth and continuous as time passes and more resources flow into the field. Eliezer, by contrast, expects the field to get choppier as time passes and we get closer to ASI.
A way to bet on this, which Eliezer repeatedly proposed but wasn’t able to get Paul to do very much, would be for Paul to list out a bunch of concrete predictions that Paul sees as “yep, this is what smooth and continuous progress looks like”. Then, even though Eliezer doesn’t necessarily have a concrete “nope, the future will go like X instead of Y” prediction, he’d be willing to bet against a portfolio of Paul-predictions: when you expect the future to be more unpredictable, you’re willing to at least weakly bet against any sufficiently ambitious pool of concrete predictions.
(Also, if Paul generated a ton of predictions like that, an occasional prediction might indeed make Eliezer go “oh wait, I do have a strong prediction on that question in particular; I didn’t realize this was one of our points of disagreement”. I don’t think this is where most of the action is, but it’s at least a nice side-effect of the person-who-thinks-this-tech-is-way-more-predictable spelling out predictions.)
Eliezer was also more interested in trying to reach mutual understanding of the views on offer, as opposed to “let’s bet on things immediately, never mind the world-views”. But insofar as Paul really wanted to have the bets conversation instead, Eliezer sunk an awful lot of time into trying to find operationalizations Paul and he could bet on, over many hours of conversation.
If your end-point take-away from that (even after actual bets were in fact made, and tons of different high-level predictions were sketched out) is “wow how dare Eliezer be so unwilling to make bets on anything”, then I feel a lot less hope that world-models like Eliezer’s (“long-term outcome is more predictable than the detailed year-by-year tech pathway”) are going to be given a remotely fair hearing.
(Also, in fairness to Paul, I’d say that he spent a bunch of time working with Eliezer to try to understand the basic methodologies and foundations for their perspectives on the world. I think both Eliezer and Paul did an admirable job going back and forth between the thing Paul wanted to focus on and the thing Eliezer wanted to focus on, letting us look at a bunch of different parts of the elephant. And I don’t think it was unhelpful for Paul to try to identify operationalizations and bets, as part of the larger discussion; I just disagree with TurnTrout’s summary of what happened.)
Your comment’s points seem like further evidence for my position. That said, your comment appears to serve the function of complicating the conversation, and that happens to have the consequence of diffusing the impact of my point. I do not allege that you are doing so on purpose, but I think it’s important to notice. I would have been more convinced by a reply of “no, you’re wrong, here’s the concrete bet(s) EY made or was willing to make but Paul balked.”
I will here repeat a quote[1] which seems relevant:
First of all, I disagree with the first claim and am irritated that you stated it as a fact instead of saying “I think that...”. My overall take-away from this paragraph, as pertaining to my point, is that you’re pointing out that Eliezer doesn’t make predictions because he can’t / doesn’t have epistemic alpha. That accords with my point of “EY was unwilling to bet.”
My takeaway, as it relates to my quoted point: Either Eliezer’s view makes no near-term falsifiable predictions which differ from the obvious ones, or only makes meta-predictions which are hard to bet on. Sounds to my ears like his models of alignment don’t actually constrain his moment-to-moment anticipations, in contrast to my own, which once destroyed my belief in shard theory on a dime (until I realized I’d flipped the data, and undid the update). This perception of “the emperor has no constrained anticipations” I have is a large part of what I am criticizing.
So Eliezer offered Paul the opportunity for Paul to unilaterally stick his neck out on a range of concrete predictions, so that Eliezer could judge Paul’s overall predictive performance against some unknown and subjective baseline which Eliezer has in his head, or perhaps against some group of “control” predictors? That sounds like the opposite of “willing to make concrete predictions” and feeds into my point about Paul not being able to get Eliezer to bet.
Edit: If there was a more formal proposal which actually cashes out into resolution criteria and Brier score updates for both of them, then I’m happier with EY’s stance but still largely unmoved; see my previous comment above about the emperor.
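As a rough illustration of what “resolution criteria and Brier score updates for both of them” could look like in practice, here is a minimal sketch. The questions, credences, and outcomes below are entirely hypothetical and are not taken from the dialogue; `brier_score` is an illustrative helper name, not a reference to any existing tool.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecaster, and always answering
    0.5 on binary questions scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolved questions: 1 = resolved yes, 0 = resolved no.
outcomes = [1, 0, 1, 1]

# Hypothetical credences from two forecasters on the same four questions.
forecaster_a = [0.9, 0.2, 0.7, 0.6]  # sticks their neck out
forecaster_b = [0.5, 0.5, 0.5, 0.5]  # declines to commit either way

score_a = brier_score(forecaster_a, outcomes)
score_b = brier_score(forecaster_b, outcomes)
print(f"A: {score_a:.3f}  B: {score_b:.3f}")
```

The point of scoring both parties on the same questions is that neither one is judged against an unstated baseline: whoever has the lower score on the shared question set did better.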
This paragraph appears to make two points. First, Eliezer was less interested in betting than in having long dialogues. I agree. Second, Eliezer spent a lot of time at least appearing as if he were trying to bet. I agree with that as well. But I don’t give points for “trying” here.
Giving points for “trying” is in practice “giving points for appearing to try”, as is evident from the literature on specification gaming. Giving points for “appeared to try” opens up the community to invasion by bad actors who gish gallop their interlocutors into giving up the conversation. Prediction is what counts.
Nitpick, but that’s not a “world-model.” That’s a prediction.
Why write this without citing? Please cite and show me the credences and the resolution conditions.
If anyone entering this thread wishes to read the original dialogue for themselves, please see section 10.3 of https://www.lesswrong.com/posts/fS7Zdj2e2xMqE6qja/more-christiano-cotra-and-yudkowsky-on-ai-progress
Thanks for your feedback. I certainly appreciate your articles and I share many of your views. Reading what you had to say, along with Quentin, Jacob Cannell, and Nora, was a very welcome alternative take that expanded my thinking and changed my mind. I have changed my mind a lot over the last year, from thinking AI was a long way off and Yud/Bostrom were basically right to seeing that it’s a lot closer and theories without data are almost always wrong in many ways – e.g. SUSY was expected to be true for decades by most of the world’s smartest physicists. Many alignment ideas before GPT3.5 are either sufficiently wrong or irrelevant to do more harm than good.
Especially, I think, the over-dependence on analogy, e.g. to evolution. Sure, when we had nothing to go on it was a start, but when data comes in, ideas based on analogies should be gone pretty fast if they disagree with hard data.
(Some background – I have read the site for over 10 years, have followed AI for my entire career, have an understanding of Maths and Psychology, and have built and deployed a very small NN model commercially. Also as an aside I remember distinctly being surprised that Yud was skeptical of NN/DL in the earlier days when I considered it obviously where AI progress would come from – I don’t have references because I didn’t think that would be disputed afterwards)
I am not sure what the silent majority belief on this site is (by people not Karma)? Is Yud’s worldview basically right or wrong?
analogies based on evolution should be applied at the evolutionary scale: between competing organizations.
Well they definitely can be applied there – though perhaps it’s a stage further than analogy and direct application of theory? Then of course data can agree/disagree.
gradient descent is not evolution and does not behave like evolution. it may still have problems one can imagine evolution having, but you can’t assume facts about evolution generalize—it’s in fact quite different.
I really don’t want to go down a rabbit hole here, so probably won’t engage in further discussion, but I just want to chime in here and say that I’m pretty sure lots of the world’s smartest physicists (not sure what fraction) still expect the fundamental laws of physics in our universe to have (broken) supersymmetry, and I would go further and say that they have numerous very good reasons to expect that, like gauge coupling unification etc. Same as ever. The fact that supersymmetric partners were not found at LHC is nonzero evidence against supersymmetric partners existing, but it’s not strong evidence against them existing, because LHC was very very far from searching the whole space of possibilities. Also, we pretty much know for a fact that the universe contains at least one other yet-to-be-discovered elementary particle beyond the 17 (or whatever, depends on how you count) particles in the Standard Model. So I think it’s extremely premature to imply that the prediction of yet-to-be-discovered supersymmetric partner particles has been ruled out in our universe and haha look at those overconfident theoretical physicists. (A number of specific SUSY-involving theories have been ruled out, but I think the smart physicists knew all along that those were just plausible hypotheses worth checking, not confident theoretical predictions.)
OK you are answering at a level more detailed than I raised and seem to assume I didn’t consider such things. My reason and IMO the expected reading of “SUSY has failed” is not that such particles have been ruled out, as I know they haven’t, but that its theoretical benefits are severely weakened or entirely ruled out according to recent data. My reference to SUSY was specifically regarding its opportunity to solve the Hierarchy Problem. This is the common understanding of one of the reasons it was proposed.
I stand by my claim that many/most of the top physicists expected for >1 decade that it would help solve such a problem. I disagree with the claim:
“but I think the smart physicists knew all along that those were just plausible hypotheses worth checking, ” Smart physicists thought SUSY would solve the hierarchy problem.
----
Common knowledge, from GPT4:
“can SUSY still solve the Hierarchy problem with respect to recent results”
Hierarchy Problem: SUSY has been considered a leading solution to the hierarchy problem because it naturally cancels out the large quantum corrections that would drive the Higgs boson mass to a very high value. However, the non-observation of supersymmetric particles at expected energy levels has led some physicists to question whether SUSY can solve the hierarchy problem in its simplest forms.
Fine-Tuning: The absence of low-energy supersymmetry implies a need for fine-tuning in the theory, which contradicts one of the primary motivations for SUSY as a solution to the hierarchy problem. This has led to exploration of more complex SUSY models, such as those with split or high-scale supersymmetry, where SUSY particles exist at much higher energy scales.
----
IMO ever more complex models rapidly become like epi-cycles.
I think this will depend strongly on where you draw the line on “basically”. I think the majority probably thinks:
AI is likely to be a really big deal
Existential risk from AI is at least substantial (e.g. >5%)
AI takeoff is reasonably likely to happen quite quickly in wall clock time if this isn’t actively prevented (e.g. AI will cause there to be <10 years from a 20% annualized GDP growth rate to a 100x annualized growth rate)
The power of full technological maturity is extremely high (e.g. nanotech, highly efficient computing, etc.)
But, I expect that the majority of people don’t think:
Inside view, existential risk is >95%
A century of dedicated research on alignment (targeted as well as society would realistically do) is insufficient to get risk <15%.
Which I think are both beliefs Yudkowsky has.
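To give a feel for the growth rates in the takeoff bullet above, here is a small arithmetic sketch. It assumes that “20% annualized growth” means output multiplying by 1.2 per year and that “100x annualized” means output multiplying by 100 per year; that reading is an assumption, and `doubling_time_years` is an illustrative helper name.

```python
import math


def doubling_time_years(annual_growth_factor):
    """Years needed for output to double at a constant annual growth factor.

    annual_growth_factor is the multiplier per year, e.g. 1.2 for 20% growth.
    """
    if annual_growth_factor <= 1.0:
        raise ValueError("growth factor must exceed 1 for output to double")
    return math.log(2) / math.log(annual_growth_factor)


# Assumed readings of the two rates mentioned in the comment.
slow = doubling_time_years(1.20)   # 20% per year
fast = doubling_time_years(100.0)  # 100x per year

print(f"20%/yr doubles in about {slow:.1f} years")
print(f"100x/yr doubles in about {fast * 365:.0f} days")
```

Under these assumptions the claim is that the economy goes from doubling every few years to doubling every couple of months within a decade.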
For me -
Yes to AI being a big deal and extremely powerful (yes, I doubt anyone would be here otherwise)
Yes – I don’t think anyone can reasonably claim it’s <5%, but then so is not having AI, if x-risk is defined to be humanity missing practically all of its Cosmic endowment.
Maybe – Even with slow takeoff and hardware constraints, you get much greater GDP, though I don’t agree with 100x (for the critical period that is, 100x could happen later). E.g. car factories are made to produce robots, we get 1-10 billion more minds and bodies per year, but not quite 100X. ~10x per year is enough to be extremely disruptive and x-risk anyway.
---
(1)
Yes I don’t think x-risk is >95% - say 20% as a very rough guess that humanity misses all its Cosmic endowment. I think AI x-risk needs to be put in this context—say you ask someone
“What’s the chance that humanity becomes successfully interstellar?”
If they say 50/50 then being OK with any AI x-risk less than 50% is quite defensible if getting AI right means that it’s practically certain you get your cosmic endowment etc.
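The comparison above can be written out as a tiny calculation. This is only a sketch of the argument as stated: the function names are illustrative, and the default of 1.0 encodes the comment’s premise that getting AI right makes the cosmic endowment practically certain.

```python
def p_endowment_with_ai(ai_xrisk, p_success_if_ai_goes_well=1.0):
    """Chance of reaching the cosmic endowment if we build AI."""
    return (1.0 - ai_xrisk) * p_success_if_ai_goes_well


def break_even_xrisk(p_baseline, p_success_if_ai_goes_well=1.0):
    """AI x-risk at which building AI is exactly as good as the baseline."""
    return 1.0 - p_baseline / p_success_if_ai_goes_well


# Baseline from the comment: a 50/50 chance of becoming interstellar without AI.
p_baseline = 0.5
threshold = break_even_xrisk(p_baseline)

# The rough 20% x-risk guess from the comment, for comparison.
with_ai = p_endowment_with_ai(0.20)

print(f"break-even AI x-risk: {threshold:.0%}")
print(f"P(endowment) with 20% AI x-risk: {with_ai:.0%} vs baseline {p_baseline:.0%}")
```

If success after aligned AI is less than certain, the break-even risk drops accordingly, so the conclusion is sensitive to that premise.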
---
(2)
I do think it’s defensible that a century of dedicated research on alignment doesn’t get risk <15%, but because alignment research is only useful a little bit in advance of capabilities – say we had a 100 year pause, then I wouldn’t have confidence in our alignment plan at the end of it.
Anyway regarding x-risk I don’t think there is a completely safe path. Too fast with AI and obvious risk, too slow and there are also other obvious risks. Our current situation is likely unstable. For example the famous quote
“If you want a picture of the future, imagine a boot stamping on a human face— forever.”
I believe that is now possible with current tech, where it was not say for Soviet Russia. So we may be in the situation where societies can go 1984 totalitarian bad, but not come back because our tech coordination skills are sufficient to stop centralized empires from collapsing. LLMs of course make censorship even easier. (I am sure there are other ways our current tech could destroy most societies also)
If that’s the case, a long pause could result in all power being in such societies which when the pause ended would be very likely to screw up alignment.
That makes me unsure what regulation to advocate for, though I am in favor of slowing down hardware AI progress but fully exploring the capabilities of our current HW.
Most importantly I think we should hugely speed up Neuralink type devices and brain uploading. I would identify much more with an uploaded human that was then carefully, appropriately upgraded to superintelligence than an alternative path where a pure AI superintelligence was made.
We have to accept that we live in critical times and just slowing things down is not necessarily the safest option.
Yup, and this is why I’m more excited to supervise MATS mentees who haven’t read The Sequences.
Hi there.
> (High confidence) I feel like the project of thinking more clearly has largely fallen by the wayside, and that we never did that great of a job at it anyways.
I’m new to this community. I’ve skimmed quite a few articles, and this sentence resonates with me for several reasons.
1) It’s very difficult in general to find websites like LessWrong these days. And among the few that exist, I’ve found that the intellectuals on them are so incredibly doubtful of their own intellect. This creates a sort of Ouroboros phenomenon where intellects just eat themselves into oblivion. Like, maybe I’m wrong but this site’s popularity seems to be going down?
2) At least from what I’ve noticed, when I compare articles in the last 2 months, to ones from about a decade ago, there is an alarming truth in your sentence. A decade ago, there were questions left in the articles for commenters to answer, there was a willingness to change one’s mind and to add/enhance ideas in a good faith manner. Now, it seems that many have confused this website for LinkedIn, posting their own personal paper trails (which are largely in a tone that isn’t unique anyways.)
It’s really unfortunate, since I was excited upon being greeted with much older articles. And then realising “Oh… that was from… holy! 10 years ago!?” To then be disappointed by our articles from today.
I think it’s fine for there to be a status hierarchy surrounding “good alignment research”. It’s obviously bad if that becomes mismatched with reality, as it almost certainly is to some degree, but I think people getting prestige for making useful progress is essentially what needs to happen for it to be done at all.
If we aren’t good at assessing alignment research, there’s the risk that people substitute the goal of “doing good alignment research” with “doing research that’s recognized as good alignment research”. This could lead to a feedback loop where a particular notion of “good research” gets entrenched: Research is considered good if high status researchers think it’s good; the way to become a high status researcher is to do research which is considered good by the current definition, and have beliefs that conform with those of high status researchers.
A number of TurnTrout’s points were related to this (emphasis mine):
I’d like to see more competitions related to alignment research. I think it would help keep assessors honest if they were e.g. looking at 2 anonymized alignment proposals, trying to compare them on a point-by-point basis, figuring out which proposal has a better story for each possible safety problem. If competition winners subsequently become high status, that could bring more honesty to the entire ecosystem. Teach people to focus on merit rather than politics.
I think that might be a result of how the topic is, well, just really fucking grim. I think part of what allows discussion of it and thought about it for a lot of people (including myself) is a certain amount of detachment. “AI doomers” often get accused of being LARPers or not taking their own ideas seriously because they don’t act like people who believe the world is ending in 10 years, but I’d flip that around – a person who believes the world is ending in 10 years probably acts absolutely insane, and so people, to keep their maximum possible sanity, establish a sort of barrier and discuss these things as they would a game or a really interesting scientific question. But actually placing a bet on it? Shorting your own future on the premise that you won’t have a future? That breaks the barrier, and it becomes just really uncomfortable. I know I’d still rather live as if I was dead wrong no matter how confident I am in being theoretically right. I wonder in fact whether this feeling was shared by e.g. game theorists working on nuclear strategy.
I think there are some great points in this comment but I think it’s overly negative about the LessWrong community. Sure, maybe there is a vocal and influential minority of individuals who are not receptive to or appreciative of your work and related work. But I think a better measure of the overall community’s culture than opinions or personal interactions is upvotes and downvotes which are much more frequent and cheap actions and therefore more representative. For example, your posts such as Reward is not the optimization target have received hundreds of upvotes, so apparently they are positively received.
LessWrong these days is huge with probably over 100,000 monthly readers so I think it’s challenging to summarize its culture in any particular way (e.g. probably most users on LessWrong live outside the bay area and maybe even outside the US). I personally find that LessWrong as a whole is fairly meritocratic and not that dogmatic, and that a wide variety of views are supported provided that they are sufficiently well-argued.
In addition to LessWrong, I use some other related sites such as Twitter, Reddit, and Hacker News and although there may be problems with the discourse on LessWrong, I think it’s generally significantly worse on these other sites. Even today, I’m sure you can find people saying things on Twitter about how AIs can’t have goals or that wanting paperclips is stupid. These kinds of comments wouldn’t be tolerated on LessWrong because they’re ignorant and a waste of time. Human nature can be prone to ignorance, rigidness of opinions and so on but I think the LessWrong walled garden has been able to counteract these negative tendencies better than most other sites.
No disagreement here that this place does this. I also think we should attempt to change many of these things. However, I don’t expect the lesswrong team to do anything sufficiently drastic to counter the hero-worship. Perhaps they could consider hiding usernames by default, hiding vote counts until things have been around for some period of time, etc.
Hmm, my sense is Eliezer very rarely comments, and the people who do comment a lot don’t have a ton of hero worship going on (like maybe Wentworth?). So I don’t super believe that hiding usernames would do much about this.
Agree, and my guess is that the hero worship, to the extent it happens, is caused by something like
for Eliezer: people finding the rationality community and observing that they were less crazy than most other communities about various things, and Eliezer was a very prolific and persuasive writer
for Paul: Paucity of empirical alignment work before 2021 meant that Paul was one of the few people with formal CS experience and good alignment ideas, and had good name recognition due to posting on LW
Both of these seem to be solving themselves.
I think one of the issues with Eliezer is that he sees himself as a hero, and it comes through both explicitly and in vibes in the writing, and Eliezer is also a persuasive writer.
What is wrong with seeing oneself as a hero?
Nothing wrong with it, in fact I recommend it. But seeing oneself as a hero and persuading others of it will indeed be one of the main issues leading to hero worship.
how would you operationalize a bet on this? I’d take “yes” on “will hiding usernames by default decrease hero worship on lesswrong” on manifold, if you want to do an AB test or something.
Hacker News shows you the vote counts on your comments privately. I think that’s a significant improvement. It nudges people towards thinking for themselves rather than trying to figure out where the herd is going. At least, I think it does, because HN seems to have remarkable viewpoint diversity compared with other forums.
Would you mind sharing some specific examples? (Not of people but of beliefs)
LessWrong.com is my favorite website. I’ve tried having thoughts on other websites and it didn’t work. Seriously, though—I feel very grateful for the effort you all have put in to making this an epistemically sane environment. I have personally benefited a huge amount from the intellectual output of LW—I feel smarter, saner, and more capable of positively affecting the world, not to mention all of the gears-level knowledge I’ve learned, and model building I’ve done as a result, which has really been a lot of fun :) And when I think about what the world would look like without LessWrong.com I mostly just shudder and then regret thinking of such dismal worlds.
Some other thoughts of varying import:
I dislike emojis. They feel like visual clutter to me. I also feel somewhat assaulted when I read through comments sometimes, as people’s opinions jump out at me before I’ve had much chance to form my own.
I like dialogues a lot more than I was expecting. What I expected was something like “people will spend a bunch of time talking past each other in their own mentalese with little effort towards making the reader capable of understanding and it’ll feel cluttered and way too long and hard to make much sense of.” I think this does sometimes happen. But I’ve also been pleasantly surprised by the upsides which I was not anticipating—seeing more surface area on people’s thoughts which helps me make sense of their “deal” in a way that’s useful for modeling their other views, (relatedly) getting a better sense of how people generate thoughts, where their intuitions are coming from, and so on. It also makes LW feel more homey, in my opinion.
If there were one dial I’d want to experiment with turning on LW it would be writing quality, in the direction of more of it. I don’t feel like I have super great ideas on how to cultivate this, but I’ll just relay the sort of experience that makes me say this. Sometimes I want to understand something someone has said. I think “ah, they probably said that there,” and then I go to a post, skim it, find the sort-of-related thing but it’s not quite right (they talk around the point without really saying it, or it’s not very clear, etc). But they link to ten other posts of theirs, all promising to tell me the thing I think they said, so I follow those links, but they’re also a bit slippery in the same ways. And I feel like I go in circles trying to pin down exactly what the claims are, never quite succeeding, until I feel like throwing up my hands in defeat. To some extent this seems like just par for the course with highly intellectually productive people—ideas outpace idea management and legibility, and in the absence of having the sort of streamlined clarity that I’m more used to seeing in, e.g., books, I would on the margin prefer they still publish. But I do think this sort of thing can make it harder to push the frontier of human knowledge together, and if I did have a dial I could turn to make writing quality better (clearer, more succinct, more linear, etc.), even at the expense of somewhat fewer posts, I’d at least want to try that for a bit.
Something has long bothered me about how people talk about “p(doom)” around here. Like, here’s an experience I have regularly: I tell someone I am hoping to take an action in the future, they say “haha, what future? we’ll all be dead by then!” I really dislike this, not because I don’t agree that we’re facing serious risks, or that it’s never okay to joke about that, but more that I often don’t believe them. It seems to me that in many conversations high p(doom) is closer to type “meme” than “belief,” like a badge people wear to fit into the social fabric.
But also, it feeds into this general vibe of nihilistic hopelessness that the Bay Area rationality scene has lapsed into, according to me, which I worry stems in part from deferring to Eliezer’s/Nate’s hopelessness. And I don’t know, if you really are on-model hopeless I guess that’s all well and good, but on a gut level I just don’t really buy that this makes sense. Alignment seems like a hard science problem but not an impossible one, and I think that if we actually try, we may very well have a good shot at figuring it out. But at present it feels to me like so few people are trying to solve the hard parts of the problem—that so much work has gone meta (e.g., community building, power for the sake of power, deferring the “solving it” part to uploads or AI); that even though people concede there’s some chance things go well, that in their gut they basically just have some vague sense of “we’re fucked” which inhibits them from actually trying; that somehow our focus has become about managing tenth order effects of the social graph, the “well what if this faction does this, then people will update this way and then we’ll lose influence over there”… I don’t know, it just sort of feels like we’ve communally lost the spirit of something that seems really powerful to me—something that I took away from the Sequences—a sense of agency, ambition, truth-seeking, and integrity in the face of hard problems. A sense that we can… solve this. Like actually solve the actual problem! I would like that spirit back.
I’m not sure how to get it, exactly, and I don’t know that this is aimed at the LW team in particular rather than being nebulously aimed at “Bay Area rats” or something. But just to add one small piece that I think LW could work on: I’ve occasionally seen the mods slip from “I think we are doomed” language to “we’re doomed” language. I’ve considered bringing it up although for any particular instance it feels a bit too aggressive relative to the slight, and because I get that it’s annoying to append your epistemic state to everything, and so on. But I do think that on this topic in particular it’s good to be careful, as it’s one of the most crazy-making aspects of this situation, and one that seems especially easy to spiral into group-think-y/deferral-y dynamics about.
I feel sad about ending on a bad note, mostly because I feel sad that so many people seem to be dunking on MIRI/rationality/LW lately. And I have some kind of “can we please not throw the baby out with the bathwater” sense. I certainly have some gripes with the community, but on net I am really happy that it exists. And I continue to believe that the spirit of rationality is worth fighting for—both because it’s beautiful for its own sake, but also because I believe in its ability to positively shape our lightcone. I see LW as part of that mission, and I feel deeply grateful for it.
I’d like to highlight this. In general, I think fewer things should be promoted to the front page.
[edit, several days later]: https://www.lesswrong.com/posts/SiPX84DAeNKGZEfr5/do-websites-and-apps-actually-generally-get-worse-after is a prime example. This has nothing to do with rationality or AI alignment. This is the sort of off-topic chatter that belongs somewhere else on the Internet.
[edit, almost a year later]: https://www.lesswrong.com/posts/dfKTbyzQSrpcWnxfC/2025-color-trends is an even better example of off-topic cross-posting that the author should not be rewarded for doing.
I’m a huge fan of agree/disagree voting. I think it’s an excellent example of a social media feature that nudges users towards truth, and I’d be excited to see more features like it.
I also enjoy the reacts way more than I expected! They feel aesthetically at home here, especially with reacts for specific parts of the text.
I think the reacts being semantic instead of being random emojis is what makes this so much better.
I wish other platforms experimented with semantic reacts as well, instead of just letting people react with any emoji of their choosing, and making you guess whether e.g. “thumbs up” means agreement, acknowledgement, or endorsement, etc.
It seems like it would be useful to have it for top-level posts. I love disagree voting and there are massive disparities sometimes between upvotes and agreements that show how useful it is in surfacing good arguments that are controversial.
I think I’m seeing some high-effort, topical and well-researched top-level posts die on the vine because of controversial takes that are probably drawing disagreement expressed as downvotes. This is not a complaint about my own posts sometimes dying; I’ve been watching others’ posts with this hypothesis, and it fits.
I guess there’s a reason for not having it on top-level posts, but I miss having it on top-level posts.
Do you know the reasons? It seems like it would be useful to have it on top-level posts for the same reasons it’s so helpful on comments.
IDK the reasons.
Inline agree/disagree reacts are trying to do the equivalent. Comments are short enough that usually you can summarize your epistemic state with regards to their contents into a single “agree or disagree”, but for posts I feel like it really mostly sets things up for polarization and misunderstandings to have a bunch of people “agree” and “disagree” to a huge bundle of claims and statements.
I think it’s better for people to highlight specific passages of text and then react to those.
Ooh. That makes a lot of sense and is even better… I simply didn’t realize there were inline reacts! Kudos.
I’d like to like this more but I don’t have a clear idea of when to up one, up the other, down one, down the other, or down one and up the other.
The EA Forum has this problem worse, but I’ve started to see it on LessWrong: it feels to me like we have a lot more newbies on the site who don’t really get what LW-style rationality is about, and they make LessWrong a less fun place to write because they are regressing discussion norms back towards the mean.
Earlier this year I gave up on EAF because it regressed so far towards the mean that it became useless to me. LW has still been passable but feels like it’s been ages since I really got into a good, long, deep thread with somebody on here. Partly that’s because I’m busy, but it’s also because I’ve been quicker to give up because my expectations of having a productive conversation here are now lower. :-(
Do you have any thoughts on what the most common issues you see are or is it more like that every time it is a different issue?
My impression is that people are quicker to jump to cached thoughts and not actually read and understand things. So I’ve spent more time than I used to dealing with what I would consider bad faith takes on posts, where it’s clear to me the person is reading in what they wanted me to have said or meant to imply.
I also have a standing complaint that people are hypocritical by being too lenient towards things they like and too critical of things they don’t like for affiliative reasons rather than because they engaged with the reasoning and arguments.
I see a lot of this from both sides. I know how to farm karma on here, I just mostly choose not to, but when I post things that are of the type that I expect them to be voted up I can be pretty lazy and people will vote it up because I hit the applause light for something they already wanted to applaud. If I post something that I know people will disagree with because it goes against standard takes, I’ve got to be way more detailed. But I see this as a bad asymmetry that results from confirmation bias. I would rather live in a world where lazy posts that say things people already agree with get downvoted for being low quality, or live in a world where posts that people disagree with get upvoted despite disagreeing because they respect the argumentation, but not the world we find ourselves in now.
One thing I’ve been thinking about in this regard is the microhabits around voting.
I only vote on a small minority of the stuff I read. I assume others are similar.
And voting is a bit of a cognitive chore: There are 25 possible ways to vote: strong down/weak down/nothing/weak up/strong up, on the 2 different axes.
I wish I had a principled way of choosing between those 25 different ways to vote, but I don’t. I rarely feel satisfied with the choice I made. I’m definitely inconsistent in my behavior from comment to comment.
For example, if someone makes a point that I might have made myself, is it OK to upvote them overall, or should I just vote to agree? I appreciate them making the point, so I usually give them an upvote for overall—after all, if I made the point myself, I’d automatically give myself an “overall” upvote too. But now that I explicitly consider, maybe my threshold should be higher, e.g. only upvote “overall” if I think they made the point at least as well as I would’ve made it.
In any case, the “point I would’ve made myself” situation is one of a fairly small number of scenarios where I get enough activation energy to actually vote on something.
Sometimes I wonder what LW would be like if a user was only allowed to vote on a random 5% subset of the comments on any given page. (To make it deterministic, you could hand out vote privilege based on the hash of their user ID and the comment ID.) Then nudge users to actually vote on those 5%, or explicitly acknowledge a null vote. I wonder if this would create more of a “jury trial” sort of feel, compared to the current system which can have a “count the size of various tribes” feel.
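To make the "hash of their user ID and the comment ID" idea concrete, here is a minimal sketch of how such a deterministic 5% lottery could work. The function name `may_vote`, the string IDs, and the choice of SHA-256 are all assumptions for illustration; nothing here reflects how LessWrong actually handles votes.

```python
import hashlib


def may_vote(user_id: str, comment_id: str, fraction: float = 0.05) -> bool:
    """Deterministically decide whether a user may vote on a comment.

    Hypothetical helper: hashes the (user, comment) pair and grants the
    privilege when the hash lands in the lowest `fraction` of the range.
    """
    key = f"{user_id}:{comment_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    # Integer threshold avoids floating-point edge cases near 1.0.
    threshold = int(fraction * 2**64)
    return value < threshold


if __name__ == "__main__":
    # The same pair always gets the same answer, so no state needs storing.
    print(may_vote("user-123", "comment-456"))
    allowed = sum(may_vote("user-123", f"comment-{i}") for i in range(10_000))
    print(f"allowed to vote on {allowed} of 10000 comments")
```

Because the decision depends only on the two IDs, it needs no database lookup, and a user cannot reroll it by refreshing the page. Whether that would actually produce a "jury trial" feel is a separate empirical question.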
First of all, I appreciate all the work the LessWrong / Lightcone team does for this website.
The Good
I was skeptical of the agree/disagree voting. After using it, I think it was a very good decision. Well done.
I haven’t used the dialogue feature yet, but I have plans to try it out.
Everything just works. Spam is approximately zero. The garden is gardened so well I can take it for granted.
I love how much you guys experiment. I assume the reason you don’t do more is just engineering capacity.
And yet…
I tend to avoid giving negative feedback unless someone explicitly asks for it. So…here we go.
Over the last 1.5 years, I’ve been less excited about LessWrong than at any time since I discovered this website. I’m uncertain to what extent this is because I changed or because the community did. Probably a bit of both.
AI Alignment
The most obvious change is the rise of AI Alignment writings on LessWrong. There are two things that bother me about AI Alignment writing.
It’s effectively unfalsifiable. Even betting markets don’t really work when you’re betting on the apocalypse.
It’s highly political. AI Alignment became popular on LessWrong before AI Alignment became a mainstream political issue. I feel like LessWrong has a double-standard, where political writing is held to a high epistemic standard unless it’s about AI.
I have hidden the “AI Alignment” tag from my homepage, but there is still a spillover effect. “Likes unfalsifiable political claims” is the opposite of the kind of community I want to be part of. I think adopting lc’s POC || GTFO burden of proof would make AI Alignment dialogue productive, but I am pessimistic about that happening on a collective scale.
Weird ideas
When I write about weird ideas, I get three kinds of responses.
“Yes and” is great.
“I think you’re wrong because y” is fine.
“We don’t want you to say that” makes me feel unwelcome.
Over the years, I feel like I’ve gotten fewer “yes and” comments and more “we don’t want you to say that” comments. This might be because my writing has changed, but I think what’s really going on is that this happens to every community as it gets older. What was once radical eventually congeals into dogma.
I used to post my weird ideas immediately to LessWrong. Now I don’t, because I feel like the reception on LessWrong would bum me out.[1]
I wonder what fraction of the weirdest writers here feel the same way. I can’t remember the last time I’ve read something on LessWrong and thought to myself, “What a strange, daring, radical idea. It might even be true. I’m scared of what the implications might be.” I miss that.[2]
I get the basic idea
I have learned a lot from reading and writing on LessWrong. Eight months ago, I had an experience where I internalized something very deep about rationality. I felt like I graduated from Level 1 to Level 2.
According to Eliezer Yudkowsky, his target audience for the Sequences was 2nd grade. He missed and ended up hitting college-level. They weren’t supposed to be comprehensive. They were supposed to be Level 1. But after that, nobody wrote a Level 2. (The postrats don’t count.) I’ve been trying―for years―to write Level 2, but I feel like a sequence of blog posts is a suboptimal format in 2023. Yudkowsky started writing the Sequences in 2006, when YouTube was still a startup. That leads me to…
100×
The other reason I’ve been posting less on LessWrong is that I feel like I’m hitting a soft ceiling with what I can accomplish here. I’m nowhere near my personal skill cap, of course. But there is a much larger potential audience (and therefore impact) if I shifted from writing essays to filming YouTube videos. I can’t think of anything LessWrong is doing wrong here. The editor already allows embedded YouTube links.
[1] Exception: I can usually elicit a positive response by writing fiction instead of nonfiction. But that takes a lot more work.
[2] This might be entirely in my head, due to hedonic adaptation.
This is the part I’m most frustrated with. It used to be you could say some wild stuff on this site and people would take you seriously. Now there’s a chorus of people who go “eww, gross” if you go too far past what they think should be acceptable. LessWrong culture originally had very high openness to wild ideas. At worst, if you reasoned well and people disagreed, they’d at least ignore you, but now you’re more likely to get downvoted for saying controversial things because they are controversial and it feels bad.
This was always a problem, but feels like it’s gotten worse.
Huh, I am surprised by this. I agree this is a thing in lots of the internet, but do you have any examples? I feel like we really still have a culture of pretty extreme openness and taking random ideas seriously (enough that sometimes I feel like wild sounding bad ideas get upvoted too much because people like being contrarian a bit too much).
Here’s part of a comment on one of my posts. The comment negatively impacted my desire to post deviant ideas on LessWrong.
The comment doesn’t represent a fringe opinion. It has +29 karma and +18 agreement.
I think I’m less open to weird ideas on LW than I used to be, and more likely to go “seems wrong, okay, next”. Probably this is partly a me thing, and I’m not sure it’s bad—as I gain knowledge, wisdom and experience, surely we’d expect me to become better at discerning whether a thing is worth paying attention to? (Which doesn’t mean I am better, but like. Just because I’m dismissing more ideas, doesn’t mean I’m incorrectly dismissing more ideas.)
But my guess is it’s also partly a LW thing. It seems to me that compared to 2013, there are more weird ideas on LW and they’re less worth paying attention to on average.
In this particular case… when you talk about “We don’t want you to say that” comments, it sounds to me like those comments don’t want you to say your ideas. It sounds like Habryka and other commenters interpreted it that way too.
But my read of the comment you’re talking about here isn’t that it’s opposed to your ideas. Rather, it doesn’t want you to use a particular style of argument, and I agree with it, and I endorse “we don’t want bad arguments on LW”. I downvoted that post of yours because it seemed to be arguing poorly. It’s possible I missed something; I admittedly didn’t do a close read, because while I’ve enjoyed a lot of your posts, I don’t have you flagged in my brain as “if lsusr seems to be making a clear mistake, it’s worth looking closely to see if the error is my own”.
(I am sad that the “avoid paying your taxes” post got downvoted. It does seem to me like an example of the thing you’re talking about here, and I upvoted it myself.)
I also endorse pretty much everything in this comment.
(Except for the bit about the “avoid paying your taxes” post, because I don’t even remember that one.)
To emphasize this point: in many cases, the problem with some “weird ideas” isn’t, like, “oh no, this is too weird, I can’t even, don’t even make me think about this weird stuff :(”. It’s more like: “this is straightforwardly dumb and wrong”. (Indeed, much of the time it’s not even interestingly wrong, so it’s not even worth my time to argue with it. Just: dumb nonsense, already very well known to be dumb nonsense, nothing new to see or say, downvote and move on with life.)
You don’t have to justify your updates to me (and also, I agree that the comment I wrote was too combative, and I’m sorry), but I want to respond to this because the context of this reply implies that I’m against weird ideas. I vehemently dispute this. My main point was that it’s possible to argue for censorship for genuine reasons (rather than because one is closed-minded). I didn’t advocate for censoring anything, and I don’t think I’m in the habit of downvoting things because they’re weird, at all.
This may sound unbelievable or seem like a warped framing, but I honestly felt like I was going against censorship by writing that comment. Like as a description of my emotional state while writing it, that was absolutely how I felt. Because I viewed (and still view) your post as a character attack on people-who-think-that-sometimes-censorship-is-justified, and one that’s primarily based on an emotional appeal rather than a consequentialist argument. And well, you’re a very high prestige person. Posts like this, if they get no pushback, make it extremely emotionally difficult to argue for a pro-censorship position regardless of the topic. So even though I acknowledge the irony, it genuinely did feel like you were effectively censoring pro-censorship arguments, even if that wasn’t the intent.
I guess you could debate whether or not censoring pro-censorship views is pro or anti censorship. But regardless, I think it’s bad. It’s not impossible for reality to construct a situation in which censorship is necessary. In fact, I think such situations already exist; if someone posts a trick that genuinely accelerates AI capabilities by 5 years, I want that to be censored. (Almost all examples I’d think of would relate to AI or viruses.) The probability that something in this class happens on LW is not high, but it’s high enough that we need to be able to talk about this without people feeling like they’re impure for suggesting it.
I stumbled over this part. What makes someone high prestige? Their total LW karma? To me that doesn’t really make sense as a proxy for prestige.
Hmm, is LessWrong really so intolerant of being reminded of the existence of “deviant ideas”?
Social Dark Matter was pretty well received, with 248 karma, and was posted quite recently.
The much older KOLMOGOROV COMPLICITY AND THE PARABLE OF LIGHTNING opened with a quote from the same Paul Graham essay you linked to (What You Can’t Say).
I was not personally offended by your example post and upvoted it just now. I probably at least wouldn’t have downvoted it had I seen it earlier, but I hadn’t.
People love deviant ideas in abstract, hate to deal with specific deviant ideas that attack beliefs they hold dear.
lsusr’s example post seemed to not be a specific deviant idea though. To paraphrase one point: beware of banning apparent falsity lest you inadvertently ban true heresies, without naming any heresy in particular.
Many readers appeared to dislike my example post. IIRC, prior to mentioning it here, its karma (excluding my auto hard upvote) was close to zero, despite it having about 40 votes.
Hi there, lsusr!
I read the post & comment which you linked, and indeed felt that the critical comment was too combative. (As a counterexample, I like this criticism of EY for how civil it is.) That being said, I think I understand the sentiment behind its tone: the commenter saw your post make a bunch of strong claims, felt that these claims were wrong and/or insufficiently supported by sources, and wrote the critical comment in a moment of annoyance.
To give a concrete example, “We do not censor other people more conventional-minded than ourselves.” is an interesting but highly controversial claim, both because hardly anything in the world has a 100% correlation, and because it leads to unintuitive logical implications like “two people cannot simultaneously want to censor one another”.
Anyway, given that the post began with a controversial claim, I expected the rest of the post to support this initial claim with lots of sources and arguments. Instead, you took the claim further and built on it. That’s a valid way to write, but it puts the essay in an awkward spot with readers that disagree with the initial claim. For this reason, I’m also a bit confused about the purpose of the essay: was it meant to be a libertarian manifesto, or an attempt to convince readers, or what? EDIT: Also, the majority of LW readers are not libertarians. What reaction did you expect to receive from them?
If I were to make a suggestion, the essay might have worked better if it had been a dialogue between a pro-liberty and a pro-censorship character. Why? Firstly, if readers feel like an argument is insufficiently supported, they can criticize or yell at the character, rather than at you. And secondly, such a dialogue would’ve required making a stronger case in favor of censorship, and it would’ve given the censorship character the opportunity to push back against claims by the liberty character. This would’ve forestalled having readers make similar counterarguments. (Also see Scott’s Nonfiction Writing Advice, section “Anticipate and defuse counterarguments”.)
My best example of this comes from this post of mine on EAF (my LW examples are a bit more ambiguous). Multiple folks quickly jumped to making a Nazi argument, almost in parody of Godwin’s Law.
I don’t have an opinion on your post itself, but it is indeed disappointing that the comments immediately jumped to the Nazi comparison, which of course made all further discussion pointless.
I thought Genesmith’s latest post fully qualified as that!
I totally didn’t think adult gene editing was possible, and had dismissed it. It seems like a huge deal if true, and it’s the kind of thing I don’t expect would have been highlighted anywhere else.
The post about not paying one’s taxes was pretty out there and had plenty of interesting discussion, but now it’s been voted down to the negatives. I wish it was a bit higher (at 0-ish karma, say), which might’ve happened if people could disagree-vote on it.
But yes, overall this criticism seems true, and important.
I’ve strong-upvoted it to −1, because I agree.
Another improvement I didn’t notice until right now is the “respond to a part of the original post” feature. I feel like it nudges comments away from nitpicking.
I didn’t quite parse that – which UI element are you referring to?
I meant side-comments. I never use them myself, but people often use them to comment on my posts. When they do, the comments tend to be constructive, especially compared to blockquotes.
Ah cool. That was my best guess but wasn’t sure.
One thing that could help is to be able to have automatic crossposting from your YouTube channel like you can currently have from a blog. It would be even more powerful if it generated a transcript automatically (though that’s currently difficult and expensive).
A few points on this:
Some YouTube videos already come with good captions.
For the rest, YouTube provides automatic captions. These are really bad and lack punctuation and capitalization, but even at that level of quality they could e.g. be used to pinpoint where something was said.
Transcription via OpenAI Whisper is cheap ($0.36 per hour) and quite decent if there’s only one speaker. For interviews and podcasts, the experience is not good enough for transcription (to create this podcast transcript at the beginning of the year, I used Whisper as a base, but still had to put in many many hours of editing), because it e.g. doesn’t do speaker diarisation or insert paragraph breaks. But I’m pretty sure that by now there are hybrid services out there which can do even the things Whisper is bad at. This still won’t yield a professional-level transcript, though doing an editing pass with GPT4 might close the gap. My point is, these transcripts are not expensive, relative to labor costs.
The implementation of automatic AI transcripts has become surprisingly simple. E.g. as I mentioned here, I now get automatic transcripts for my voice notes, based on following a step-by-step video guide. The difficulty is not yet at consumer-level simple (though for those purposes, one can just pay for an AI transcription service app), but it’s definitely already at the level of hobbyist-simple.
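As a rough back-of-the-envelope check on the cost point above, here is a small sketch. The $0.36 per hour rate is the figure quoted above; the editing wage and the function names are hypothetical placeholders, not real data.

```python
WHISPER_USD_PER_AUDIO_HOUR = 0.36  # rate quoted in the comment above


def transcription_cost(audio_hours: float) -> float:
    """Cost in USD of machine-transcribing `audio_hours` of audio."""
    return WHISPER_USD_PER_AUDIO_HOUR * audio_hours


def editing_cost(editing_hours: float, hourly_wage_usd: float) -> float:
    """Cost in USD of a human editing pass (wage is a hypothetical input)."""
    return editing_hours * hourly_wage_usd


if __name__ == "__main__":
    audio_hours = 2.0   # e.g. one long podcast episode
    wage = 30.0         # hypothetical hourly wage, for illustration only
    print(f"machine transcript: ${transcription_cost(audio_hours):.2f}")
    print(f"5 hours of human editing: ${editing_cost(5.0, wage):.2f}")
```

Under almost any plausible wage, the human editing pass dominates the total, which is the sense in which the transcripts themselves are cheap relative to labor.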
There are also writers with a very large reach. A recommendation I saw was to post where most of the people and hence most of the potential readers are, i.e. on the biggest social media sites. If you’re trying to have impact as a writer, the reachable audience on LW is much smaller. (Though of course there are other ways of having a bigger impact than just reaching more readers.)
Do you remember any examples from back in the day?
I enjoy your content here and would like to continue reading you as you grow into your next platforms.
YouTube grows your audience in the immediate term, among people who have the tech and time to consume videos. However, text is the lowest common denominator for human communication across longer time scales. Text handles copying and archiving in ways that I don’t think we can promise for video on a scale of hundreds of years, let alone thousands. Text handles search with an ease that we can only approximate for video by transcribing it. Transcription is tractable with AI, but still requires investment of additional resources, and yields a text of lower quality and intentionality than an essay crafted directly by its own author.
Plenty of people spend time in situations where they can read text but not listen to audio, and plenty of people spend time in situations where they can listen to audio but not read text. Compare the experience of listening to an essay via text to speech to the experience of reading a youtube video’s auto-generated transcript. Which makes you feel like it’s improving how you think?
I’m learning how to film, light and edit video. I’m learning how to speak better too, and getting a better understanding about how the media ecosystem works.
Making videos is harder than writing, which means I learn more from it.
Ah, that makes perfect sense. On the other side, watching videos is often easier than reading, so I often feel like I learn more from the latter =)
I just posted a big effortpost and it may have been consigned to total obscurity because I posted it at the wrong time of day. Unsure whether I actually want the recommendation algorithm to have flattened time-discounting over periods with less activity on the site, or if I should just post more strategically in the future.
I have found the dialogues to be generally low-quality to read. The good ones tend to be more interview-like—“I have something I want to talk about but writing a post is harder than talking to a curious interlocutor about it.” I think this maybe suggests that I want to see dialogues rebranded to not say “dialogue.”
(Note, I don’t think it’s because it was posted at the wrong time of day. I think it’s because the opening doesn’t make a clear case for why people should read it.
In my experience posts like this still get a decent amount of attention if they are good, but it takes a lot longer, since it spreads more by word-of-mouth. The initial attention burst of LW is pretty heavily determined by how much the opening paragraphs and title draw people in. I feel kind of sad about that, but also don’t have a great alternative to the current HN-style algorithm that still does the other things we need a karma/frontpage-sorting algorithm to do.)
It’s hard to envision a different solution to this problem. When I browse a feed and decide what to read, of course things like author, karma, title, and first paragraph are the things that determine whether I’ll consider reading. How else could things work?
@Charlie Steiner: Also see this comment thread on why it’s so important to pay outsized importance to stuff like the title and presentation. Excerpts from my comment:
And:
Yeah, fair enough.
I think time of day combined with when it was approved for front page can easily make all the difference between takeoff and just fading into obscurity.
This is an unfortunate situation, but I don’t have a solution.
I do wonder why posts with AI tags aren’t on front page automatically without human review.
I mean, many posts with AI tags don’t meet frontpage norms. For example AI news isn’t timeless, and as such doesn’t make it onto the frontpage.
Ah, that makes sense. I never see AI stuff on the front page or in recent discussions that isn’t worth at least a glance, but that’s a good thing. I do not want to see every little AI news piece on the front page.
What about leaning into the word-of-mouth sharing instead, and support that with features? For example, being able to as effortlessly as possible recommend posts to people you know from within LW?
Not crazy. I also think doing things that are a bit more social where you have ways to recommend (or disrecommend) a post with less anonymity attached, allowing us to propagate that information further, is not crazy, though I am worried about that incentivizing more groupthinking and weird social dynamics.
I’m not sure what the current algorithm is other than a general sense of “posts get promoted more if they’re more recent,” but it seems like it could be a good idea to just round it all up so that everything posted between 0 and N hours ago is treated as equally recent, so that time of day effects aren’t as strong.
Not sure about the exact value of N… 6? 12? It probably depends on what the current function is, and what the current cycle of viewership by time of day looks like. Does LW keep stats on that?
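To make the "round everything to N hours" idea concrete, here is a minimal sketch. It assumes the commonly cited Hacker News-style formula (karma divided by a power of age); LessWrong's actual frontpage function is not stated in this thread, so `hn_style_score`, `gravity`, and `bucket_hours` are illustrative names and values, not the site's real ones.

```python
import math


def hn_style_score(karma: float, age_hours: float,
                   gravity: float = 1.8, bucket_hours: float = 0.0) -> float:
    """Rank score that decays with age (assumed HN-style formula).

    If bucket_hours > 0, the age is floored to a multiple of bucket_hours,
    so every post younger than bucket_hours counts as equally recent.
    """
    if bucket_hours > 0:
        # Hypothetical tweak from the comment above: round age into N-hour buckets.
        effective_age = math.floor(age_hours / bucket_hours) * bucket_hours
    else:
        effective_age = age_hours
    # The +2 offset keeps brand-new posts from dividing by (almost) zero.
    return karma / (effective_age + 2) ** gravity


# With 6-hour buckets, a 1-hour-old and a 5-hour-old post with equal karma tie.
same = hn_style_score(10, 1, bucket_hours=6) == hn_style_score(10, 5, bucket_hours=6)
```

One trade-off of flooring like this is a cliff at each bucket boundary: a post's score drops sharply when it crosses N hours, which might just move the time-of-day effect rather than remove it.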
I think overall I’ve found dialogues pretty good, I’ve found them useful for understanding people’s specific positions and getting people’s takes on areas I don’t know that well.
My favorite one so far is AI Timelines, which I found useful for understanding the various pictures of how AI development will go in the near term. I liked How useful is mechanistic interpretability? and Speaking to Congressional staffers about AI risk for understanding people’s takes on these areas.
AI content for specialists
There has been a lot of AI content recently, and it is sometimes of the kind that requires specialized technical knowledge, which I (an ordinary software developer) do not have. Similarly, articles on decision theories are often written in a way that assumes a lot of background knowledge that I don’t have. As a result there are many articles I don’t even click on, and if I accidentally do, I just sigh and close them.
This is not necessarily a bad thing. As something develops, inferential distances increase. So maybe, as a community we are developing a new science, and I simply cannot keep up with it. -- Or maybe it is all crackpottery; I wouldn’t know. (Would you? Are some of us upvoting content they are not sure about, just because they assume that it must be important? This could go horribly wrong.) Which is a bit of a problem for me, because now I can no longer recommend Less Wrong in good faith as a source of rational thinking. Not because I see obviously wrong things, but because there are many things where I have no idea whether they are right or wrong.
We had some AI content and decision theory here since the beginning. But those articles written back then by Eliezer were quite easy to understand, at least for me. For example, “How An Algorithm Feels From Inside” doesn’t require anything beyond high-school knowledge. Compare it to “Hypothesis: gradient descent prefers general circuits”. Probably something important, but I simply do not understand it.
Just like historically MIRI and CFAR split into two organizations, maybe Less Wrong should too.
Feeling of losing momentum
I miss the feeling that something important is happening right now (and I can be a part of it). Perhaps it was just an illusion, but in the first years of Less Wrong it felt like we were doing something important: building the rationalist community, inventing the art of everyday rationality, with the prospect of raising the general sanity waterline.
It seems to me that we gave up on the sanity waterline first. The AI is near, we need to focus on the people who will make a difference (whom we could recruit for AI research), there is no time to care about the general population.
Although recently, this baton was taken over by the Rational Animations team!
Is the rationalist community still growing? Offline, I guess it depends on the country. In Bratislava, where I live, it seems that ~ no one cares about rationality. Or effective altruism. Or Astral Codex Ten. Having five people at a meetup is a big success. Nearby Vienna is doing better, but it is merely climbing back to pre-COVID levels, not growing. Perhaps it is better in some other parts of the world.
Online, new people are still coming. Good.
Also, big thanks to all people who keep this website running.
But still, it no longer feels to me like I am here to change the world. It is just another form of procrastination, albeit a very pleasant one. (Maybe because I do not understand the latest AI and decision theory articles; maybe all the exciting things are there.)
Etc.
Some dialogs were interesting, but most are meh.
My greatest personal pet peeve was solved: people no longer talk uncritically about Buddhism and meditation. (Instead of talking more critically they just stopped talking about it at all. Works for me, although I hoped for some rational conclusion.)
It is difficult for me to disentangle what happens in the rationalist community from what happens in my personal life. Since I have kids, I have less free time. If I had more free time, I would probably be recruiting for the local rationality (+adjacent) community, spend more time with other rationalists, maybe even write some articles… so it is possible that my overall impression would be quite different.
(Probably forgot something; I may add some points later.)
I think that starting things that are hard forks of the lesswrong memeplex might be beneficial to being able to grow. Raising the sanity waterline would need to go memetic to work—and so would need to engage directly with raising the memetic sanity waterline. To do that, it probably would need to deeply forgo branding. For example, in a recent discussion about effective altruism, I made the point that I care nil about the “effective altruism” brand, and if that brand is dead, so be it. I care about effectively causing a better world. Anything that morphs into a brand rather than description is lost purpose, and so I think things that are designed to change names if their name becomes a brand might be more effective. I’ve been considering writing a post about this but I think my writing style tends to be a bit … messy … to get upvoted here.
Please do. I’ve been mulling over related half-digested thoughts—replacing the symbol / brand with the substance, etc.
I love LessWrong. I have better discussions here than anywhere else on the web.
I think I may have a slightly different experience with the site than the modal user because I am not very engaged in the alignment discourse.
I’ve found the discussions on the posts I’ve written to be of unusually high quality, especially the things I’ve written about fertility and polygenic embryo screening.
I concur with other comments about the ability to upvote and agree/disagree with a comment to be a great feature which I use all the time.
My number one requested feature continues to be the ability to see a retention graph on the posts I’ve written, i.e. where do people get bored and stop reading? After technical accuracy my number one goal is to write something interesting and engaging, but I lack any kind of direct feedback mechanism to optimize my writing in that way.
Yeah, I’ve been wanting something like this for a while. It would require capturing and processing a bunch more data than we have historically. Also, distinguishing between someone skimming up and down a post and actually reading it seems like a kind of finicky algorithm problem that would require a bunch of iteration to get right, which I think makes it a relatively big project.
Perhaps show the user what data they’re creating, by incrementally marking the post as read as they scroll down it and displaying that to them?
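As a rough illustration of what a retention graph could be computed from, here is a minimal sketch. It assumes the site records, for each reader, the furthest fraction of the post they scrolled to; the function name `retention_curve` and that data format are hypothetical, and it deliberately ignores the harder skimming-versus-reading problem mentioned above.

```python
def retention_curve(max_depths: list[float], buckets: int = 10) -> list[float]:
    """Fraction of readers who reached each depth threshold.

    max_depths holds one value in [0, 1] per reader: the furthest point
    of the post they scrolled to (assumed to be logged somewhere).
    """
    if not max_depths:
        return [0.0] * buckets
    total = len(max_depths)
    curve = []
    for i in range(1, buckets + 1):
        threshold = i / buckets
        # Count readers whose furthest scroll position reached this threshold.
        reached = sum(1 for depth in max_depths if depth >= threshold)
        curve.append(reached / total)
    return curve


# Three readers: one finished, one stopped halfway, one left at 20%.
example = retention_curve([1.0, 0.5, 0.2], buckets=4)
```

A sharp drop between two adjacent thresholds would point at the section where readers tend to stop, which is the feedback signal the feature request asks for.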
(low confidence, low context, just an intuition)
I feel as though the LessWrong team should experiment with even more new features, treating the project of maintaining a platform for collective truth-seeking like a tech startup. The design space for such a platform is huge (especially as LLMs get better).
From my understanding, the strategy that startups use to navigate huge design spaces is “iterate on features quickly and observe objective measures of feedback”, which I suspect LessWrong should lean into more. Although, I imagine creating better truth-seeking infrastructure doesn’t have as good of a feedback signal as “acquire more paying users” or “get another round of VC funding”.
This is basically what we do, capped by our team capacity. For most of the last ~2 years, we had ~4 people working full-time on LessWrong plus shared stuff we get from the EA Forum team. In the last few months, we reallocated people from elsewhere in the org and are at ~6 people, though several are newer to working on code. So pretty small startup. Dialogues has been the big focus of late (plus behind-the-scenes performance optimizations and code infrastructure).
All that to say, we could do more with more money and people. If you know skilled developers willing to live in the Berkeley area, please let us know!
Does GPT-4 or Copilot help with the coding? Have you tried?
I’m a software developer, but it would have to be remote, and might depend on your stack.
I use Cursor, Copilot, sometimes GPT-4 in the chat, and also Hex.tech’s built-in SQL shoggoth.
I would say the combination of all those helps a huge amount, and I think has been key in allowing me to go from pre-junior to junior dev in the last few months. (That is, from not being able to make any site changes without painstaking handholding, to leading and building a lot of the Dialogue matching feature and associated stuff (I also had a lot of help from teammates, but less in a “they need to carry things over the finish line for me”, and more “I’m able to build features of this complexity, and they help out as collaborators”)).
But PR review and advice from senior devs on the team have also been key, and much appreciated.
It does, quite a bit! Definitely speeds me up somewhere between 20% and 100% depending on task. And I think it’s a bigger deal for those now working on code and who are newer to it.
Agreed! Cf. Proposal for improving the global online discourse through personalised comment ordering on all websites—using LessWrong as the incubator for the first version of the proposed model would actually be critical.
I feel a mix of pleased and frustrated. The main draw for me is AI safety discussion. I dislike the feeling of group-think around stuff, and I value the people who speak up against the group-think with contrary views (e.g. TurnTrout), who post high quality technical content, or well-researched and thought-out posts (e.g. Steven Byrnes).
I feel frustrated at things like feeling that people don’t always do a good job of voting comments up based on how valuable/coherent/high-effort the information content is, and then separately voting agree/disagree. I really like this feature, and I wish people gave it more respect. I am pleased that it does as well as it does though.
I like the new emojis and the new dialogues. I’m excited for the site designers to keep trying new (optional) stuff.
The things I’d like more from the site would be if it could split into two: one which was even more in the direction of technical discussion of AI safety, and the other for rationality and philosophy stuff. And then I’d like the technical side to have features like jupyter notebook-based posts for dynamic code demonstrations. And people presenting recent important papers not their own (e.g. from arxiv), for the sake of highlighting/summarizing/sparking-discussion. The weakness of the technical discussion here is, in my opinion, related to the lack of engagement with the wider academic community and empirical evidence.
Ultimately, I don’t think it matters much what we do with the site in the longer term because I think things are about to go hockey stick singularity crazy. That’s the bet I’m making anyway.
Yeah. The threshold for “okay, you can submit to alignmentforum” is way, way, way too high, and as a result, lesswrong.com is the actual alignmentforum. Attempts to insist otherwise without appropriately intense structural change will be met with lesswrong.com going right on being the alignmentforum.
Ok, slightly off topic, but I just had a wacky notion for how to break-up groupthink as a social phenomenon. You know the cool thing from Audrey Tang’s ideas, Polis? What if we did that, but we found ‘thought groups’ of LessWrong users based on the agreement voting. And then posts/comments which were popular across thought-groups instead of just intensely within a thought group got more weight?
Niclas Kupper tried a LessWrong Polis to gather our opinions a while back. https://www.lesswrong.com/posts/fXxa35TgNpqruikwg/lesswrong-poll-on-agi
So, something like the community notes algorithm?
https://vitalik.eth.limo/general/2023/08/16/communitynotes.html
Ah, as a non-Twitter user I hadn’t known about this. Neat.
Quote
This is the formalization of the concept “left hand whuffy” from Charlie Stross’s “down and out in the magic kingdom”, 2003. When people who usually disagree with people like you actually agree with you or like what you’ve said, that’s special and deserves attention. I’ve always wanted to see it implemented. I don’t usually tweet but I’ll have to look at this.
Down and Out in the Magic Kingdom was by Cory Doctorow, not Stross.
Good catch. I’d genuinely misremembered. I lump the two together, but generally far prefer Stross as a storyteller, even though Doctorow’s futurism is also first-rate, in a different dimension. I found the story in Down and Out to be Stross-quality.
That sort of good idea for a social network improvement is definitely signature Doctorow, though.
Another idea is to upweight posts if they’re made by a person in thought group A, but upvoted by people in thought group B.
Yeah, I’m interested in features in this space!
Another idea is to implement a similar algorithm to Twitter’s Community Notes: identify comments that have gotten upvotes by people who usually disagree with each other, and highlight those.
This idea is definitely simmering in many people’s heads at the moment :)
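The cross-group weighting ideas in this sub-thread can be sketched in a few lines. This is a deliberately simple stand-in, not the actual Community Notes algorithm (which, per the linked write-up, uses matrix factorization): it assumes each voter has already been assigned to a "thought group" by some separate clustering step, and scores a comment by its approval rate in the group that likes it least. The names `bridging_score` and `group_of` are hypothetical.

```python
from collections import defaultdict


def bridging_score(votes: dict[str, int], group_of: dict[str, str]) -> float:
    """Minimum per-group approval rate for one comment.

    votes maps user -> +1 (upvote) or -1 (downvote).
    group_of maps user -> thought-group label (assumed precomputed).
    A comment scores high only if every group that voted mostly approved.
    """
    ups: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for user, vote in votes.items():
        group = group_of[user]
        totals[group] += 1
        if vote > 0:
            ups[group] += 1
    if len(totals) < 2:
        # Approval from a single group says nothing about cross-group appeal.
        return 0.0
    return min(ups[group] / totals[group] for group in totals)


groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
partisan = bridging_score({"a1": 1, "a2": 1, "b1": -1, "b2": -1}, groups)
bridged = bridging_score({"a1": 1, "a2": -1, "b1": 1, "b2": 1}, groups)
```

Taking the minimum means intense support inside one group cannot compensate for rejection by another, which is the property the comments above are asking for. As noted further down the thread, running anything like this on real LW votes would need the LW team, since vote data is private.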
How private are the LessWrong votes?
Would you want to do it overall or blog by blog? Seems pretty doable.
Currently, the information about who voted which way on what things is private to the individual who made the vote in question and the LW admins.
So if doing this on LW votes, it’d need to be done in cooperation with the LW team.
I’m pasting this here because it’s the sort of thing I’d like to see. I’d like to see where I fall in it, and at least the anonymized position of others. Also, it’d be cool to track how I move over time. Movement over time should be expected unless we fall into the ‘wrong sort of updateless decision theory’ as jokingly described by TurnTrout (and term coined by Wei Dai). https://www.lesswrong.com/posts/j2W3zs7KTZXt2Wzah/how-do-you-feel-about-lesswrong-these-days-open-feedback?commentId=X7iBYqQzvEgsppcTb
I still like the site, though I had to set the AI tag to −100 this year. One thing I wish was a bit different is that I’ve posted a whole bunch of LW-site-relevant feedback in comments (my natural inclination is to post comprehensive feedback on whatever content I interact with), and for a good fraction of them I’ve received no official reaction whatsoever. I don’t know if the comments got ignored because the LW team didn’t see them, or didn’t have the time to act on them, or whatever, but I still wish I’d gotten some kinds of reactions on them.
I’m not asking for my feedback to be implemented[1], but when I post feedback comments on site posts by LW team members, I do wish I got some kind of acknowledgement, even if it always turned out to be “we’ve seen this feedback, but we have bigger fish to fry”.
As an example, here are all my unanswered feedback requests since 2023-08-01 (arbitrary cutoff from when I got bored browsing my comments history):
Most important: My comment thread which requested that the Forum Magnum Github list of open issues be cleaned up, and which also included a smaller & actionable ask to sync the issue statuses from the private Lightcone Asana back to the public Github.
Incidentally, lack of reactions to some of my feedback is one reason why I wish that issue tracker was useful, rather than being in the sorry state it’s in. My LW-related feedback comments are like posting Github issues on an open-source project: i.e. I try to make the site better, and expect a reply like “we’ll implement this” or “we won’t implement this” or “we haven’t seen this, but someone will look at this eventually” or something. Plus I want to be able to check back on the issues months later, and neither comments nor Intercom are the right format for that. Whereas on Github, one can easily find all one’s feedback in one place; the status of each issue is transparently clear; and I get update notifications.
The New User’s Guide to LessWrong asked for feedback, I posted three huge feedback comments (1, 2, 3), and I’m not sure anyone ever saw them, let alone reacted to them.
Or there’s this minor feedback comment on your post on the Dialogue Matching feature.
Or this comment on the Carving of Reality book where I pointed out that the books actually can be purchased in a few countries apart from the US, and so the FAQ warranted an update. Raemon even responded to the thread, but the FAQ is still incomplete as written.
And this unofficial thread on the LW Wiki contains feedback by me and others, which I wish had received some kind of attention.
This question on an Open Thread: “What’s the minimum one would have to learn to productively contribute to open Github issues in the LW / EA Forum codebase?”
These questions on the process of making the “The Carving of Reality” book set.
This request for a retrospective for Good Heart Week 2022.
Finally, if someone on the LW team was interested, it could be neat to dialogue on a topic like “LW and open-source contributions”.
Though it occasionally does get implemented: like fixes to this bug report on comment reactions, or to a report on the comments counter being bugged in discussions.
Just as an FYI: pinging us on Intercom is a much more reliable way of ensuring we see feature suggestions or bug reports than posting comments. Most feature suggestions won’t be implemented[1]; bug reports are prioritized according to urgency/impact and don’t always rise to the level of “will be addressed” (though I think >50% do).
At least not as a result of a single person suggesting them; we have, at times, made decisions that were influenced on the margin by suggestions from one or more LW users.
Genuine question: Why Intercom? What’s so good about it?
It advertises urgency (“we’ll be back in X hours”), which seems unnecessary for almost all bug reports and feature requests. When I post a non-time-sensitive bug report, I just want to know that someone will look at it eventually; I don’t need a reply within 24h from LW team members whose time is valuable.
My list of previous Intercom messages (20 threads total) is sorted in chronological order of last reply, cannot be rearranged, has no subject headings and is unsearchable. So I have to click into each thread to know what it’s about.
I can’t delete or archive old Intercom threads, so this list becomes increasingly unwieldy over time.
Old Intercom threads have poor timestamps, only being accurate to a week.
Or consider this bug report, which I forwarded on Intercom, but which only got resolved because another user contributed their experience in the comments. That couldn’t have happened if the bug report was restricted to a 1-to-1 chat within Intercom.
Re: reliability & follow-ups:
I asked to have the Filan podcast transcript not set to Personal Blog status, and didn’t receive a reply.
Or, a year ago, I reported a (non-time-sensitive) bug that I was unable to use the Reset Password button. Someone took a look, I mentioned it wasn’t time-sensitive, and the exchange ended on me saying “Just let me know when it becomes possible again to reset my password.” and them saying “Cool, will do”. Naturally I never received a further update on this; the button just got fixed at some point. I’m mentioning this not to blame the LW team member, but to indicate that the Intercom medium is just not the right tool for anything that requires long-term follow-up.
Similarly, I just bumped another Intercom question which I’d originally asked in August.
Re: issue trackers:
The second-ever Intercom message I received (322w / 6 years ago) began with: “Hey! If you’ve found a bug on the site, please feel free to file it as an issue on our Github.” So at least back then, Github did get used. What changed?
I’ve made a few bug reports where the Intercom reply (IIRC by you) was “Will put it in the queue”. Which is appreciated, but which also implies that there is a queue, which kind of sounds like an issue tracker (?), except that it’s not public.
Intercom has the benefit of acting as an inbox on our side, unlike comments posted on LW (which may not be seen by any LW team member).
In an ideal world, would Github Issues be better for tracking bug reports? Probably, yes. But Github Issues require that the user reporting an issue navigate to a different page and have a Github account, which approximately makes it a non-starter as the top-of-funnel.
Intercom’s message re: response times has some limited configurability but it’s difficult to make it say exactly the right thing here. Triaging bug reports from Intercom messages is a standard part of our daily workflow, so you shouldn’t model yourself as imposing unusual costs on the team by reporting bugs through Intercom.
re: reliability—yep, we are not totally reliable here. There are probably relatively easy process improvements here that we will end up not implementing because figuring out & implementing such process improvements takes time, which means it’s competing with everything else we might decide to spend time on. Nevertheless I’m sorry about the variety of dropped balls; it’s possible we will try to improve something here.
re: issue tracker—right now our process is approximately “toss bugs into a dedicated slack channel, shared with the EA forum”. The EA forum has a more developed issue-tracking process, so some of those do find their way to Github Issues (eventually).
Thanks for the reply. I think we’ve reached the limits of what can be discussed in a comment thread. Would you be interested in doing a dialogue on this topic? I’m thinking of a somewhat broader phrasing, something like: “Would better support for open-source contributions free up or cost LW team resources?” or “LW and open-source contributions: costs & benefits”, or similar.
(And, re: “I’m sorry about the variety of dropped balls”, I want to be clear that I appreciate everything you and the team do, and I understand that you’re a small team with a big mission. The reason why I gave examples of when the Intercom process was less than 100% reliable was not meant as blame, but just to support my argument that the tool seems ill-suited for certain kinds of reliability, like follow-ups.)
Crossposting my comment from here:
kave’s reply:
my reply:
Overall feels like it’s ok, but very frustrating because it feels like it could be so much better. But I don’t think this is mainly about the software of LW; it’s about culture more broadly in decay (or more precisely, all the methods of coordinating on visions having been corrupted and new ones not gaining steam while defending boundaries).
A different thing: This is a problem for everyone, but: stuff gets lost. https://www.lesswrong.com/posts/DtW3sLuS6DaYqJyis/what-are-some-works-that-might-be-useful-but-are-difficult It’s bad, and there’s a worldwide problem of indexing the Global Archives.
I like LessWrong a lot.
I discovered the site nearly 2 years ago, and have sort of meandered through old and new posts enjoying them. Something I have observed is that, having now gone through most of the best of the “back-catalogue” (old stuff) I am now visiting and reading less, because stuff I like is added at a given rate and I was consuming much faster than that rate. This is relevant, because without being careful it creates the impression of reduced good content. So perhaps any feedback along the lines of “there used to be more good stuff” should be checked for this illusion.
I have filtered AI tagged stuff off completely. I come to LessWrong to read about some weird new idea someone has (eg. a crazy thought experiment, a seemingly-mad ethical claim or a weird fiction). Five posts on AI alignment was enough for me. I don’t need to see more. I am really pleased that the filter system allows this to work so seamlessly—I just don’t see AI stuff any more and sometimes kind of forget LessWrong is used as an “AI place” by some people.
I like the react symbols, and agree/disagree voting. I have not tried reading or participating in a dialogue, and am unlikely to do so. The dialogue format doesn’t seem like the right frame for the kind of thing I like.
It works quite well; the one limitation is that the tag filter can only filter out posts that have been tagged correctly, which brand-new posts aren’t necessarily. That said, I just checked the New Post editor, and there’s now a section to apply tags from within the editor. So this UX change likely reduced the proportion of untagged posts.
We also have a system which automatically applies “core tags” (AI, Rationality, World Modeling, World Optimization, Community, and Practical) to new posts. It’s accurate enough, particularly with the AI tag, that it enables the use-case of “filter out all AI posts from the homepage”, which a non-zero number of users want, even if we still need to sometimes fix the tags applied to posts.
Ah, that’s great.
Bullet points of things that come to mind:
I am a little sad about the lack of Good Ol’ Rationality content on the site. Out of the 14 posts on my frontpage, 0 are about this. [I have Rationality and World Modeling tags at +10.]
It has been refreshing to read recent posts by Screwtape (several), and I very much enjoyed Social Dark Matter by Duncan Sabien. Reading these I got the feeling “oh, this is why I liked LessWrong so much in the first place”.
(Duncan Sabien has announced that he likely won’t post on LessWrong anymore. I haven’t followed the drama here too much—there seems to be a lot—but reading this comment makes me feel bad for him. I feel like LessWrong is losing a lot here: Sabien is clearly a top rationality writer.)
I second the basic concern “newcomers often don’t meet the discourse norms we expect on the site” others have expressed. I also second the stronger statement “a large fraction of people on the site are not serious about advancing the Art”.
What to do about this? Shrug. Seems like a tough problem. Just flagging that I think this indeed is a problem.
(I have reservations on complaining about this, as I don’t think my top-level posts really meet the bar here either.)
Ratio of positive/negative feedback leans more negative than I think is optimal.
[To be clear, this is a near-universal issue, both on the Internet and real life. Negativity bias, “Why our kind can’t cooperate”, “I liked it, but I don’t have anything else to say, so why would I comment?”, etc.]
[I was going to cite an example here, but then I noticed that the comments in fact did contain a decent amount of positive feedback. So, uh, update based on that.]
Doing introspection, this is among the main reasons I don’t contribute more.
(Not that my posts should have a bunch of people giving lots of positive feedback. It’s that if I see other people’s excellent posts getting lukewarm feedback, then I think “oh, if even this isn’t good enough for LessWrong, then...”)
Upvotes don’t have quite the same effect as someone—a real person! - saying “I really liked this post, thanks for writing it!”
(Note how my feedback in this comment has so far been negative, as a self-referential example of this phenomenon)
On the positive side, I agree with many others on the agree-disagree voting being great. I also like the rich in-line reacts.
(I don’t really use the rich in-line reacts myself. I think if it were anonymous, I would do it much more. There’s some mental barrier regarding reacting with my own name. Trying to pin it down, it’s something like “if I start reacting, I am no longer a Lurker, but an Active Contributor and I am Accountable for my reactions, and wow that feels like a big decision, I won’t commit to it now”.)
(I don’t really endorse these feelings. Writing them down, they seem silly.)
(I think it’s overall much better that reacts are non-anonymous.)
I second concerns about (not-so-high-quality) alignment content flooding LessWrong.
(Again, I’m guilty of this as well, having written a couple of what-I-now-think-are-bad posts on AI.)
This is despite me following alignment a lot and being in the “intended audience”.
As you might infer from my first bullet points, I would like there to be more of Good Ol’ Rationality—or at least some place somewhere that was focused on that.
(The thought “split LW into two sites, one about AI and another about rationality” pops to my mind, and has been mentioned by Nathan Helm-Burger below. Clearly, there are huge costs to this. I’d consider this a pointer towards the-type-of-things-that-are-desirable, not as a ready solution.)
The most novel idea in this comment: I think the ratio of comments/post is too small.
I think ideally there would be a lot more comments, discussion and back-and-forth between people than there is currently.
The dialogues help solve the problem, which is good.
(I am overall more positive about dialogues than a couple of negative comments I’ve seen about them.)
Still, I think there’s room for improvement. I think new posts easily flood the channels (cf. point above) without contributing much, whereas comment threads have much smaller costs. Also, the back-and-forth between different people is often where I get the most value.
The exception is high-quality top-level posts, which are, well, high-quality.
So: a fruitful direction (IMO) is “raise the bar for top-level posts, have much more discussion under such top-level posts, and set the bar for commenting lower”.
(What does “raising/lowering the bar” actually mean? How do we achieve it? I don’t know :/.)
And inspired by my thoughts on the positivity of feedback, let me say this: I still consider LessWrong a great website as websites go. Even if I don’t nowadays find it as worldview-changing as when I first read it, there’s still a bunch of great stuff here.
I think that Duncan writing on his own blog and us linking the good posts from LW may be the best solution for both sides. (Duncan approves of being linked.)
I think there are a lot of old posts that don’t get read. I’m most drawn to the Latest Posts because that’s where the social interaction via commenting is. LessWrong is quite tolerant of comments on old posts, but they don’t get as much engagement. It’s too diffuse to be self-sustaining, but I feel like the newcomers are missing out on that in the core material.
What can we do about that? Maybe someone else has a better idea, but I think I’d like to see an official community readthrough of the core sequences (at least RAZ, Codex, and old Best Of) pinned in the Latest Posts area so it actually gets engagement from newcomers. Maybe they should be copies so the comments start out empty, but with some added language encouraging newcomers to engage.
There has got to be a better, more durable way to show recent comments than the current recent comments view.
I feel pretty good about LessWrong. The amount of attention I give to LW tends to ebb and flow in phases, and I’m currently in a phase where I gave it less attention (in large part due to the war in Israel), and now I’m probably going to enter into a phase of giving it a lot of attention because of the 2022 review.
I think the team is doing a great job with the site, both in terms of features and moderation, and the site keeps getting better.
I do feel the warping effect of the AI topic on the site, and I’m ambivalent about it. On the one hand, I do think it’s an important topic that should be discussed here, on the other hand, it does flood out everything else (I’ve changed my filters to deal with it) and a lot of it is low quality. I also see and feel the pressure to make everything related to AI somehow, which again, I’m ambivalent about. On the one hand if it’s so significant and important, then it makes sense to connect many things to it, on the other hand, I’m not sure it does much good to the writing on the site.
I also wish the project to develop the art of rationality got more attention, as I think it is still important and there’s a lot of progress to be made and work to be done. But I also wish that whatever attention it got would be of higher quality—there are very few rationality posts in the last few years that were on the level of the old essays from Eliezer and Scott.
Perhaps the problem is that good writers just don’t stay on LessWrong, and prefer to go on their own platforms or to twitter where they can get more attention and make money from their writing. One idea I have to deal with that is to implement a gifting feature (with real money), perhaps using plural funding. I think it can incentivize people to write better things, and incentivize good writers to also post on LW. I definitely know it would motivate me, at least.
Another thing I would like, which would help deal with the fact that lots of writing that’s relevant to LW isn’t on LW, is to improve the way linkposts work. Currently, I come across a lot of writing that I want to share on LW, but it would be drowned out if I shared it in the open thread or a shortform post, and I don’t want to share it as a linkpost because I don’t want it to be displayed on my page as one of my posts (and drown out the rest of my posts). I also don’t feel like I deserve all the Karma it would get, so it feels a bit… dirty? Here’s what I have in mind instead: have a clear distinction between crossposts and linkposts:
Crossposts would be your own writing that you post partially or in full on LW, and it would be shown like Linkposts are currently shown.
Linkposts would be someone else’s writing that you want to share and discuss on LW. They would be shown in a different section on your profile (“Shared posts” or something), perhaps also in a different section on the front page (though I’m not sure about that). They would carry the original author’s name (you’d have to input that), which you could click to go to something like a user page for that author, showing all of their writing that was shared onto LW. The user who shared the post would get 10% of the karma it gets (their name would be displayed, but not as the author). Other than that they would work normally: they would show normally on the all posts page and in search, and you could tag them like you would any other post.
I think these two features would greatly help LW be a place where great writing can be found and discussed, and hopefully that disproportionately includes writing on the art of rationality.
The linkposts idea is interesting. I agree that it’s weird to get karma for posting linkposts for other people.
In addition, there’s also a problem where, no matter on which site you are (e.g. Reddit or Twitter or LW), native posts get much more engagement and upvotes than linkposts that require visiting an external site. But of course you also can’t just copy the content from the external site, because that would be copyright infringement.
Anyway, as per elsewhere in this thread, your linkposts suggestion has a higher chance of being seen if you also make it on Intercom.
Fwiw I just think it’s fine to get karma for sharing linkposts – you’re doing a valuable service if you’re sharing useful information. I don’t know of other forums that draw a distinction between linkposts and regular posts in terms of where they show up.
It makes sense that it feels a bit weird, but given our limited dev time I think I’d mostly recommend feeling more free to do linkposts as they currently are (marking down who the author is in the title, so people can see what’s going on)
My main aversion is that I don’t want them to drown out my own posts on my user page.
I like the agree-disagree vote and the design.
With the content and votes...
- my impression is until ~1-2 years ago LW had a decent share of great content; I disliked the average voting “taste vector”, which IMO represented somewhat confused taste in roughly “dumbed down MIRI views” direction. I liked many of the discourse norms
- not sure what exactly happened, but my impression is LW is often just another battlefield in ‘magical egregore war zone’. (It’s still way better than other online public spaces)
What I mean by that is a lot of people seemingly moved from ‘let’s figure out how things are’ into ‘texts you write are elaborate battle moves in egregore warfare’. I don’t feel excited about pointing to examples, but my impression is of a growing share of senior top-ranking users who seem hard to convince of anything, cannot be bothered to actually engage with arguments, and write either literal manifestos or in manifesto style.
I like LW, and think that it does a certain subset of things better than anywhere else on the internet.
In particular, in terms of “sane takes on what’s going on” I can usually find them somewhere in the highly upvoted posts or comments.
I think in general my issue with LW is it just reflects the pitfalls of the rationalist worldview. In general the prevailing view conflates intelligence with wisdom, and therefore fails to grasp what is sacred on a moment to moment level that allows skillful action.
I think the fallout of SBF, the fact that rationalists and EAs keep building AI capabilities organizations, rationality adjacent cults centered around obviously immoral world views etc., are all predictable consequences of doing a thing where you try to intelligence hard enough that wisdom comes out.
I don’t really expect this to change, and expect LW to continue to be a place that has the sanest takes on what’s going on and then leads to incredible mistakes when trying to address that situation. And that previous sentence basically sums up how I feel about LW these days.
The thing that seems to me to have gotten worse is what gets upvoted. AI content is the big one here; it’s far too easy to get a high karma post about an AI related topic even if the post really isn’t very good, which I think has a ton of bad downstream consequences. Unfortunately, I think this extends even to the Alignment Forum.
I have no idea what to do about it though. Disagreement voting is good. Weighted voting is probably good (although you’d have to see who voted for what to really know). And the thing where mods don’t let every post through is also good. I wish people would vote differently, but I don’t have a solution.
Here are the Latest Posts I see on my front page and how I feel about them (if I read them, what I remember, liked or disliked, if I didn’t read them, my expectations and prejudices)
Shallow review of live agendas in alignment & safety: I think this is a pretty good overview, I’ve heard that people in the field find these useful. I haven’t gotten much out of it yet, but I will probably refer to it or point others to it in the future. (I made a few very small contributions to the post)
Social Dark Matter: I read this a week or so ago. I think I remember the following idea: “By behaving in ways that seem innocuous to me but make some people not feel safe around me, I may be filtering information, and therefore underestimating the prevalence of a lot of phenomena in society”. This seems true and important, but I haven’t actually spent time thinking about how to apply it to my life, e.g. thinking about what information I may be filtering.
The LessWrong 2022 Review: I haven’t read this post. Thinking about it now does make me want to review some posts if I find the time :-)
Deep Forgetting & Unlearning for Safely-Scoped LLMs: I skimmed this, and I agree that this is a promising direction for research, both because of the direct applications and because I want a better scientific understanding of the “deep” in the title. I’ve talked about unlearning something like once every 10 days for the past month and a half, so I expect to talk about it in the future. When I do I’ll likely link to this.
Speaking to Congressional staffers about AI risk: I read this dialogue earlier today and enjoyed it. Things I think I remember (not checking): staffers are more open-minded than you might expect + would love to speak to technical people, people overestimate how much “inside game” is happening, it would be better if DC AI-X-risk related people just blurted out what they think but also it’s complicated, Akash thought Master of the Senate was useful to understand Congress (even though it took place decades ago!).
How do you feel about LessWrong these days?: I’m here! Good to ask for feedback.
We’re all in this together: Haven’t read and don’t expect to read. I don’t feel excited about Orthogonal’s work and don’t share [EDIT: agree with] my understanding of their beliefs. This being said I haven’t put work into understanding their worldview, I couldn’t pass Tamsin’s ITT, seems there would be a lot of distance to bridge. So I’m mostly going off vibes and priors here, which is a bit sad.
On ‘Responsible Scaling Policies’ (RSPs): Haven’t read yet but will probably do so, as I want to have read almost everything there is to read about RSPs. While I’ve generally enjoyed Zvi’s AI posts, I’m not sure they have been useful to me.
EA Infrastructure Fund’s Plan to Focus on Principles-First EA: I read this quickly, like an hour ago, and felt vaguely good about it, as we say around here.
Studying The Alien Mind: Haven’t read and probably will not read. I expect the post to contain a couple of interesting bits of insight, but to be long and not clearly written. Here too I’m mostly going off vibes and priors.
A Socratic dialogue with my student: Haven’t read and probably won’t read. I think I wasn’t a fan of some past lsurs posts, so I don’t feel excited about reading a Socratic dialogue between them and their student.
In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley: I read this earlier today, and thought it made some interesting points. I don’t know enough about the situation to know if I buy the claims (eg is it now clear that sama was planning a coup of his own? do I agree with his analysis of sama’s character?)
Neural uncertainty estimation review article (for alignment): Haven’t read it, just now skimmed to see what the post is about. I’m familiar with most of the content already so don’t expect to read it. Seems like a good review I might point others to, along with eg CAIS’s course.
[Valence series] 1. Introduction: Haven’t read it, but it seems interesting. I would like to better understand Steve Byrnes’ views since I’ve generally found his comments thoughtful.
I think a pattern is that there is a lot of content on LessWrong that:
I enjoy reading,
Is relevant to things that I care about,
Doesn’t legibly provide more than temporary value: I forget it quickly, I can’t remember it affecting my decisions, don’t recall helping a friend by pointing to it.
The devil may be in “legibly” here, eg maybe I’m getting a lot out of reading LW in diffuse ways that I can’t pin down concretely, but I doubt it. I think I should spend less time consuming LessWrong, and maybe more time commenting, posting, or dialoguing here.
I think dialogues are a great feature, because:
I generally want people who disagree to talk to each other more, in places that are not Twitter. I expect some dialogues to durably change my mind on important topics.
I think I could learn things from participating in dialogues, and the bar to doing so feels lower to me than the bar to writing a post.
ETA: I’ve been surprised recently by how many dialogues have specifically been about questions I had thought and been a bit confused about, such as originality vs correctness, or grokking complex systems.
ETA: I like the new emojis.
I used to visit every day since 2018 and find one or two interesting articles to read on all kinds of topics.
For the past few months I just read zvi’s stuff and any AI related not too technical articles.
Some Reddit forums have dedicated days to topics. I don’t know if having AI stuff only a few days a week would help restore the balance haha.
Note you can use the tag-filters to filter out AI or otherwise adjust the topics in your Latest feed.
Yeah, that reminds me of this thread https://www.lesswrong.com/posts/P32AuYu9MqM2ejKKY/so-geez-there-s-a-lot-of-ai-content-these-days
One concrete complaint I have is that I feel a strong incentive toward timeliness, at the cost of timelessness. Commenting on a fresh, new post tends to get engagement. Commenting on something from more than two weeks ago will often get none, which makes effortful comments feel wasted.
I definitely feel like there is A Conversation, or A Discourse, and I’m either participating in it during the same week as everyone else, or I’m just talking to myself.
(Aside: I have a live hypothesis that this is tightly related to The Twitterization of Everything.)
I think this is a real problem (tho I think it’s more fundamental than your hypothesis would suggest; we could check commenting behaviour in the 2000s as a comparison).
We have some explorations underway addressing related issues (like maybe the frontpage should be more recommender-y and show you good old posts, while the All Posts page is used for people who care a lot about recency). I don’t think we’ve concretely considered stuff that would show you good old posts with new comments, but that might well be worth exploring.
I love LessWrong. I love it for my professional work on alignment (in tandem with AF), and I love it for learning about rationality and the world.
There are problems with LessWrong, but I challenge anyone to name an alternative that’s better.
I think the high standards of civility (dare I say niceness) and rigor are undervalued. Having less pollution from emotionally-focused and deeply flawed arguments is hugely important.
This question has started me thinking about a post titled “the case for LessWrong as a tool for rapid scientific progress”.
The comparison to working in academia for years is night and day different, in favor of LessWrong.
If you’re interested in prior discussion of that, see list of posts tagged “Intellectual Progress via LessWrong”.
There was never a point in the past ~nine years of me knowing about it when I viewed lesswrong as anything but the place the odd, annoying, mostly wrong ai safety people went, having participated with the community around it for about that long, mostly without reading much here. Eventually I decided I wanted to talk to them more. I generally think of lesswrong as a place to talk at people who won’t listen but who need to hear another perspective, and I say that agreeing that the world is in dire straits and needs saving from superhuman agency (which I think is currently concentrated in human organizations that people here consistently underestimate). I see it as a scientific forum of middling quality that is related to the research topic I care about. I occasionally find something I like on it, try to share it, and get blowback from one group of friends that I’m browsing that site again. The upvote mechanism seems to vigorously promote groupthink, especially with the harsh effects of downvotes on newbies. I do think of it as one of the few spaces online that is consistently a conversation ground between progressive classical liberals and conservative classical liberals, so that’s nice, I guess.
Got any better forums to point me to? I’ll take a look and decide for myself how they compare to LessWrong.
nope! middling quality is the best I know of. the things that populate my mental quality space in order to give lesswrong the feeling of being middling quality are things that, despite being lower quality on their own, make me think LW is missing things in order to hit what I would consider “high quality on all dimensions”. I’ve been pretty happy with certain kinds of discussions on eg bluesky, and I think there are elements of discourse there that are good that are missing here, but the nuggets of good there are smaller and not part of an overall project of epistemics, but rather just people having interesting takes. There are also some mastodons I’ve occasionally seen good takes from, but I don’t go deep on mastodon. I have a lot of frustrations with LW that make me feel that people here are missing important insights, and I am working on how to distill and present them. but nothing that is as distilled good as lesswrong; I just still find that distilled good to be painfully lacking on certain traits.
to be clear, as I’ve become an ai safety person over the past 5 years, it didn’t change my view that most ai safety people are odd, annoying, and mostly wrong.
More later, thanks for poking me to get my ass in gear about communicating the things I’m being vague about in this comment. My ass is pretty far out of gear, so don’t expect this too soon.
OK, I’ll check out bluesky, thanks.
it’s basically just twitter minus elon vibes, so it’s not exactly highly novel, fwiw. I have invite codes if you decide you do want them.
What’s up with all this Dialogues stuff? It’s confusing…
I don’t really like reading the dialogues, and I mostly skip them. Most of the ones I have read have felt like I’m just watching people sort out their ideas rather than reading some ideas that have already been sorted out.
Yes, perhaps there could be a way of having dialogues edited for readability.
The sidebar that shows all comments by author is incredibly useful (to me)!
I don’t know how long ago it was put in, but when I noticed it, it made it waaaaay easier for me to parse through big conversation trees, get a sense for what people are thinking, and zero in on threads I want to read in detail.
Thanks to whoever had that idea and implemented it!
I used to comment a fair bit over the last decade or so, and post occasionally. After the exodus of LW 1.0 the site was downhill, but the current team managed to revive it somehow and they deserve a lot of credit for that, most sites on the downward trajectory never recover.
It felt pretty decent for another few years, but eventually the rationality discourse got swamped by the marginal quality AI takes of all sorts. The MIRI work, prominently featured here, never amounted to anything, according to the experts in ML, probability and other areas relevant to their research. CFAR also proved a flop, apparently. A number of recent scandals in various tightly or loosely affiliated orgs did not help matters. But mainly it’s the dearth of insightful and lasting content that is sad. There is an occasional quality post, of course, but not like it used to be. The quality discourse happens on ACX and ACXD and elsewhere, but rarely here. To add insult to injury, the RSS feed stopped working, so I can no longer see the new posts on my offsite timeline.
My guess is that the bustling front disguises serious issues, and maybe the leadership could do what Eliezer called “Halt, melt, and catch fire”. Clearly this place does not contribute to AI safety research in any way. The AI safety agitprop has been undoubtedly successful beyond wildest dreams, but seems like it’s run its course, now that it has moved into a wider discourse. EA has its own place. What is left? I wish I knew. I would love to see LW 3.0 taking off.
Check out GreaterWrong’s RSS feeds; you can click the “RSS” link at the top right of any page to get a feed for that view (frontpage, all, curated, whatever else).
Have you reported this on Intercom as a bug report?
No, I assume I would not be the only person having this issue, and if I were the only one, it would not be worth the team’s time to fix it. Also, well, it’s not as important anymore, mostly a stream of dubious AI takes.
But if everyone assumes this and thinks thus…
Thankfully, human traits are rather dispersive.
Can you say more about the RSS feed not working? I just checked the basics and they still seem to work.
Could be the app I use. It’s protopage.com (which is the best clone of the defunct iGoogle I could find).
Hmm, I don’t know how that works. If you go to LessWrong.com/feed.xml you can see the RSS feed working.
I noticed the error said http rather than https.
neither worked… Something with the app, I assume.
Yeah, I just came to this post from my RSS reader (Feedbin) and it says I’m subscribed to https://www.lesswrong.com/feed.xml
Thank you for checking! None of the permutations seem to work with LW, but all my other feeds seem fine. Probably some weird incompatibility with protopage.
I like it a lot. I’m mainly a tumblr user, and on tumblr we’re all worried about the site being shut down because it doesn’t make any money. I love having LessWrong as a place for writing up my thoughts more carefully than I would on tumblr, and it also feels like a sort of insurance policy if tumblr goes under, since LessWrong seems to be able to maintain performance and usability with a small team. The mods seem active enough that they frontpage my posts pretty quickly, which helps connect them with an audience that’s not already familiar with me, whereas on tumblr I haven’t gotten any readers through the tag system in years and I’m coasting on inertia from the followers I already have.
I feel incredibly fond of LessWrong. I’ve learned so much awesome stuff. And while not perfect, there’s a community of people who more or less agree on and are familiar with various, er, “epistemic things”, for lack of a better phrase. Like, it’s nice to at least know that the person you’re conversing with knows about and agrees on things like what counts as evidence and the map-territory distinction.
That said, I do share the impression that others here have expressed of it heading “downhill”. Watered down. Lower standards. Less serious. Stuff like that. I find it a little annoying and disappointing, but nothing too crazy.
Personally I have posts with the AI, Existential Risk, and Death tags marked as “Hidden” (those topics make me unhappy). So my feed probably looks a lot different from yours. I’ve noticed a reduction in quality and quantity of content.
Things I’d really like to see:
Long-running conversations
A solution for babbling
Some sort of matchmaking where person 1 wants to learn about X and person 2 wants to teach X
I don’t know, looking back at older posts (especially on LW 1.0) current LW is less “schizo” and more rigorous/boring—though maybe that’s because I sometimes see insanely long & detailed mechinterp posts?
I’ve proposed for LessWrong to bootstrap [BetterDiscourse] for bringing Twitter’s Community Notes (pol.is, viewpoints.xyz) value to any popular content on the web, as well as likely improving the sense-making on LessWrong itself, as TurnTrout’s and other people’s comments confirm my own thinking that LessWrong hasn’t avoided groupthink culture.
We did some experiments with Community Notes-esque things recently, although I’m not sure how it worked out. @kave ?
I ran some experiments using only the core of the Community Notes algorithm, taking votes to be helpfulness ratings. I didn’t get anything super interesting out of it, though I might have had implementation bugs. The top posts according to the model seemed fine, and then I didn’t allocate much time to poking around at it any more.
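For readers who haven't seen it, here is a rough sketch of what "the core of the Community Notes algorithm" is usually taken to mean: each rating is modelled as a global mean, plus a user intercept, plus a post intercept, plus a product of user and post "viewpoint" factors, and the post intercept is read as the helpfulness score. This is not the code from the experiment above. The function name, the one-dimensional factor, the hyperparameters and the toy votes are all assumptions made for illustration.

```python
import random

def fit_bridging_model(votes, epochs=500, lr=0.05,
                       lam_intercept=0.15, lam_factor=0.03, seed=0):
    """Fit rating ~ mu + user_b + post_b + user_f * post_f by plain SGD.

    votes: list of (user, post, rating) tuples, rating in [0, 1].
    Returns (mu, post_b, post_f). All hyperparameters are illustrative guesses.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in votes})
    posts = sorted({p for _, p, _ in votes})
    mu = 0.0
    user_b = {u: 0.0 for u in users}
    post_b = {p: 0.0 for p in posts}
    # Small random factors so the viewpoint axis can break symmetry.
    user_f = {u: rng.uniform(-0.1, 0.1) for u in users}
    post_f = {p: rng.uniform(-0.1, 0.1) for p in posts}

    for _ in range(epochs):
        for u, p, r in votes:
            pred = mu + user_b[u] + post_b[p] + user_f[u] * post_f[p]
            err = r - pred
            mu += lr * err
            # Intercepts are regularised harder than factors, so agreement that
            # the viewpoint factor can explain is pushed into the factor term.
            user_b[u] += lr * (err - lam_intercept * user_b[u])
            post_b[p] += lr * (err - lam_intercept * post_b[p])
            fu, fp = user_f[u], post_f[p]
            user_f[u] += lr * (err * fp - lam_factor * fu)
            post_f[p] += lr * (err * fu - lam_factor * fp)
    return mu, post_b, post_f

# Toy data: two camps of voters; one post liked by everyone, two liked by one camp.
votes = []
for user in ("a1", "a2"):
    votes += [(user, "bridge", 1.0), (user, "partisan_a", 1.0), (user, "partisan_b", 0.0)]
for user in ("b1", "b2"):
    votes += [(user, "bridge", 1.0), (user, "partisan_a", 0.0), (user, "partisan_b", 1.0)]

mu, post_b, post_f = fit_bridging_model(votes)
ranking = sorted(post_b, key=post_b.get, reverse=True)
```

On this toy data the post upvoted by both camps ends up with the highest intercept, which is the "bridging" behaviour the approach is meant to reward. Taking votes as helpfulness ratings, as described above, amounts to feeding up/down votes in as the `rating` column.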
How prepared is LW for an attack? Those who want AI research to proceed unimpeded have an incentive to sabotage those who want to slow it down or ban it and consequently have an incentive to DDoS LW.com or otherwise make the site hard to use. What kind of response could LW make against that?
Also, how likely is it that an adversary will manage to exploit security vulnerabilities to harvest all the PMs (private messages) stored on LW?
For this kind of thing it’s both a positive and a negative that the LW codebase is open-source and available on Github.
I think there’s a lot of good content here, but there are definitely issues with it tilting so much towards AI Safety. I’m an AI Safety person myself, but I’m beginning to wonder if it is crowding out the other topics of conversation.
We need to make a decision: are we fine with Less Wrong basically becoming about AI because that’s the most important topic or do we need to split the discussion somehow?
AI safety posts generally go over my head, although the last one I read seemed fantastically important and accessible.
AI-safety posts are probably the most valuable posts here, even if they crowd out other posts (both posts I think are valuable and posts I think are, at best, chaff).
Too much AI content, not enough content about how to be wrong less.
I don’t like that I can’t take for granted that any particular poster has read the Sequences. Honestly that’s a pretty major crux for how to engage with people. The Sequences are a pretty major influence on my worldview, which should matter a lot for those who want to change my mind about things.
Sometimes I think of johnswentworth’s comment about the Fosbury Flop, and I feel some yearning and disappointment.
I like Exercises and Challenges. The Babble challenge was cool. So was that time Luke Muelhauser did math in public(!). It would at least be kind of fun and engaging to see more things like this. They also allow people to demonstrate their process and get critiques.
I would like to see more explicit practice of debiasing techniques. I want LessWrong to be more than just a smarter version of Reddit or Twitter. I want to see different types of interactions, that are verifiably truth-loaded. Things sort of in this direction include: Zvi sticking his neck out with covid predictions, public bets, EY’s LK99 crux-mapping, adversarial collaboration, ITT, the anti-kibitzer, and hypothetical apostasy.
Note that GreaterWrong has a native anti-kibitzer feature.
So for the most part I’m really happy with it—I think it’s got a great UI and a great feel. I haven’t much used the Dialogues feature (not even reading them), but they don’t interfere in any way with the rest of my experience.
One thing I think might need some tuning is the feature that limits the post rate based on the karma of your previous posts. I’ve once found myself rate-limited due to it, and the cause was simply that my last 20 comments had been in not particularly lively discussions where they ended up staying at the default 2⁄0 score. Now I suppose you could construe that as “evidently you haven’t said anything that was contributing particularly to the discussion”, but is that enough to justify rate limiting? If someone was outright spamming, surely they’d be down voted, so that’s not the reason to do it. I’d say a pattern of consistent down voting is a better base for this. After that I found myself trying to “pick up” my score for a while by going to comment posts that were already highly popular to make sure my comment would be seen and upvoted enough to avoid this happening again, and that seems like something you don’t particularly want to incentivize. It just reinforces posts on the basis of them being popular, not necessarily what one honestly considers most interesting.
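To make the distinction concrete, here is a minimal sketch of the two kinds of rule being contrasted. It is not LessWrong's actual rate-limiting logic: the window of 20 comments comes from the comment above, while the thresholds, the function names and the assumption that every comment starts with a self-upvote of 2 are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    karma: int          # displayed karma, including the author's own default vote
    self_vote: int = 2  # assumed strength of the automatic self-upvote

def net_karma(comment):
    """Karma contributed by other users only."""
    return comment.karma - comment.self_vote

def limited_by_low_total(comments, window=20, min_net=1):
    """Hypothetical rule A: limit when the last `window` comments earned too little net karma."""
    recent = comments[-window:]
    return len(recent) == window and sum(net_karma(c) for c in recent) < min_net

def limited_by_downvotes(comments, window=20, max_downvoted=5):
    """Hypothetical rule B: limit only when many recent comments were net downvoted."""
    recent = comments[-window:]
    downvoted = sum(1 for c in recent if net_karma(c) < 0)
    return downvoted > max_downvoted

# Twenty comments in quiet threads that nobody voted on, versus twenty downvoted ones.
quiet = [Comment(karma=2) for _ in range(20)]
downvoted = [Comment(karma=-3) for _ in range(20)]

quiet_result = (limited_by_low_total(quiet), limited_by_downvotes(quiet))
downvoted_result = (limited_by_low_total(downvoted), limited_by_downvotes(downvoted))
```

Under rule A the quiet commenter is rate-limited despite never being downvoted. Under rule B only the consistently downvoted one is, which is the behaviour argued for above.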
The first time I tried to load this page it took >10 seconds before erroring out in a way that made me need to close the website and open it again.
Also, it seems like the site is spamming me to make dialogues. To get to this post in “recent comments” I had to scroll down past some people the site was suggesting I dialogue with. I clicked the “x” next to each of those entries to get them to go away. Then, when I had to re-load the page, two more suggested dialogue partners had spawned. This was after I turned off the notifications I had been subscribed to pinging me whenever anyone wanted to dialogue with me.
There’s a bunch of interesting AI alignment content, more than I feel like I have the bandwidth or inclination to read. I also like that there’s a trickle of new interesting users, e.g. it’s cool that Maxwell Tabarrok is on my front page.
As is often said, I’d be interested in more “classic rationality” content relative to AI stuff. Like, I don’t think we’re by any means perfect on that axis, or past some point of diminishing return. Since it’s apparently easier to write posts about AI, maybe someone should write up this paper and turn it into life advice. Alternatively, I think looking at how ancient people thought about logic could be cool (see e.g. the white horse paradox or Ibn Sina’s development of a modal logic system or whatever).
I have the impression that lots of people find LW too conflict-y, but I think we avoid the worst excesses of being a total forum of everyone agreeing with each other about how great we all are, and that more disagreement would make that better, as long as it’s with gentleness and respect, as they say.
Oh also the pattern of what things of mine get upvoted vs downvoted feels pretty weird. E.g. I thought my post on a mistake I think people make in discussions about open-source AI was a good contribution, if perhaps poorly written. But it got fewer upvotes than a post that was literal SEO. I guess the latter introduced people to a cool thing that wasn’t culture-war-y and explained it a bit, but I think the explanation I gave was pretty bad, because as mentioned, I actually just wanted it to be SEO.
Oh, also I think the site wants me to care about the review or believe that it’s important/valuable, but I don’t really.
You should click the settings gear in the “Dialogues” section to hide suggested partners from you.
A little while ago I vented at my shortform on this topic, https://www.lesswrong.com/posts/pjCnAXMkXjbmLw3ii/nim-s-shortform?commentId=EczMSzhPMpRAEBhhj.
Since writing that, I still feel a widening gap between my views and those of the LW zeitgeist. I’m not convinced that AI is inevitably killing everybody if it gets smart enough, as the smartest people around here seem to believe.
Back when AI was a “someday” thing, I feel like people discussed its risks here with understanding of perspectives like mine, and I felt like my views were gradually converging toward those of the site as I read more. It felt like people who disagreed about x-risk were regarded as potential allies worth listening to, in a way that I don’t experience from more recent content.
Since AI has become a “right now” thing, I feel like there’s an attitude that if you aren’t already sold on AI destroying everything then you’re not worth discussing it with. This may be objectively correct: if someone with the power to help stop AI from destroying us and finite effort to exert spends their time considering ignorant/uninformed/unenlightened perspectives such as my own, diverting that effort from doing more important things may be directly detrimental to the survival of the species.
In short, I get how people smarter than I am are assigning high probability to us being in a timeline where LW needs to stop being the broader forum that I joined it for. I figure they’re probably doing the right thing, and I’m probably in the wrong place for what LW is needing to become. Complaining about losing what LW was to make way for what it needs to be feels like complaining about factories transitioning from making luxury items to making essential supplies during a crisis.
And it feels like if this whole experience were a fable, the moral would be about alignment and human cooperation in some way ;)
It’s fun to come through and look for interesting threads to pull on. I skim past most stuff but there’s plenty of good and relevant writing to keep me coming back. Yeah sure it doesn’t do a super great job of living up to the grandiose ideals expressed in the Sequences but I don’t really mind, I don’t feel invested in ~the community~ that way so I’ll gladly take this site for what it is. This is a good discussion forum and I’m glad it’s here.
oh wait, major trivial irritation: it keeps forgetting I set the theme to dark mode! something about brave’s improved privacy settings, perhaps? if dark mode could be stored serverside that would be grand
Yeah, I think we store theme settings in a cookie. You might just want to manually permit the cookie in the Brave settings (we intentionally don’t do it server-side because many users want to have different settings for different devices, so doing it at the cookie level seems like the right call).
I have already allowed the brave cookie and it times out anyway :[
I am relatively new to the community, and was excited to join and learn more about the actual methods to address AI risks, and how to think scientifically generally.
However, after using the site for a while, I am a bit disappointed. I realized I probably have to filter many things here.
Good:
There are good discussions and previous summaries that are actually useful on alignment. There are people who work on these things at both NGOs and in industry, showing what research they are doing or what actions they have taken on safety. Similarly with bioweapons etc.
I also like articles like trying to identify how to find what’s the best thing to do with intersections of passion, skill, and importance.
I like the articles that mention/promote contacting reality.
Bad:
Sometimes I feel the atmosphere is “edgy”, and sometimes I see people arguing over relatively small things where I don’t know how the conclusion will lead to actual actions. And maybe this is just the culture here, but I found it surprising how easily people call each other “wrong”, although many times I felt like both sides were just opinions. And I feel like I see less “I think” or “in my opinion” to qualify claims than in a usual workplace, at least. People appear to be very confident or sure about their own beliefs when communicating. From my understanding, I think people may be practicing “strong opinions, weakly held”, thinking they can say something strong and change easily. I found that to be easier in verbal communication among colleagues, schoolmates or friends, where one can talk to them (a relatively small group) every day. But on a platform where there are a lot more people, and tracking changes in opinion is hard, it might be more productive to consider toning down the “strong” opinion part and qualifying more in the first place.
I do think the downvote or upvote, which is related to how much you can comment or contribute to the site (or whether you can block a person or not), encourages groupthink, and you would need to identify with the sentiment of the majority (I think another answer mentioned groupthink as well).
I am feeling many articles or comments are quite personal/non-professional (communication feels different from what I encounter at work), which makes this community a bit confusing, mixing personal and professional opinions/sharing. I think it would be nice to have a professional section and a separate personal section, and also to encourage different communication rules for more communication efficiency, which I guess could naturally filter some articles for people wanting to focus on different things at certain times. Could it be good to organize articles better by section as well? Though there are “tags” currently.
This is a personal belief, but I am a bit biased for action and hope to see more discussions on how to execute things, or at least how should actions change based on a proposed belief change.
This might be something more fundamental that is based on personal belief vs (some but not everyone on) lesswrong belief: to a certain extent I appreciate prioritization, but when it is too extreme I feel it is 1) counterproductive for solving the issue itself, 2) so extreme that it discourages newcomers who also want to work on shared issues. It also feels more fear driven than rationality driven, which is discrediting in my opinion.
For 1, many areas to work on are interrelated, and focusing only on one may not actually achieve the goal.
For 2, sometimes it just feels alarming/scary when I see people trying to say “do not let other issues we need to solve get in the way of {AI risks/some particular thing}”.
What I am sensing (though I am still kinda new here, so I might not have dug enough through articles) is that we may lack some social science connections/backgrounds and knowledge of how the world actually works, even when talking about society related things (I forgot what specific articles gave me this feeling, maybe ones related to AI governance).
I think for now, I probably will continue using the site, but with many, many filters.
Seems fine—good.
I have enjoyed some dialogues though I think there is a lot of content.
For me, I’d like a focus on summarisation and consensus. What are the things we all agree on for a given topic?
Even on this thread, I think there could be a way to atomise ideas and see what the consensus is.
I find LessWrong really useful for learning things, but it’s also become kind of overwhelming, especially because a lot of people don’t start posts with a summary so I can’t quickly filter. My RSS feed has about 500 unread LessWrong posts and I doubt I’ll read more than 1/5th of them after summarizing.
I’m thinking of writing my own software to run posts through an LLM to get a one-paragraph summary. I’m tempted to try to filter my feed by tag, but that’s too high level (I can’t follow in-the-weeds AI safety research, but I do want to read certain kinds of high-level technical posts).
LLM summaries aren’t yet non-hallucinatory enough that we’ve felt comfortable putting them on the site, but we have run some internal experiments on this.
My opinion is a bit mixed on LessWrong at the moment. I’m usually looking for one of two types of content whenever I peruse the site:
- Familiar Ideas Under Other Names: Descriptions of concepts and techniques I already understand that use language more approachable to “normal” people than the highly-niche jargon I use myself, which help me discuss them with others more conveniently
- Unfamiliar or Forgotten Ideas: Descriptions of concepts and techniques I haven’t thought of recently or at all, which can be used as components for future projects
I’ve only been using the site for a few months, but I found a large initial surge of Familiar Ideas Under Other Names, and now I have my filters mostly set up to fish for possible new catches over time. Given the complexity and scope of some of my favorite posts in this category, I’m still fairly satisfied with a post meeting these requirements only showing up once a month or so. Before coming to LW, I would seldom encounter such things, so I’m still enjoying an increased intake.
I’ve been having a much harder time finding Unfamiliar or Forgotten Ideas, but that category has always been a tricky one to pursue even at the best of times, so it’s hard to speculate one way or another about whether the current state of the site is acceptable or not.
On a more general note, I’m not able to direct much interest towards most of the AI discussion because it rates very poorly on the “how important is this to me” scale. I’ve been having to invest some effort into adjusting my filters to compensate, but I notice that there’s still a lot of content that is adjacent to AI, though not directly about it, that sneaks in anyway. However, I haven’t had long to fully exercise the filters, so I don’t want to present that as some sort of major issue when it’s currently just a bit tiresome.
As for my thoughts on LW generally, I both like and dislike the site pretty severely.
On the one hand, I do think it has some major positives compared to basically every other site. In particular, I explicitly like the fact that politics is very discouraged here, which allows for much more productive conversations, and more generally I think the moderation system is quite great, and I especially like the fact that they try to keep the garden well-kept.
I also like the fact that they try to separate the concepts of “I disagree” and “it’s a bad post/comment” via the agree/disagree system, which splits the roles of karma and agree/disagree voting.
I also agree with a lot of lsusr’s “The Good” claims on LW.
If there’s one reason I stay on LW, it’s probably that the quality of the conversation doesn’t get nearly as bad as on the rest of the internet, and is in fact quite great; while LW is overvalued, useful insights can be extracted if you’re careful.
I mostly agree with TurnTrout and lsusr’s answers on what the problems are, with the sidenote that I suspect a lot of problems came from the influx of FiO readers into LW without any other grounding point. Niplav thankfully avoided this wave, and I buy that empirical circles like Ryan Greenblatt’s social circle aren’t relying on fiction, but I’m worried that a lot of non-experts are so bad epistemically speaking that they ended up essentially writing fanfic on AI doom, forgetting to check whether the assumptions actually hold in reality.
(Idea drawn from TurnTrout and JDP in the discord, they both realized the implications of a FiO influx way before I did.)
This seems unlikely to me, since HPMOR peaked in popularity nearly a decade ago.
Where did you get that from, exactly? I’d be a little surprised if this was right.
I mean, it was published between 2010 and 2015, and it’s extremely rare for fanfiction (or other kinds of online serial fiction) to be more popular after completion than while it’s in progress. I followed it while it was in progress, and am in fact one of those people who found LessWrong through it. There was definitely an observable “wave” of popularity in both my in-person and online circles (which were not, at the time, connected to the rationality community at all); I think it probably peaked in 2012 or 2013.
Thanks for chiming in, I remember hearing that there was an influx of HPMOR readers fairly recently.
What does FiO stand for?
FiO?
Friendship is Optimal, a My Little Pony fan fiction about an AGI takeover (?) scenario. 39k words. (I don’t know the details, haven’t read it.)
Yep, I was talking about that fanfiction.
I love the react system, so much so that I usually use it as my favored mode of communication, because I get to point out stuff that is important without having to go through the process of writing a comment, which can be long.
I especially like the way reacts can be applied to individual pieces of text, and IMO the react system, especially the semantic reacts, has been a place where LW outperforms other websites.
Sadly, LW isn’t a community that I would say that I am a part of. I say that begrudgingly, as LW seems to ‘have been’ and still is, ‘a decent place on the internet’.
The issue with being decent, is that it doesn’t work long term, at least not for me.
Why did other people leave LW before? I’m not sure. Why do I want to leave? And what drew me here in the first place?
I came here to seek people with integrity, people thinking outside the box, highly intelligent and willing to both pursue their individuality and take/give feedback from equals/peers. My intention was to get help, but also to provide support, in growing my own as well as others’ rationality/general intelligence/EQ/bigger goals, in a congruous, open-ended, honest, sincere and cooperative environment.
Taking ideas and concepts to their logical conclusion is something I care about, and I was hoping to find a community that is Congruous and Coherent according to its own explicit ideas and values, with enough discernment to make it work. This is a tall order perhaps, but I was hoping, when I found this place, that it was closer to that ideal.
From what I’ve seen, there might be a slightly higher population of the kinds of people I’m looking for here, but on the other hand, there is a wide gulf between what those people want and need to thrive, and the kind of environment LW is providing.
I’m not the most articulate in writing, but I wrote about this gulf of Who is LW for in some comments, and also a post called “The LW crossroads of purpose”.
And, I see it as a very pressing matter, not only because laissez-faire seems to ruin subcultures, but because there are so many places on the internet where your average Joe can go, but so few where it seems those that crave high-end personal, rational, emotional development, can actually get support, and support each other.
A place where integrity, respect and cooperation is a fundamental practice, and where things aren’t solved through “democracy”, but by finding the best way to go forward. A place that supports the creation of the very good/best, and not the decently/average+ good.
I’m not aware if those that ‘left’ LW went somewhere more coherent in this regard. Substack seems to be a place, but is there a ‘community’ out there waiting? At least not that I am aware of? Which means I would rather write this, and on the total off chance that this idea gets traction, and LW will have a “serious dojo” for rationality—with a high bar to entry, in a high trust environment that grows organically and slowly; I’ll at least hear of it, and might even want to join.
I wouldn’t even mind if it had a subscription fee of sorts, and some of the members got paid. Why sweat the small stuff.
For now, I’ll stay in the shadows, and maybe look at older posts and see who was here before. Maybe some of them are people I want to talk to.
Kindly,
Caerulea-Lawrence
Lesswrong is great but the social groups that run it are all focussed on AI xrisk. Same goes for the high karma users who get more upvote power and get to decide what reaches the homepage. My timelines and xrisk estimates are lower than theirs (15% ASI by 2030, 5% humanity dead by 2030) and hence I’d like to be able to discuss other topics.
I currently vaguely feel I will have to single-handedly lead such an effort if I want it to happen. Maybe you’ll see more of me (but not anon) in the coming years.
LW, along with Astral Codex Ten, are the best places on the internet. Lately LW tops the charts for me, perhaps because I’ve made it through Scott’s canon but not LW’s. As a result, my experience on LW is more about the content than the meta and community. Just coming here, I don’t stumble across much evidence of conflict within this community—I only learned about it after friending various rationalists on FB such as Duncan (which btw I really like having rationalists in my FB feed, which does give me a sense of community and belongingness… perhaps there is something to having multiple forums).
On the slight negative side, I have long believed LW to be an AI doom echo chamber. This is partly due to my fibrotic intuitions, persisting despite reading Superintelligence and having friends in AI safety research, and only breaking free after ChatGPT. But part of it I still believe is true. The reasons include hero worship (as mentioned already on this thread), the community’s epistemic landscape (as in, it is harder and riskier to defend a position of low vs high p(doom)), and perhaps even some hegemony of language.
In terms of the app: it is nice. From my own experiences building apps with social components, I would have never guessed that a separate “karma vote” and “agreement vote” would work. Only on LW!
LessWrong is mostly ok. Specific problems/new things I’d like:
NEW REACTION EMOJIS
A reaction emoji to say “be quantitative here”. Example: someone says “too much”, I can’t infer from context how much is too much, I believe it’s the case I need to know that to carry their reasoning through, and I want them to stick their neck out and say a number. Possible icons: stock chart, magi-stream of numbers, ruler, atom, a number with digits and a question mark (like “1.23?”), dial.
A reaction emoji to say “give a calibration/reference level for this”. Example: interlocutor says “A is intelligent”, I can’t infer from context who A is intelligent compared to, and I want them to say “compared to B”, “relative to average in group ABCD”, or similar. Possible icons: double-plate scale with an object on one side and a question mark on the other (too complex maybe?), ruler, caliper, °C/°F, bell curve, gauge stick in water or snow.
TECHNICAL PROBLEMS
In the last month loading any LessWrong page hangs indefinitely for me. I have to hit reload about 5 times in close sequence to unjam it.
“HIDE USER NAMES” PROBLEMS
The “hide user names” option is not honored everywhere. The names unconditionally appear in the comments thread structure on the left, and in dialogue-related features (can’t remember the exact places).
The “hide user names” feature would be more usable if names were consistently replaced with auto-generated human-friendly nicknames (e.g., aba-cpu-veo, DarkMacacha, whatev simple scheme of random words/syllables), re-generated and assigned on page load. With the current “(hidden)” placeholder it’s quite difficult to follow discussions. After this modification, the anonymized user names should have different formatting or decoration to avoid being confused with true usernames.
When the actual user name appears by hovering over the fake one, it annoyingly flickers between the two states if the actual name is shorter than the placeholder. I guess the simplest solution is bounding the width to remain at least that of the placeholder.
I often involuntarily reveal a name by running across it with the pointer, or can’t resist the temptation. It should be somewhat expensive to uncover a single name. Maybe a timer like the one for strong votes.
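To make the nickname suggestion above concrete, here is a minimal sketch in Python of how per-page-load placeholders might be generated. It is not how LessWrong implements anything. The `NicknameMap` class, the `SYLLABLES` pool and the three-syllable format (modelled on the “aba-cpu-veo” example) are all assumptions made for illustration.

```python
import random

# Hypothetical syllable pool; any set of short pronounceable tokens would do.
SYLLABLES = ["aba", "cpu", "veo", "dar", "mak", "lin", "tor", "zei", "nup", "fen"]


class NicknameMap:
    """Assigns each real user name a random nickname, stable for one page load."""

    def __init__(self, seed=None):
        # Creating a fresh instance per page load gives fresh nicknames each time.
        self._rng = random.Random(seed)
        self._by_user = {}
        self._used = set()

    def nickname(self, username):
        # Reuse the nickname already assigned on this page, if any.
        if username in self._by_user:
            return self._by_user[username]
        # Draw until we find a nickname that nobody else on the page has.
        # Note: with this small pool there are only 1000 combinations, so a
        # real version would need a larger pool for very long threads.
        while True:
            candidate = "-".join(self._rng.choice(SYLLABLES) for _ in range(3))
            if candidate not in self._used:
                break
        self._used.add(candidate)
        self._by_user[username] = candidate
        return candidate
```

Within one page, `names = NicknameMap()` followed by repeated calls to `names.nickname("some_user")` always returns the same placeholder for the same user, so a discussion stays followable, while a reload (a new `NicknameMap`) reshuffles everything.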
I often have an ugh feeling towards reading long comments.
Posts are usually well written, but long comments are usually rambly, even the highest karma ones. It takes a lot of effort to read the comments on top of reading the post, and the payoff is often small.
But for multiple reasons, I still feel an obligation to read at least some comments, and ugh.
For possible solutions:
1. This is my problem and I should find a way to stop feeling ugh
2. Have some ways to easily read a summary of long comments (AI or author generated)
3. People should write shorter comments on average
Pretty good overall. My favorite posts are about the theory of the human mind that helps me build a model of my own mind and the minds of others, especially in how it can go wrong (mental illness, ADHD, et al.)
The AI stuff is way over my head, to the point where my brain just bounces off of the titles alone, but that’s fine, since not everything is for everyone. Also, reading the acronyms EDT and CDT always makes me think of the timezones, not the decision theories.
About the only complaint I have is that the comments can get pretty dense and recursively meta, which can be a bit hard to follow. Zvi will occasionally talk about a survey of AI safety experts giving predictions about stuff and it just feels like a person talking about people talking about predictions about risks associated with AI. But this is more of a me thing and probably people who can keep up find these things very useful.