Consider the reaction my comment from three months ago got.
Vaniver
I think being a Catholic with no connection to living leaders makes more sense than being an EA who doesn’t have a leader they trust and respect, because Catholicism has a longer tradition
As an additional comment, few organizations have splintered more publicly than Catholicism; it seems sort of surreal to me to not check whether or not you ended up on the right side of the splintering. [This is probably more about theological questions than it is about leadership, but as you say, the leadership is relevant!]
I don’t think Duncan knows what “a boundary” is.
General Semantics has a neat technology, where they can split out different words that normally land on top of each other. If boundary_duncan is different from boundary_segfault, we can just make each of the words more specific, and not have to worry about whether or not they’re the same.
I’ve read thru your explainer of boundary_segfault, and I don’t see how Duncan’s behavior is mismatched. It’s a limit that he set for himself that defines how he interacts with himself, others, and his environment. My guess is that the disagreement here is that under boundary_segfault, describing you as having “poor boundaries” is saying that your limits are poorly set. (Duncan may very well believe this! Tho the claim that you set them for yourself makes judging the limits more questionable.)
That said, “poor boundaries” is sometimes used to describe a poor understanding of, or respect for, other people’s boundaries. It seems to me like you are not correctly predicting how Duncan (or other people in your life!) will react to your messages and behavior, in a way that aligns with you not accurately predicting their boundaries (or predicting them accurately, and then deciding to violate them anyway).
This isn’t something that I do. This is something that I have done
I don’t understand this combination of sentences. Isn’t he describing the same observations you’re describing?
There is a point here that he’s describing it as a tendency you have, instead of an action that happened. But it sure seems like you agree that it’s an action that happened, and I think he’s licensed to believe that it might happen again. As inferences go, this doesn’t seem like an outlandish one to make.
The friends who know me well know that I am a safe person. Those who have spent even a day around me know this, too!
The comments here seem to suggest otherwise.
You talk about consent as being important to you; let’s leave aside questions of sexual consent and focus just on the questions: did Duncan consent to these interactions? Did Duncan ask you to leave him alone? Did you leave him alone?
I wasn’t sure what search term to use to find a good source on this but Claude gave me this:
I… wish people wouldn’t do this? Or, like, maybe you should ask Claude for the search terms to use, but going to a grounded source seems pretty important to staying grounded.
I think Six Dimensions of Operational Adequacy was in this direction; I wish we had been more willing to, like, issue scorecards earlier (like publishing that document in 2017 instead of 2022). The most recent scorecard-ish thing was commentary on the AI Safety Summit responses.
I also have the sense that the time to talk about unpausing is while creating the pause; this is why I generally am in favor of things like RSPs and RDPs. (I think others think that this is a bit premature / too easy to capture, and we are more likely to get a real pause by targeting a halt.)
While the coauthors broadly agree about points listed in the post, I wanted to stick my neck out a bit more and assign some numbers to one of the core points. I think on present margins, voluntary restraint slows down capabilities progress by at most 5% while probably halving safety progress, and this doesn’t seem like a good trade. [The numbers seem like they were different in the past, but the counterfactuals here are hard to estimate.] I think if you measure by the number of people involved, the effect of restraint is substantially lower; here I’m assuming that people who are most interested in AI safety are probably most focused on the sorts of research directions that I think could be transformative, and so have an outsized impact.
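As a toy illustration of that trade (using just the numbers above, an arbitrary baseline, and the contestable assumption that what matters is safety progress per unit of capabilities progress): if baseline progress is $C$ on capabilities and $S$ on safety, restraint buys you

$$\frac{0.5\,S}{0.95\,C} \approx 0.53\,\frac{S}{C},$$

i.e. roughly half the safety-per-capabilities ratio, in exchange for a capabilities slowdown of at most 5%.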
There Should Be More Alignment-Driven Startups
Similarly for the Sierra Club, I think their transition from an anti-immigration org to a pro-immigration org seems like an interesting political turning point that could have failed to happen in another timeline.
From the outside, Finnish environmentalism seems unusually good—my first test of this is whether or not environmentalist groups are pro-nuclear, since (until recently) that was a good check for numeracy.
Note that the ‘conservation’ sorts of environmentalism are less partisan in the US, or at least, are becoming partisan later. (Here’s a 2016 article about a handful of Republicans recently turning against national parks, in the face of bipartisan popular support for them.) I think the facts that climate change is a global problem instead of a local one, and that it pits academia against the oil industry, make it particularly prone to partisanship in the US. [Norway also has significant oil revenues—how partisan is their environmentalism, and do they have a similar detachment between conservation and climate change concerns?]
I think this is true of an environmentalist movement that wants there to be a healthy environment for humans; I’m not sure this is true of an environmentalist movement whose main goal is to dismantle capitalism. I don’t have a great sense of how this has changed over time (maybe the motivations for environmentalism are basically constant, and so it can’t explain the changes), but this feels like an important element of managing to maintain alliances with politicians in both parties.
(Thinking about the specifics, I think the world where Al Gore became a Republican (he was a moderate for much of his career) or simply wasn’t Clinton’s running mate (a choice made in part because of HW Bush’s climate policies) maybe leads to less partisanship. I think that requires asking why those things happened, and whether there was any reasonable way for them to go the other way. The oil-Republican link seems quite strong during the relevant timeframe, and you either need to have a strong oil-Democrat link or somehow have a stronger climate-Republican link, both of which seem hard.)
I get that this is the first post out of 4, and I’m skimming the report to see if you address this, but it sounds like you’re using historical data to try to prove a counterfactual claim. What alternative do you think was possible? (I assume the presence of realistic alternatives is what you mean by ‘not inevitable’, but maybe you mean something else.)
I think the main feature of the AI transition that people around here missed / didn’t adequately foreground is that AI will be “worse is better”. AI art will be clearly worse than the best human art—maybe even median human art—but will cost pennies on the dollar, and so we will end up with more, worse art everywhere. (It’s like machine-made t-shirts compared to tailored clothes.) AI-enabled surveillance systems will likely look more like shallow understanding of all communication than a single overmind thinking hard about which humans are up to what trouble.
This was even hinted at by how people talked about human intelligence; this comment is from 2020, but I remember seeing this meme on LW much earlier:
When you think about it, because of the way evolution works, humans are probably hovering right around the bare-minimal level of rationality and intelligence needed to build and sustain civilization. Otherwise, civilization would have happened earlier, to our hominid ancestors.
Similarly, we should expect widespread AI integration at about the bare-minimum level of competence and profitability.
I often think of the MIRI view as focusing on the last AI; I.J. Good’s “last invention that man need ever make.” It seems quite plausible that those will be smarter than the smartest humans, but possibly in a way that we consider very boring. (The smartest calculators are smarter than the smartest humans at arithmetic.) Good uses the idea of ultraintelligence for its logical properties (it fits nicely into a syllogism) rather than its plausibility.
[Thinking about the last AI seems important because choices we make now will determine what state we’re in when we build the last AI, and aligning it is likely categorically different from aligning AI up to that point, so we need to get started now and try to develop in the right directions.]
A lot of this depends on where you draw the line between ‘rationality’ and ‘science’ or ‘economics’ and ‘philosophy’ or so on. As well, given that ‘rationality’ is doing the best you can given the constraints you’re under, it seems likely that many historical figures were ‘rational’ even if they weren’t clear precursors to the modern rationalist cluster.
For example, I think Xunzi (~3rd century BCE) definitely counts; check out Undoing Fixation in particular. [His students Li Si and Han Fei are also interesting in this regard, but I haven’t found something by them yet that makes them clearly stand out as rationalists. Also, like JenniferRM points out, they had a troubled legacy somewhat similar to Alexander’s.]
Some people count Mozi as the ‘first effective altruist’ in a way that seems similar.
People point to Francis Bacon as the originator of empiricism; you can read his main work here on LW. While influential in English-language thought, I think he is anticipated by al-Haytham and Ibn Sina.

Laplace is primarily famous as a mathematician and scientist, but I think he was important in the development of the math underpinning modern rationality, and likely counts.
Benjamin Franklin seems relevant in a handful of ways; his autobiography is probably the best place to start reading.
Alfred Korzybski is almost exactly a hundred years older than Yudkowsky, and is the closest I’m aware of to rationality-as-it-is-now. You can see a discussion of sources between then and now in Rationalism Before The Sequences.
What would be a better framing?
I talk about something related in self and no-self; the outward-flowing ‘attempt to control’ and the inward-flowing ‘attempt to perceive’ are simultaneously in conflict (something being still makes it easier to see where it is, but also makes it harder to move it to where it should be) and mutually reinforcing (being able to tell where something is makes it easier to move it precisely where it needs to be).
Similarly, you can make an argument that control without understanding is impossible, that getting AI systems to do what we want is one task instead of two. I think I agree the “two progress bars” frame is incorrect, but I think the typical AGI developer at a lab is not grappling with the philosophical problems behind alignment difficulties, and is trying to make something that ‘works at all’ instead of something that ‘works understandably’ in the sort of way that would actually lead to the understanding which would enable control.
Spoiler-free Dune review, followed by spoilery thoughts: Dune part 1 was a great movie; Dune part 2 was a good movie. (The core strengths of the first movie were 1) fantastic art and 2) fidelity to the book; the second movie doesn’t have enough new art to carry its runtime and is stuck in a less interesting part of the plot, IMO, and one where the limitations of being a movie are more significant.)
Dune-the-book is about a lot of things, and I read it as a child, so it holds extra weight in my mind compared to other scifi that I came across when fully formed. One of the ways I feel sort-of-betrayed by Dune is that a lot of the things are fake or bad on purpose; the sandworms are biologically implausible; the ecology of Dune (one of the things it’s often lauded for!) is a cruel trick played on the Fremen (see if you can figure it out, or check the next spoiler block for why); the faith-based power of the Fremen warriors is a mirage; the Voice seems implausible; and so on.
The sandworms, the sole spice-factories in the universe (itself a crazy setting detail, but w/e), are killed by water, and so can only operate in deserts. In order to increase spice production, more of Dune has to be turned into a desert. How is that achieved? By having human caretakers of the planet who believe in a mercantilist approach to water—the more water you have locked away in reservoirs underground, the richer you are. As they accumulate water, the planet dries out, the deserts expand, and the process continues. And even if some enterprising smuggler decides to trade water for spice, the Fremen will just bury the water instead of using it to green the planet.
But anyway, one of the things that Dune-the-book got right is that a lot of the action is mental, and that a lot of what differentiates people is perceptual abilities. Some of those abilities are supernatural—the foresight enabled by spice being the main example—but are exaggerations of real abilities. It is possible to predict things about the world, and Dune depicts the predictions as, like, possibilities seen from a hill, with other hills and mountains blocking the view, in a way that seems pretty reminiscent of Monte Carlo tree search. This is very hard to translate to a movie! They don’t do any better a job of depicting Paul searching thru futures than Marvel did of Doctor Strange searching thru futures, and the climactic fight is a knife battle between a partial precog and a full precog, which is worse than the fistfight in Sherlock Holmes (2009).
And I think this had them cut one of my favorite things from the book, which was sort of load-bearing to the plot. Namely, Hasimir Fenring, a minor character who has a pivotal moment in the final showdown between Paul and the Emperor after being introduced earlier. (They just don’t have that moment.)
Why do I think he’s so important? (For those who haven’t read the book recently, he’s the emperor’s friend, from one of the bloodlines the Bene Gesserit are cultivating for the Kwisatz Haderach, and the ‘mild-mannered accountant’ sort of assassin.)
The movie does successfully convey that the Bene Gesserit have options. Not everything is riding on Paul. They hint that Paul being there means that the others are close; Feyd talks about his visions, for example.
But I think there’s, like, a point maybe familiar from thinking about AI takeoff speeds / conquest risk, which is: when the first AGI shows up, how sophisticated will the rest of the system be? Will it be running on near-AGI software systems, or legacy systems that are easy to disrupt and replace?
In Dune, with regards to the Kwisatz Haderach, it’s near-AGI. Hasimir Fenring could kill Paul if he wanted to, even after Paul awakes as KH, even after Paul’s army beats the Sardaukar and he reaches the emperor! Paul gets this, Paul gets Hasimir’s lonely position and sterility, and Paul is empathetic towards him; Hasimir can sense Paul’s empathy and they have, like, an acausal bonding moment, and so Hasimir refuses the Emperor’s request to kill Paul. Paul is, in some shared sense, the son he couldn’t have and wanted to.
One of the other subtler things here is—why is Paul so constrained? The plot involves literal wormriding, I think in part as a metaphor for riding historical movements. Paul can get the worship of the Fremen—but they decide what that means, not him, and they decide it means holy war across the galaxy. Paul wishes it could be anything else, but doesn’t see how to change it. I think one of the things preventing him from changing it is the presence of other powerful opposition, where any attempt to soften his movement will be exploited.
Jumping back to a review of the movie (instead of just their choices about the story shared by movie and book), the way it handles the young skeptic vs. old believer Fremen dynamic seems… clumsy? Like “well, we’re making this movie in 2024, we have to cater to audience sensibilities”. Paul mansplains sandwalking to Chani, in a moment that seems totally out of place, and intended to reinforce the “this is a white guy where he doesn’t belong” narrative that clashes with the rest of the story. (Like, it only makes sense as him trolling his girlfriend, which I think is not what it’s supposed to be / how it’s supposed to be interpreted?) He insists that he’s there to learn from the Fremen / the planet is theirs, but whether this is a cynical bid for their loyalty or his true feeling is unclear. (Given him being sad about the holy war bit, you’d think that sadness might bleed over into what the Fremen want from him more generally.) Chani is generally opposed to viewing him as a prophet / his more power-seeking moves, and is hopefully intended as a sort of audience stand-in: rooting for Paul but worried about what he’s becoming. But the movie is about the events that make up Paul’s campaign against the Harkonnen, not the philosophy or how anyone feels about it at more than a surface level.
Relatedly, Paul blames Jessica for fanning the flames of fanaticism, but this doesn’t engage with the fact that this is what works on them, or that it’s part of the overall narrow-path-thru. In general, Paul seems to do a lot of “being sad about doing the harmful thing, but not in a way that stops him from doing the harmful thing”, which… self-awareness is not an excuse?
I think open source AI development is bad for humanity, and think one of the good things about the OpenAI team is that they seem to have realized this (tho perhaps for the wrong reasons).
I am curious about the counterfactual where the original team had realized being open was a mistake from the beginning (let’s call that hypothetical project WindfallAI, or whatever, after their charter clause). Would Elon not have funded it? Would some founders (or early employees) have decided not to join?
It doesn’t present or consider any evidence for the alternatives.
So, in the current version of the post (which is edited from the original) Roko goes thru the basic estimate of “probability of this type of virus, location, and timing” given spillover and lab leak, and discounts other evidence in this paragraph:
These arguments are fairly robust to details about specific minor pieces of evidence or analyses. Whatever happens with all the minor arguments about enzymes and raccoon dogs and geospatial clustering, you still have to explain how the virus found its way to the place that got the first BSL-4 lab and the top Google hits for “Coronavirus China”, and did so in slightly less than 2 years after the lifting of the moratorium on gain-of-function research. And I don’t see how you can explain that other than that covid-19 escaped from WIV or a related facility in Wuhan.
I don’t think that counts as presenting it, but I do think that counts as considering it. I think it’s fine to question whether or not the arguments are robust to those details—I think they generally are and have not been impressed by any particular argument in favor of zoonosis that I’ve seen, mostly because I don’t think they properly estimate the probability under both hypotheses[1]—but I don’t think it’s the case that Roko is clearly making procedural errors here. [It seems to me like you’re arguing he’s making procedural errors instead of just coming to the wrong conclusion / using the wrong numbers, and so I’m focusing on that as the more important point.]
If it’s not a lot of evidence
This is what numbers are for. Is “1000-1” a lot? Is it tremendous? Who cares about fuzzy words when the number 1000 is right there. (I happen to think 1000-1 is a lot but is not tremendous.)
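(For the mechanics: in odds form, a likelihood ratio just multiplies whatever prior odds you walk in with, so

$$\frac{P(\text{lab}\mid \text{evidence})}{P(\text{zoonosis}\mid \text{evidence})} = \frac{P(\text{evidence}\mid \text{lab})}{P(\text{evidence}\mid \text{zoonosis})} \times \frac{P(\text{lab})}{P(\text{zoonosis})} = 1000 \times \frac{P(\text{lab})}{P(\text{zoonosis})}.$$

A 1:100 prior becomes 10:1 posterior odds, and a 1:1000 prior lands at even odds; those priors are placeholders rather than anyone’s actual numbers, and whether 1000 is “a lot” depends on where you start.)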
[1] For example, the spatial clustering analysis suggests that the first major transmission event was at the market. But does their model explicitly consider both “transfer from animal to many humans at the market” and “transfer from infected lab worker to many humans at the market” and estimate probabilities for both? I don’t think so, and I think that means it’s not yet in a state where it can be plugged into the full Bayesian analysis. I think you need to multiply the probability that it was from the lab times the probability of the first lab-worker superspreader event happening at the market, and compare that to the probability that it was from an animal times the probability of the first animal-human superspreader event happening at the market, and then you actually have some useful numbers to compare.
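A minimal sketch of that comparison in code, with made-up placeholder numbers (none of these are estimates from the post or from the clustering analysis):

```python
# Sketch of the footnote's comparison; every number below is a placeholder.

# Prior probabilities of each origin hypothesis.
p_lab = 0.5
p_zoonosis = 0.5

# P(first major superspreader event happens at the market | origin hypothesis).
# An infected lab worker *could* seed the market; an animal spillover at the
# market seeds it more or less by construction.
p_market_given_lab = 0.05
p_market_given_zoonosis = 0.5

joint_lab = p_lab * p_market_given_lab
joint_zoonosis = p_zoonosis * p_market_given_zoonosis

# Comparing these two products is the comparison the footnote asks for; with
# equal priors, their ratio reduces to the likelihood ratio from this evidence.
print(f"lab: {joint_lab:.3f}  zoonosis: {joint_zoonosis:.3f}  "
      f"odds (zoonosis:lab): {joint_zoonosis / joint_lab:.0f}:1")
```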
“I already tried this and it didn’t work.”
This post expresses a tremendous amount of certainty, and the mere fact that debate was stifled cannot possibly demonstrate that the stifled side is actually correct.
Agreed on the second half, and disagreed on the first. Looking at the version history, the first version of this post clearly identifies its core claims as Roko’s beliefs and identifies the lab as the “likely” origin, and those sections seem unchanged today. I don’t think that counts as tremendous certainty. Later, Roko estimates the likelihood ratio between the two hypotheses as 1000:1, but this is really not a tremendous amount either.
What do you wish he had said instead of what he actually said?
It was terrible, and likely backfired, but that isn’t “the crime of the century” being referenced, that would be the millions of dead people.
As I clarify in a comment elsewhere, I think we should treat them as being roughly equally terrible. If we would execute someone for accidentally killing millions of people, I think we should also execute them for destroying evidence that they accidentally killed millions of people, even if it turns out they didn’t do it.
My weak guess is Roko is operating under a similar strategy and not being clear enough on the distinction between the two halves of “they likely did it and definitely covered it up”. Like, the post title begins with “Brute Force Manufactured Consensus”, which he feels strongly about in this case because of the size of the underlying problem, but I think it’s also pretty clear he is highly opposed to the methodology.
I think it’s hard to evaluate the counterfactual where I made a blog earlier, but I think I always found the built-in audience of LessWrong significantly motivating, and never made my own blog in part because I could just post everything here. (There’s some stuff that ends up on my Tumblr or w/e instead of LW, even after ShortForm, but almost all of the nonfiction ended up here.)