Some opinions about AI and epistemology:

1. One reason that many rationalists have such strong views about AI is that they are wrong about epistemology. Specifically, Bayesian rationalism is a bad way to think about complex issues.
2. A better approach is meta-rationality. To summarize one guiding principle of (my version of) meta-rationality in a single sentence: if something doesn’t make sense in the context of group rationality, it probably doesn’t make sense in the context of individual rationality either.
3. For example: there’s no privileged way to combine many people’s opinions into a single credence. You can average them, but that loses a lot of information. Or you can get them to bet on a prediction market, but that depends a lot on the details of the individuals’ betting strategies. The group might settle on a number to help with planning and communication, but it’s only a lossy summary of many different beliefs and models. Similarly, we should think of individuals’ credences as lossy summaries of different opinions from different underlying models that they have.
4. How does this apply to AI? Suppose we each think of ourselves as containing many different subagents that focus on understanding the world in different ways—e.g. studying different disciplines, using different styles of reasoning, etc. The subagent that thinks about AI from first principles might come to a very strong opinion. But this doesn’t mean that the other subagents should fully defer to it (just as having one very confident expert in a room of humans shouldn’t cause all the other humans to elect them as the dictator). E.g. maybe there’s an economics subagent who will remain skeptical unless the AI arguments can be formulated in ways that are consistent with their knowledge of economics, or the AI subagent can provide evidence that is legible even to those other subagents (e.g. advance predictions).
5. In my debate with Eliezer, he didn’t seem to appreciate the importance of advance predictions; I think the frame of “highly opinionated subagents should convince other subagents to trust them, rather than just seizing power” is an important aspect of what he’s missing. I think of rationalism as trying to form a single fully-consistent world-model; this has many of the same pitfalls as a country which tries to get everyone to agree on a single ideology. Even when that ideology is broadly correct, you’ll lose a bunch of useful heuristics and intuitions that help actually get stuff done, because ideological conformity is prioritized.
6. This perspective helps frame the debate about what our “base rate” for AI doom should be. I’ve been in a number of arguments that go roughly like (edited for clarity):

Me: “Credences above 90% doom can’t be justified given our current state of knowledge”

Them: “But this is an isolated demand for rigor, because you’re fine with people claiming that there’s a 90% chance we survive. You’re assuming that survival is the default, I’m assuming that doom is the default; these are symmetrical positions.”

But in fact there’s no one base rate; instead, different subagents with different domains of knowledge will have different base rates. That will push P(doom) lower because most frames from most disciplines, and most styles of reasoning, don’t predict doom. That’s where the asymmetry which makes 90% doom a much stronger prediction than 90% survival comes from. (A toy numerical version of this asymmetry is sketched just below these points.)
7. This perspective is broadly aligned with a bunch of stuff that Scott Garrabrant and Abram Demski have written about (e.g. geometric rationality, Garrabrant induction). I don’t think the ways I’m applying it to AI risk debates straightforwardly fall out of their more technical ideas; but I do expect that more progress on agent foundations will make it easier to articulate ideas like the ones above.
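To make points 3 and 6 concrete, here is a toy numerical sketch (not part of the original post; the credences are invented): two common pooling rules give different single numbers from the same set of subagent credences, and under either rule a lone 90%-doom subagent cannot pull the pooled credence anywhere near 90% unless most of the other subagents already lean that way.

```python
# Toy pooling of hypothetical subagent credences in "doom".
# The numbers are invented purely for illustration.
import numpy as np

credences = np.array([0.9, 0.1, 0.05, 0.2, 0.03, 0.1])  # one confident subagent, several skeptics
weights = np.full(len(credences), 1.0 / len(credences))  # equal weights for simplicity

# Linear pooling: weighted arithmetic mean of the credences.
linear_pool = float(np.sum(weights * credences))

# Logarithmic pooling: weighted geometric mean of p and 1 - p, renormalised.
geo_p = np.prod(credences ** weights)
geo_q = np.prod((1.0 - credences) ** weights)
log_pool = float(geo_p / (geo_p + geo_q))

print(f"linear pool:      {linear_pool:.2f}")   # ~0.23
print(f"logarithmic pool: {log_pool:.2f}")      # ~0.16
```

Neither output is “the” group credence, which is the sense in which any single summary number is lossy; and getting either pool up to 0.9 requires nearly all of the subagents to already sit near 0.9, which is one way to read the asymmetry claim in point 6.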
That will push P(doom) lower because most frames from most disciplines, and most styles of reasoning, don’t predict doom.
I don’t really buy this statement. Most frames, from most disciplines, and most styles of reasoning, do not make clear predictions about what will happen to humanity in the long-run future. A very few do, but the vast majority are silent on this issue. Silence is not anything like “50%”.
Most frames, from most disciplines, and most styles of reasoning, don’t predict sparks when you put metal in a microwave. This doesn’t mean I don’t know what happens when you put metal in a microwave. You need to at the very least limit yourself to applicable frames, and there are very few applicable frames for predicting humanity’s long-term future.
I agree with this.

Unfortunately, I think there’s a fundamentally inside-view aspect of [problems very different from those we’re used to]. I think looking for a range of frames is the right thing to do—but deciding on the relevance of the frame can only be done by looking at the details of the problem itself (if we instead use our usual heuristics for relevance-of-frame-x, we run into the same out-of-distribution issues).
I don’t think there’s a way around this. Aspects of this situation are fundamentally different from those we’re used to. [Is different from] is not a useful relation—we can’t get far by saying “We’ve seen [fundamentally different] situations before—what happened there?”. It’ll all come back to how they were fundamentally different.
To say something mildly more constructive, I do still think we should be considering and evaluating other frames, based on our own inside-view model (with appropriate error bars on that model).
A place I’d start here would be:
1. Attempt to understand another frame.
2. See how far I need to zoom out before that frame’s models become a reasonable abstraction for the problem-as-I-understand-it.
3. Find the smallest changes to my models that’d allow me to stick with this frame without zooming out so far. Assess the probability that these adjusted models are correct/useful.
For most frames, I end up needing to zoom out too far for them to say much of relevance—so this doesn’t much change my p(doom) assessment.
It seems more useful to apply other frames to evaluate smaller parts of our models. I’m sure there are a bunch of places where intuitions and models from e.g. economics or physics do apply to safety-related subproblems.
I’ve been thinking lately that human group rationality seems like such a mess. Like how can humanity navigate a once-in-a-lightcone opportunity like the AI transition without doing something very suboptimal (i.e., losing most of the potential value), when the vast majority of humans (and even the elites) can’t understand (or can’t be convinced to pay attention to) many important considerations? This big picture seems intuitively very bad, and I don’t know any theory of group rationality that says this is actually fine.
I guess my 1 is mostly about descriptive group rationality, and your 2 may be talking more about normative group rationality. However I’m also not aware of any good normative theories about group rationality. I started reading your meta-rationality sequence, but it ended after just two posts without going into details.
The only specific thing you mention here is “advance predictions”, but, for example, moral philosophy deals with “ought” questions and can’t provide advance predictions. Can you say more about how you think group rationality should work, especially when advance predictions aren’t possible?
From your group rationality perspective, why is it good that rationalists individually have better views about AI? Why shouldn’t each person just say what they think from their own preferred frame, and then let humanity integrate that into some kind of aggregate view or outcome, using group rationality?
I started reading your meta-rationality sequence, but it ended after just two posts without going into details.
David Chapman’s website seems like the standard reference for what the post-rationalists call “metarationality”. (I haven’t read much of it, but the little I read made me somewhat unenthusiastic about continuing).
How can the mistakes rationalists are making be expressed in the language of Bayesian rationalism? Priors, evidence, and posteriors are fundamental to how probability works.
The mistakes can (somewhat) be expressed in the language of Bayesian rationalism by doing two things:
1. Talking about partial hypotheses rather than full hypotheses. You can’t have a prior over partial hypotheses, because several of them can be true at once (though you can still assign them credences and update those credences according to evidence).
2. Talking about models with degrees of truth rather than just hypotheses with degrees of likelihood. E.g. when using a binary conception of truth, general relativity is definitely false because it’s inconsistent with quantum phenomena. Nevertheless, we want to say that it’s very close to the truth. In general this is more of an ML approach to epistemology (we want a set of models with low combined loss on the ground truth).
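A minimal sketch of the second point, in the ML spirit described above (my construction, with made-up data, not the author’s): candidate models are scored by loss against observations, so each gets a graded “degree of truth” rather than a binary true/false verdict, and several can fit tolerably well at once.

```python
# Score several partial models by loss instead of labelling them true/false.
# Data and models are made up for illustration.
import numpy as np

xs = np.arange(1, 6)
observations = np.array([1.0, 2.1, 2.9, 4.2, 5.0])

models = {
    "linear, y = x":           xs.astype(float),
    "constant, y = 3":         np.full(5, 3.0),
    "quadratic, y = x**2 / 3": xs ** 2 / 3,
}

for name, predictions in models.items():
    mse = float(np.mean((predictions - observations) ** 2))
    print(f"{name:24} mean squared error = {mse:.2f}")

# The linear model gets low loss without being exactly "true";
# the others are wrong to different degrees rather than simply "false".
```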
Suppose we think of ourselves as having many different subagents that focus on understanding the world in different ways—e.g. studying different disciplines, using different styles of reasoning, etc. The subagent that thinks about AI from first principles might come to a very strong opinion. But this doesn’t mean that the other subagents should fully defer to it (just as having one very confident expert in a room of humans shouldn’t cause all the other humans to elect them as the dictator). E.g. maybe there’s an economics subagent who will remain skeptical unless the AI arguments can be formulated in ways that are consistent with their knowledge of economics, or the AI subagent can provide evidence that is legible even to those other subagents (e.g. advance predictions).
Do “subagents” in this paragraph refer to different people, or different reasoning modes / perspectives within a single person? (I think it’s the latter, since otherwise they would just be “agents” rather than subagents.)
Either way, I think this is a neat way of modeling disagreement and reasoning processes, but for me it leads to a different conclusion on the object-level question of AI doom.
A big part of why I find Eliezer’s arguments about AI compelling is that they cohere with my own understanding of diverse subjects (economics, biology, engineering, philosophy, etc.) that are not directly related to AI—my subagents for these fields are convinced and in agreement.
Conversely, I find many of the strongest skeptical arguments about AI doom to be unconvincing precisely because they seem overly reliant on a “current-paradigm ML subagent” that their proponents feel should be dominant, or at least more heavily weighted than I think is justified.
That will push P(doom) lower because most frames from most disciplines, and most styles of reasoning, don’t predict doom.
This might be true and useful for getting some kind of initial outside-view estimate, but I think you need some kind of weighting rule to make this work as a reasoning strategy even at a meta level. Otherwise, aren’t you vulnerable to other people inventing lots of new frames and disciplines? I think the answer in geometric rationality terms is that some subagents will perform poorly and quickly lose their Nash bargaining resources, and then their contribution to future decision-making / conclusion-making will be down-weighted. But I don’t think the only way for a subagent to “perform” for the purposes of deciding on a weight is by making externally legible advance predictions.
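A rough sketch of the wealth-style re-weighting gestured at here (my own toy version using plain multiplicative weights, not anything from the geometric-rationality or Garrabrant-induction formalisms, and it only captures the “externally legible advance predictions” channel that the comment says shouldn’t be the only one): each subagent’s weight is multiplied by the probability it assigned to what actually happened, then renormalised.

```python
# Down-weight subagents by how much probability they put on observed outcomes.
# Forecasts and outcomes are invented for illustration.
import numpy as np

forecasts = np.array([
    [0.9, 0.8, 0.7, 0.9, 0.6],   # subagent A
    [0.5, 0.5, 0.5, 0.5, 0.5],   # subagent B: permanently uncertain
    [0.1, 0.2, 0.3, 0.1, 0.4],   # subagent C
])
outcomes = np.array([1, 1, 0, 1, 1])  # what actually happened

weights = np.ones(forecasts.shape[0]) / forecasts.shape[0]
for t, outcome in enumerate(outcomes):
    assigned = np.where(outcome == 1, forecasts[:, t], 1 - forecasts[:, t])
    weights *= assigned        # "bargaining resources" scale with predictive hits
    weights /= weights.sum()   # renormalise after each observation

for name, w in zip("ABC", weights):
    print(f"subagent {name}: weight {w:.3f}")   # A ~0.79, B ~0.21, C ~0.004
```

Subagent C, which kept betting against what happened, ends up with almost no say after only five observations; whether this is the right way to score frames that never make legible predictions is exactly the question the comment raises.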
I may be missing context here, but as written / taken at face value, I strongly agree with the above comment from Richard. I often disagree with Richard about alignment and its role in the future of AI, but this comment is an extremely dense list of things I agree with regarding rationalist epistemic culture.
I’d love to read an elaboration of your perspective on this, with concrete examples, which avoids focusing on the usual things you disagree about (pivotal acts vs. pivotal processes, whether the social facets of the game are important for us to track, etc.) and mainly focuses on your thoughts on epistemology and rationality and how they deviate from what you consider the LW norm.
My main take on Bayesian epistemology being wrong is that I think, to the extent it’s useless in real life, it’s because it focuses way too much on the ideal case, à la @Robert Miles’s tweet here:

https://x.com/robertskmiles/status/1830925270066286950
(The other problem I have with it is that even in the ideal case, it doesn’t have a way to sensibly handle 0 probability events, or conditioning on probability 0 events, which can actually happen once we leave the world of finite sets and measures.)
That said, I don’t think that people being wrong about epistemology is the cause of high p(Doom).
I’d agree more with @Algon in that the issues lie elsewhere (though a nitpick is that I wouldn’t say that EU maximization is wrong for TAI/AGI/ASI, but rather that certain dangerous properties don’t automatically hold, and that systems that EU maximize IRL like GPT-4 aren’t actually nearly as dangerous as often assumed. Agree with the other points.)
Reply to @Algon:

What I was talking about is that predictive models like GPT-4 have a utility function that’s essentially predictive, and the maximization is essentially trying to update the best it can given input conditions.
These posts can help you to understand more about predictive/simulator utility functions like GPT-4’s:

https://www.lesswrong.com/posts/vs49tuFuaMEd4iskA/one-path-to-coherence-conditionalization
https://www.lesswrong.com/posts/k48vB92mjE9Z28C3s/implied-utilities-of-simulators-are-broad-dense-and-shallow
https://www.lesswrong.com/posts/EBKJq2gkhvdMg5nTQ/instrumentality-makes-agents-agenty
The ideal predictor’s utility function is instead strictly over the model’s own outputs, conditional on inputs.
I’m doubtful that GPT-4 has a utility function. If it did, I would be kind-of terrified. I don’t think I’ve seen the posts you linked to though, so I’ll go read those.
Maybe a crux is that I’m willing to grant learned utility functions as utility functions, and I tend to see EU maximization/utility function reasoning in general as implying far fewer consequences than people on LW think it does, at least without more constraints.
It doesn’t try to assert its own existence, because that’s not necessary for maximizing updating/prediction output based on inputs.
I think the crux lies elsewhere, as I was sloppy in my wording. It’s not that maximizing some utility function is an issue, as basically anything can be viewed as EU maximization for a sufficiently wild utility function. However, I don’t view that as a meaningful utility function. Rather, it is the ones like e.g. utility functions over states that I think are meaningful, and those are scary. That’s how I think you get classical paperclip maximizers.
When I try and think up a meaningful utility function for GPT-4, I can’t find anything that’s plausible. Which means I don’t think there’s a meaningful prediction-utility function which describes GPT-4’s behaviour. Perhaps that is a crux.
Re utility functions over states, it turns out that we can validly turn utility functions over plans/predictions into utility functions over world states/outcomes (usually with constraints on how large the domain is, though not always):

https://www.lesswrong.com/posts/k48vB92mjE9Z28C3s/?commentId=QciMJ9ehR9xbTexcc
And yeah, I think it’s a crux: I think that, at the very least, GPT-N systems, if they reach AGI/ASI, will probably look like a maximizer for updating given input conditions like prompts.
My main point isn’t that the utility function framing of GPT-4 or GPT-N is wrong, but rather that LWers inferred way too much about how a system would behave, even conditional on expected utility maximization being a coherent frame for AIs, because it doesn’t logically imply the properties they thought it did without more assumptions that need to be defended.
What is the empirical track record of your suggested epistemological strategy, relative to Bayesian rationalism? Where does your confidence come from that it would work any better? Every time I see suggestions of epistemological humility, I think to myself stuff like this:
1. What predictions would this strategy have made about future technologies, like an 1890 or 1900 prediction of the airplane (vs. first controlled flight by the Wright Brothers in 1903), or a 1930 or 1937 prediction of nuclear bombs? Doesn’t your strategy just say that all these weird-sounding technologies don’t exist yet and are probably impossible?
2. Can this epistemological strategy correctly predict that present-day huge complex machines like airplanes can exist? They consist of millions of parts and require contributions from thousands or tens of thousands of people. Each part has a chance of being defective, and each person has a chance of making a mistake. Without the benefit of knowing that airplanes do indeed exist, doesn’t it sound overconfident to predict that parts have an error rate of <1 in a million, or that people have an error rate of <1 in a thousand? But then the math says that airplanes can’t exist, or should immediately crash.
Or to rephrase point 2 to reply to this part: “That will push P(doom) lower because most frames from most disciplines, and most styles of reasoning, don’t predict doom.” — Can your epistemological strategy even correctly make any predictions of near 100% certainty? I concur with habryka that most frames don’t make any predictions on most things. And yet this doesn’t mean that some events aren’t ~100% certain.
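For concreteness, here is the naive arithmetic behind the airplane example in point 2 (a sketch using the illustrative error-rate bounds mentioned in the comment, not real reliability data):

```python
# Naive independence math: with these per-part and per-person error rates,
# a large machine would almost never be fully error-free, yet airplanes
# exist and fly, which is the commenter's point.
p_part_defect = 1e-6      # "parts have an error rate of < 1 in a million"
n_parts = 1_000_000
p_no_defective_part = (1 - p_part_defect) ** n_parts
print(f"P(no defective part among a million): {p_no_defective_part:.3f}")  # ~0.37

p_person_error = 1e-3     # "people have an error rate of < 1 in a thousand"
n_people = 10_000
p_no_human_error = (1 - p_person_error) ** n_people
print(f"P(no human error among ten thousand): {p_no_human_error:.1e}")     # ~4.5e-05
```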
But in fact there’s no one base rate; instead, different subagents with different domains of knowledge will have different base rates. That will push P(doom) lower because most frames from most disciplines, and most styles of reasoning, don’t predict doom. That’s where the asymmetry which makes 90% a much stronger prediction than 10% comes from.
One of the most important features I expect of future ASI is knowledge of the limits of applicability of its models and heuristics. If you have a list of assumptions for very fast heuristics, then you can win big by making fast-computable moves in the narrow environments where those assumptions hold. That said, you need to be able to detect when your assumptions don’t hold and command your subagents to halt, melt, and catch fire when they are outside their zone of applicability.
I think this post doesn’t really explain why rats have high belief in doom, or why they’re wrong to do so. Perhaps ironically, there is a better version of this post on both counts which isn’t so focused on how rats get epistemology wrong and the social/meta-level consequences. A post which focuses on the object-level implications for AI of a theory of rationality which looks very different from the AIXI-flavoured rat-orthodox view.
I say this because those sorts of considerations convinced me that we’re much less likely to be buggered. I.e. I no longer believe EU maximization is/will be a good description by default of TAI, or widely economically productive AGI, mildly superhuman AGI, or even ASI, depending on the details. This is partly due to a recognition that the arguments for EU maximization are weaker than I thought, that arguments for LDT being convergent are lacking, and that the notions of optimality we do have are very weak, and partly due to the existence and behaviour of GPT-4, Claude Opus, etc.
6 seems too general a claim to me. Why wouldn’t it work for 1% vs 10%, and likewise 0.1% vs 1%? I.e. why doesn’t this suggest that you should round down P(doom) to zero? Also, I don’t even know what you mean by “most” here. Like, are we quantifying over methods of reasoning used by current AI researchers right now? Over all time? Over all AI researchers and engineers? Over everyone in the West? Over everyone who’s ever lived? Etc.
And it seems to me like you’re implicitly privileging ways of combining these opinions that get you 10% instead of 1% or 90%, which is begging the question. Of course, you could reply that a P(doom) of 10% is confused, that it isn’t really your state of knowledge, that lumping all your sub-agents’ models into a single number is too lossy, etc. But then why mention that 90% is a much stronger prediction than 10%, instead of saying they’re roughly equally confused?
7 I kinda disagree with. Those models of idealized reasoning you mention generalize Bayesianism/Expected Utility Maximization. But they are not far from the Bayesian framework or EU frameworks. Like Bayesianism, they do say there are correct and incorrect ways of combining beliefs, that beliefs should be isomorphic to certain structures, unless I’m horribly mistaken. Which sure is not what you’re claiming to be the case in your above points.
Also, a lot of rationalists already recognize that these models are addressing flaws in Bayesianism like logical omniscience, embeddedness, etc. Like, I believed this at least around 2017, and probably earlier. Also, note that these models of epistemology are not in tension with a strong belief that we’re buggered. Last I checked, the people who invented these models believe we’re buggered. I think they may imply that we’re a little less buggered than the EU maximization theory implies, but I don’t think this is a big difference. IMO this is not a big enough departure to do the work that your post requires.
A post which focuses on the object-level implications for AI of a theory of rationality which looks very different from the AIXI-flavoured rat-orthodox view.
I’m working on this right now, actually. Will hopefully post in a couple of weeks.
I say this because those sorts of considerations convinced me that we’re much less likely to be buggered.
That seems reasonable. But I do think there’s a group of people who have internalized bayesian rationalism enough that the main blocker is their general epistemology, rather than the way they reason about AI in particular.
6 seems too general a claim to me. Why wouldn’t it work for 1% vs 10%, and likewise 0.1% vs 1% i.e. why doesn’t this suggest that you should round down P(doom) to zero.
I think the point of 6 is not to say “here’s where you should end up”, but more to say “here’s the reason why this straightforward symmetry argument doesn’t hold”.
7 I kinda disagree with. Those models of idealized reasoning you mention generalize Bayesianism/Expected Utility Maximization. But they are not far from the Bayesian framework or EU frameworks.
There’s still something importantly true about EU maximization and bayesianism. I think the changes we need will be subtle but have far-reaching ramifications. Analogously, relativity was a subtle change to newtonian mechanics that had far-reaching implications for how to think about reality.
Like Bayesianism, they do say there are correct and incorrect ways of combining beliefs, that beliefs should be isomorphic to certain structures, unless I’m horribly mistaken. Which sure is not what you’re claiming to be the case in your above points.
Any epistemology will rule out some updates, but a problem with bayesianism is that it says there’s one correct update to make. Whereas radical probabilism, for example, still sets some constraints, just far fewer.
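One concrete example of that extra freedom, with invented numbers (my illustration, not from the comment): Jeffrey conditioning, a standard update rule in the radical-probabilism literature, handles an experience that merely shifts your credence in E to some q < 1, so the resulting update on A is not pinned down by conditionalization alone.

```python
# Strict conditionalization vs Jeffrey conditioning on uncertain evidence.
# All numbers are invented for illustration.
p_A_given_E = 0.8
p_A_given_not_E = 0.2
p_E = 0.3

prior_A = p_A_given_E * p_E + p_A_given_not_E * (1 - p_E)

# Strict Bayesian update: E is learned with certainty.
strict_update = p_A_given_E

# Jeffrey update: the experience only moves P(E) to q, with P(A|E) held fixed.
q = 0.7
jeffrey_update = p_A_given_E * q + p_A_given_not_E * (1 - q)

print(f"prior P(A)            = {prior_A:.2f}")        # 0.38
print(f"after strict update   = {strict_update:.2f}")  # 0.80
print(f"after Jeffrey update  = {jeffrey_update:.2f}") # 0.62
```

Different values of q give different, equally coherent posteriors, which is one way the “far fewer constraints” point cashes out.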
I’m working on this right now, actually. Will hopefully post in a couple of weeks.
This sounds cool.
That seems reasonable. But I do think there’s a group of people who have internalized bayesian rationalism enough that the main blocker is their general epistemology, rather than the way they reason about AI in particular.
I think your OP didn’t give enough details as to why internalizing Bayesian rationalism leads to doominess by default. Like, Nora Belrose is firmly Bayesian and is decidedly an optimist. Admittedly, I think she doesn’t think a Kolmogorov prior is a good one, but I don’t think that makes you much more doomy either. I think Jacob Cannell and others are also Bayesian and non-doomy. Perhaps I’m using “Bayesian rationalism” differently than you are, which is why I think your claim, as I read it, is invalid.
I think the point of 6 is not to say “here’s where you should end up”, but more to say “here’s the reason why this straightforward symmetry argument doesn’t hold”.
Fair enough. However, how big is the asymmetry? I’m a bit sceptical there is a large one. Based off my interactions, it seems like ~ everyone who has seriously thought about this topic for a couple of hours has radically different models, w/ radically different levels of doominess. This holds even amongst people who share many lenses (e.g. Tyler Cowen vs Robin Hanson, Paul Christiano vs. Scott Aaronson, Steve Hsu vs Michael Nielsen etc.).
There’s still something importantly true about EU maximization and bayesianism. I think the changes we need will be subtle but have far-reaching ramifications. Analogously, relativity was a subtle change to newtonian mechanics that had far-reaching implications for how to think about reality.
I think we’re in agreement over this. (I think Bayesianism is less wrong than EU maximization, and probably a very good approximation in lots of places, like Newtonian physics is for GR.) But my contention is over Bayesian epistemology tripping many rats up when thinking about AI x-risk. You need some story which explains why sticking to Bayesian epistemology is tripping up very many people here in particular.
Any epistemology will rule out some updates, but a problem with bayesianism is that it says there’s one correct update to make. Whereas radical probabilism, for example, still sets some constraints, just far fewer.
Right, but in radical probabilism the type of beliefs is still a real-valued function, no? Which is in tension w/ many disparate models that don’t get compressed down to a single number. In that sense, the refined formalism is still rigid in a way that your description is flexible. And I suspect the same is true for Infra-Bayesianism, though I understand that even less well than radical probabilism.
I think you’re making a good point (rationalists maybe don’t weight other opinions highly enough), but you’d get farther framing it as an update to how to use Bayesian reasoning, rather than an alternative. Bayesian reasoning has a pretty strong intuitive connection to “the factually correct way to reason”, even though there’s a ton of subtlety in that statement and how and where it’s applied.
WRT many of your arguments: base rates are increasingly just the wrong way to reason about AGI risks. We can think in more detail about how we’ll build AGI and what the risks are.
Am I misunderstanding this sentence? How do “90% doom” and the assumption that survival is the default square with one another?

I think they are just using that as an example of a strongly opinionated sub-agent, which may be one of many sub-agents with different and highly specific probability assessments of doom.
As for “survival is the default assumption”—what a declaration of that implies on the surface level is that the chance of survival is overwhelming except in the case of a cataclysmic AI scenario. To put it another way: we have a 99% chance of survival so long as we get AGI right.
To put it yet another way: Hollywood has made popular films about the human world being destroyed by nuclear war, climate change, viral pandemic, and asteroid impact, to name a few. Different sub-agents could each give higher or lower probabilities to each of those scenarios depending on things like domain knowledge, and in concert this raises the question of why we presume that survival is the default. What is the ensemble average of doom?
Is doom more or less likely than survival for any given time frame?