(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)
This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.
(I agree with 1, somewhat agree with 2, and don’t agree with 3).
It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?
My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).
I guess my position is something like this. I think it may be quite possible to advance capabilities “blindly”—basically the processing-power-heavy type of AI progress (applying enough tricks that you’re not literally recapitulating evolution, but you’re somewhere in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence, and that same knowledge could potentially accelerate capabilities progress.
So I do believe there is some kind of knowledge to be had (i.e., point #1).
Yeah, so, taking stock of the discussion again, it seems like:
There’s a thing-I-believe-which-is-kind-of-like-rationality-realism.
Points 1 and 2 together seem more in line with that thing than “rationality realism” as I understood it from the OP.
You already believe #1, and somewhat believe #2.
We are both pessimistic about #3, but I’m so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
We probably do have some disagreement about something like “how real is rationality?”—but I continue to strongly suspect it isn’t that cruxy.
(ETA: In my head I was replacing “evolution” with “reproductive fitness”; I don’t agree with the sentence as phrased, but I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, so I don’t know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)
I checked whether I thought the analogy was right with “reproductive fitness” and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
Sorry it resulted in a confusing mixed metaphor overall.
But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation, momentum, etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a “real” thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover “real” things that would then be important, but I don’t think that’s the claim.)
I think this is due more to stuff like the relevant timescale than the degree of realness. I agree realness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (i.e., it’s all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.
Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn’t understand how organisms seeded on those planets would likely evolve.)
So—it seems to me—the question should not be whether an abstract theory of rationality is the sort of thing which, on an outside view, has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!
My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.
The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.g., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
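To make the arithmetic concrete (with hypothetical numbers): a simple union bound turns a handful of individually-very-likely assumptions into high overall confidence, without any independence assumption and without a precise model of the attacker.

```python
# Union bound: P(all assumptions hold) >= 1 - sum of the individual failure
# probabilities.  The probabilities below are hypothetical.
assumption_probs = [0.999, 0.995, 0.99]  # P(each assumption holds)
failure_bound = sum(1 - p for p in assumption_probs)
print(f"overall confidence >= {1 - failure_bound:.3f}")  # >= 0.984
```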
I think we disagree primarily on 2 (and also how doomy the default case is, but let’s set that aside).
In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
I think that’s a crux between you and me. I’m no longer sure if it’s a crux between you and Richard. (ETA: I shouldn’t call this a crux, I wouldn’t change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)
Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the “unreal rationality” world to be similar to what Daniel mentions below:
I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and why no other country did instead.
But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
Yeah, I’m going to try to give a different explanation that doesn’t involve “realness”.
When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are “levers”, “gears”, “nails”, etc.
A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write “x + y”, I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don’t have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don’t have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don’t need to communicate all the caveats and intuitions that would accompany a leaky abstraction.
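To make the Python/C contrast concrete, here is a minimal sketch; the 32-bit wraparound function below is my own stand-in for (roughly) what a C “int” does, not something you’d write in practice:

```python
# Python's "x + y" on ints is effectively non-leaky: integers are arbitrary
# precision, so overflow never has to cross my mind.
x, y = 2_000_000_000, 2_000_000_000
print(x + y)  # 4000000000

# Emulating 32-bit two's-complement addition (roughly what a C int gives you)
# makes the leak visible: "addition" now carries a caveat I have to remember.
def add_int32(a: int, b: int) -> int:
    total = (a + b) & 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total

print(add_int32(x, y))  # -294967296
```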
One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.
It’s fine if there’s some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness—if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment where you set up the conditions that make something reproductively fit.)
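As a toy check of that (made-up numbers, and assuming offspring inherit their parent’s trait exactly, i.e. no transmission bias):

```python
import statistics

# Price-equation prediction with no transmission bias:
# change in mean trait = Cov(w, z) / mean(w).
traits  = [1.0, 2.0, 3.0, 4.0]  # trait value z_i of each parent (hypothetical)
fitness = [1, 1, 2, 4]          # offspring count w_i of each parent (hypothetical)

mean_w, mean_z = statistics.mean(fitness), statistics.mean(traits)
cov_wz = statistics.mean([w * z for w, z in zip(fitness, traits)]) - mean_w * mean_z
predicted_change = cov_wz / mean_w

# Direct simulation: each offspring inherits its parent's trait exactly.
offspring = [z for z, w in zip(traits, fitness) for _ in range(w)]
actual_change = statistics.mean(offspring) - mean_z

print(predicted_change, actual_change)  # 0.625 0.625 -- they agree exactly
```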
If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can’t build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can’t generalize it to something more far-off. Some examples from these comment threads of what “inferences about directly related things” looks like:
current theories about why England had an industrial revolution when it did
[biology] has far more practical consequences (thinking of medicine)
understanding why overuse of antibiotics might weaken the effect of antibiotics [based on knowledge of evolution]
Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g., you can say “overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic”.
In contrast, for abstractions like “logic gates”, “assembly language”, “levers”, etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you’d be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.
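For a feel of what the first rungs of such a hierarchy look like, here is a toy sketch (made up purely for illustration, nothing like how real hardware or search engines are engineered): each layer is precise enough that the layer above never has to mention it.

```python
# One precise primitive (NAND) composed into gates, then into a half adder.
# At the top layer, "add two bits" never mentions NAND at all.
def NAND(a, b): return 0 if (a and b) else 1

def NOT(a):     return NAND(a, a)
def AND(a, b):  return NOT(NAND(a, b))
def OR(a, b):   return NAND(NOT(a), NOT(b))
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

def half_adder(a, b):
    """Add two bits, returning (sum_bit, carry_bit)."""
    return XOR(a, b), AND(a, b)

assert half_adder(1, 1) == (0, 1)
assert half_adder(1, 0) == (1, 0)
assert half_adder(0, 0) == (0, 0)
```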
So now I’d go back and state our crux as:
Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?
I would guess not. It sounds like you would guess yes.
I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about “directly relevant things”, which will probably let you say some interesting things about AI systems, just not very much. Given how far MIRI research is from regular ML research, I’d expect that to really affect ML systems you would need a theory that’s precise enough to build hierarchies with.
(I think I’d also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)
(You might wonder why I’m optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is “directly relevant” to existing ML systems, and so you don’t need to build hierarchies of abstraction—just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)
The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.g., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
Your few assumptions need to talk about the system you actually build. On the model I’m outlining, it’s hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.
I generally like the re-framing here, and agree with the proposed crux.
I may try to reply more at the object level later.
Abram, did you reply to that crux somewhere?