I didn’t like this post. At the time, I didn’t engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise), but I didn’t really engage with the idea itself. So it seems like a good idea to say something now.
The main argument that this post is valuable seems to be: it captures a common crux in AI safety. I don’t think it’s my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it’s a straw man of the view it’s trying to point at.
The main problem is the word “realism”. It isn’t clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.
I agree that there’s something kind of like rationality realism. I just don’t think this post successfully points at it.
Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether intelligence is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful abstraction, but no one can write down the fitness function for a given organism. If pressed, we have to admit that it does not exist: every individual organism has what amounts to its own different environment, since it has different starting conditions (nearer to different food sources, etc.), and so is selected on different criteria.
So as I understand it, the claim is that the MIRI cluster believes rationality is more like momentum, but many outside the MIRI cluster believe it’s more like fitness.
It seems to me like my position, and the MIRI-cluster position, is (1) closer to “rationality is like fitness” than “rationality is like momentum”, and (2) doesn’t depend that much on the difference. Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality. (This also seems supported by the analogy—evolutionary biologists still see fitness as a very important subject, and don’t seem to care that much about exactly how real the abstraction is.)
To the extent that this post has made a lot of people think that rationality realism is an important crux, it’s quite plausible to me that it’s made the discussion worse.
To expand more on (1), since it seems a lot of people found its negation plausible: it seems like if there’s an analogue of the theory of evolution, one which uses relatively unreal concepts like “fitness” to help us understand rational agency, we’d like to know about it. In this view, MIRI-cluster is essentially saying “biologists should want to invent evolution. Look at all the similarities across different animals. Don’t you want to explain that?” Whereas the non-MIRI cluster is saying “biologists don’t need to know about evolution.”
ETA: The original version of this comment conflated “evolution” and “reproductive fitness”, I’ve updated it now (see also my reply to Ben Pace’s comment).
Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality.
MIRI in general and you in particular seem (to me) unusually confident that:
1. We can learn more than we already know about the rationality of “ideal” agents (or perhaps arbitrary agents?).
2. This understanding will allow us to build AI systems that we understand better than the ones we build today.
3. We will be able to do this in time for it to affect real AI systems. (This could be either because it is unusually tractable and can be solved very quickly, or because timelines are very long.)
This is primarily based on the research you and MIRI do, some of MIRI’s strategy writing, writing like The Rocket Alignment Problem and law thinking, and an assumption that you are choosing to do this research because you think it is an effective way to reduce AI risk (given your skills).
(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)
My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy). I’d be interested in an argument for the three points listed above without realism about rationality (I agree with 1, somewhat agree with 2, and don’t agree with 3).
If you don’t have realism about rationality, then I basically agree with this sentence, though I’d rephrase it:
MIRI-cluster is essentially saying “biologists should want to invent evolution. Look at all the similarities across different animals. Don’t you want to explain that?” Whereas the non-MIRI cluster is saying “biologists don’t need to know about evolution.”
(ETA: In my head I was replacing “evolution” with “reproductive fitness”; I don’t agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don’t know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)
To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a “real” thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover “real” things that would then be important, but I don’t think that’s the claim.) My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
So I’d rephrase the sentence as: (ETA: changed the sentence a bit to talk about fitness instead of evolution)
MIRI-cluster is essentially saying “biologists should want to understand reproductive fitness. Look at all the similarities across different animals. Don’t you want to explain that?” Whereas the non-MIRI cluster is saying “Yeah, it’s a fascinating question to understand what makes animals fit, but given that we want to understand how antidepressants work, it is a better strategy to directly study what happens when an animal takes an antidepressant.”
Which you could round off to “biologists don’t need to know about reproductive fitness”, in the sense that it is not the best use of their time.
ETA: I also have a model of you being less convinced by realism about rationality than others in the “MIRI crowd”; in particular, selection vs. control seems decidedly less “realist” than mesa-optimizers (which didn’t have to be “realist”, but was quite “realist” the way it was written, especially in its focus on search).
Huh? A lot of these points about evolution register to me as straightforwardly false. Understanding the theory of evolution moved us from “Why are there all these weird living things? Why do they exist? What is going on?” to “Each part of these organisms has been designed by a local hill-climbing process to maximise reproduction.” If I looked into it, I expect I’d find out that early medicine found it very helpful to understand how the system was built. This is like the difference between me handing you a massive amount of code that has a bunch of weird outputs and telling you to make it work better and more efficiently, versus doing the same thing but also telling you what company made the code, why they made it, how they made it, and giving you loads of examples of other pieces of code they made in this fashion.
If I knew how to operationalise it I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.
A lot of these points about evolution register to me as straightforwardly false.
I don’t know which particular points you mean. The only one that it sounds like you’re arguing against is
The theory of evolution has not had nearly the same impact on our ability to make big things [...] I struggle to name a way that evolution affects an everyday person
Were there others?
I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.
I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are “real”, in the sense that “real” is meant in the OP.) I don’t think that an improved mathematical understanding of what makes particular animals more fit has had that much of an impact on anything.
Separately, I also think the general insight of “each part of these organisms has been designed by a local hill-climbing process to maximise reproduction” would not have been very influential in either medicine or biology, had it not been accompanied by the math (and assuming no one ever developed the math).
On reflection, my original comment was quite unclear about this; I’ll add a note to it to clarify.
I do still stand by the thing that I meant in my original comment, which is that, to the extent that rationality is like reproductive fitness (the claim made in the OP that Abram seems to agree with), i.e. a very complicated mess of a function that we can’t hope to capture in a simple equation, I don’t think that improved understanding of that sort of thing has made much of an impact on our ability to do “big things” (as a proxy: things that affect normal people).
Within evolution, the claim would be that there has not been much impact from gaining an improved mathematical understanding of the reproductive fitness of some organism, or the “reproductive fitness” of some meme for memetic evolution.
I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are “real”, in the sense that “real” is meant in the OP.)
In contrast, I think the general insight of “each part of these organisms has been designed by a local hill-climbing process to maximise reproduction” would not have been very influential in either medicine or biology, had it not been accompanied by the math.
But surely you wouldn’t get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of ‘reproductive fitness’.
But surely you wouldn’t get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of ‘reproductive fitness’.
Here is my understanding of what Abram thinks:
Rationality is like “reproductive fitness”, in that it is hard to formalize and turn into hard math. Regardless of how much theoretical progress we make on understanding rationality, it is never going to turn into something that can make very precise, accurate predictions about real systems. Nonetheless, qualitative understanding of rationality, of the sort that can make rough predictions about real systems, is useful for AI safety.
Hopefully that makes it clear why I’m trying to imagine a counterfactual where the math was never developed.
It’s possible that I’m misunderstanding Abram and he actually thinks that we will be able to make precise, accurate predictions about real systems; but if that’s the case I think he in fact is “realist about rationality” and this post is in fact pointing at a crux between him and Richard (or him and me), though not as well as he would like.
(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)
This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.
(I agree with 1, somewhat agree with 2, and don’t agree with 3).
It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?
My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).
I guess my position is something like this. I think it may be quite possible to make capabilities “blindly”—basically the processing-power heavy type of AI progress (applying enough tricks so you’re not literally recapitulating evolution, but you’re sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.
So I believe in some kind of knowledge to be had (ie, point #1).
Yeah, so, taking stock of the discussion again, it seems like:
There’s a thing-I-believe-which-is-kind-of-like-rationality-realism.
Points 1 and 2 together seem more in line with that thing than “rationality realism” as I understood it from the OP.
You already believe #1, and somewhat believe #2.
We are both pessimistic about #3, but I’m so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
We probably do have some disagreement about something like “how real is rationality?”—but I continue to strongly suspect it isn’t that cruxy.
(ETA: In my head I was replacing “evolution” with “reproductive fitness”; I don’t agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don’t know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)
I checked whether I thought the analogy was right with “reproductive fitness” and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
Sorry it resulted in a confusing mixed metaphor overall.
But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a “real” thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover “real” things that would then be important, but I don’t think that’s the claim.)
I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it’s all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.
Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn’t understand how organisms seeded on those planets would likely evolve.)
So—it seems to me—the question should not be whether an abstract theory of rationality is the sort of thing which, on an outside view, has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!
My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.
The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
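(A toy way to see the arithmetic here, with symbols I’m introducing purely for illustration: if the security argument rests on assumptions $A_1, \dots, A_n$, each of which fails with probability at most $\varepsilon_i$, then by the union bound

$$\Pr(\text{argument fails}) \;\le\; \sum_{i=1}^{n} \Pr(\neg A_i) \;\le\; \sum_{i=1}^{n} \varepsilon_i,$$

so a small number of individually-very-likely assumptions can support high overall confidence, without any precise model of the attacker.)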
I think we disagree primarily on 2 (and also how doomy the default case is, but let’s set that aside).
In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
I think that’s a crux between you and me. I’m no longer sure if it’s a crux between you and Richard. (ETA: I shouldn’t call this a crux, I wouldn’t change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)
Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the “unreal rationality” world to be similar to what Daniel mentions below:
I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
Yeah, I’m going to try to give a different explanation that doesn’t involve “realness”.
When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are “levers”, “gears”, “nails”, etc.
A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write “x + y”, I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don’t have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don’t have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don’t need to communicate all the caveats and intuitions that would accompany a leaky abstraction.
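For concreteness, here is a minimal sketch of the overflow point (the snippet and its helper function are just illustrative, not anyone’s actual code):

```python
# Python integers are arbitrary-precision, so "x + y" really is just the sum;
# the abstraction basically never leaks.
x, y = 2**63 - 1, 1
print(x + y)  # 9223372036854775808

# A fixed-width 64-bit addition (roughly what C-style int64 arithmetic does)
# wraps around instead; emulating it here makes the leak visible.
def add_int64(a, b):
    r = (a + b) & 0xFFFFFFFFFFFFFFFF   # keep the low 64 bits
    return r - 2**64 if r >= 2**63 else r  # reinterpret as signed

print(add_int64(x, y))  # -9223372036854775808
```

The Python abstraction is precise enough that I never think about the layer below; the C-style abstraction forces me to.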
One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.
It’s fine if there’s some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness—if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment where you set up the conditions that make something reproductively fit.)
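(For reference, one standard form of the Price equation, included just for concreteness; here $w_i$ is the fitness of individual or type $i$, $z_i$ its trait value, and bars denote population averages:

$$\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left[w_i\,\Delta z_i\right]$$

Once you can estimate the $w_i$, this is exactly the kind of precise relationship you can build on.)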
If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can’t build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can’t generalize it to something more far-off. Some examples from these comment threads of what “inferences about directly related things” looks like:
current theories about why England had an industrial revolution when it did
[biology] has far more practical consequences (thinking of medicine)
understanding why overuse of antibiotics might weaken the effect of antibiotics [based on knowledge of evolution]
Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g. you can say “overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic”.
In contrast, for abstractions like “logic gates”, “assembly language”, “levers”, etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you’d be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.
So now I’d go back and state our crux as:
Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?
I would guess not. It sounds like you would guess yes.
I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about “directly relevant things”, which will probably let you say some interesting things about AI systems, just not very much. I’d expect that, to really affect ML systems, given how far away from regular ML research MIRI research is, you would need a theory that’s precise enough to build hierarchies with.
(I think I’d also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)
(You might wonder why I’m optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is “directly relevant” to existing ML systems, and so you don’t need to build hierarchies of abstraction—just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)
The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
Your few assumptions need to talk about the system you actually build. On the model I’m outlining, it’s hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.
ETA: I also have a model of you being less convinced by realism about rationality than others in the “MIRI crowd”; in particular, selection vs. control seems decidedly less “realist” than mesa-optimizers (which didn’t have to be “realist”, but was quite “realist” the way it was written, especially in its focus on search).
Just a quick reply to this part for now (but thanks for the extensive comment, I’ll try to get to it at some point).
It makes sense. My recent series on myopia also fits this theme. But I don’t get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of “agency” into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of “optimization”. (This applies to people at MIRI and also people outside MIRI.)
*The one person who has given me push-back is Scott.
My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
For what it’s worth, I think I disagree with this even when “non-real” means “as real as the theory of liberalism”. One example is companies—my understanding is that people have fake theories about how companies should be arranged, that these theories can be better or worse (and can be evaluated as such without looking at how their implementations turn out), that one can maybe learn these theories in business school, and that implementing them creates more valuable companies (at least in expectation). At the very least, my understanding is that providing management advice to companies in developing countries significantly raises their productivity, and I found this study to support this half-baked memory.
(next paragraph is super political, but it’s important to my point)
I live in what I honestly, straightforwardly believe is the greatest country in the world (where greatness doesn’t exactly mean ‘moral goodness’ but does imply the ability to support moral goodness—think some combination of wealth and geo-strategic dominance), whose government was founded after a long series of discussions about how best to use the state to secure individual liberty. If I think about other wealthy countries, it seems to me that ones whose governments built upon this tradition of the interaction between liberty and governance are over-represented (e.g. Switzerland, Singapore, Hong Kong). The theory of liberalism wasn’t complete or real enough to build a perfect government, or even a government reliable enough to keep to its founding principles (see complaints American constitutionalists have about how things are done today), but it was something that can be built upon.
At any rate, I think it’s the case that the things that can be built off of these fake theories aren’t reliable enough to satisfy a strict Yudkowsky-style security mindset. But I do think it’s possible to productively build off of them.
On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are “directly relevant”.
If I agreed with the political example (and while I wouldn’t say that myself, it’s within the realm of plausibility), I’d consider that a particularly impressive version of this.
I’m confused how my examples don’t count as ‘building on’ the relevant theories—it sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that’s true (and if the things in the real world actually successfully fulfilled their purpose), then I’d think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.
people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning
Agreed. I’d say they built things in the real world that were “one level above” their theories.
if that’s true, [...] then I’d think that spending time and effort developing the relevant theories was worth it
Agreed.
you seem to be pointing at something else
Agreed.
Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
Separately, I claim that:
“real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”.
(All of this with weak confidence.)
I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don’t know what you’d think of the first.
Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
I think that I sort of agree if ‘levels above’ means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you’re demanding, each of which can independently break, which exponentially reduces the chance that you’ll have no failure.
But also, to the extent that your theory is mathematisable and comes with ‘error bars’, you have a shot at coming up with a theory of abstractions that is robust to failure of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some amount of failure of logic gates. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I’d say that the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you’d prefer the initial levels be ‘exact’.
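(A toy version of the transistor point, with made-up numbers: if each of three redundant components fails independently with probability $p$ and the layer above takes a majority vote, the vote fails with probability

$$3p^2(1-p) + p^3 \;\approx\; 3p^2 \quad \text{for small } p,$$

so the higher layer can be much more reliable than the layer below it, but only because the base-level failure model is precise enough to design the vote around.)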
“real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
I’d say that they’re some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor by some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating point arithmetic and such (second and third (?) levels of implementation), which depends on computing hardware and firmware (I don’t know how many levels of implementation that is).
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised). They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness. There are some exceptions (e.g. the mesa-optimisers paper), but they seem like they’re on the path to greater mathematisability.
MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”
I’m not sure about this, but I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’, basically for the reasons that Vanessa discusses.
I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’
Yeah, I think this is the sense in which realism about rationality is an important disagreement.
But also, to the extent that your theory is mathematisable and comes with ‘error bars’
Yeah, I agree that this would make it easier to build multiple levels of abstractions “on top”. I also would be surprised if mathematical theories of embedded rationality came with tight error bounds (where “tight” means “not so wide as to be useless”). For example, current theories of generalization in deep learning do not provide tight error bounds to my knowledge, except in special cases that don’t apply to the main successes of deep learning.
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. [...] They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness.
Agreed.
The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised).
I am basically only concerned about machine learning, when I say that you can’t build on the theories. My understanding of MIRI’s mainline story of impact is that they develop some theory that AI researchers use to change the way they do machine learning that leads to safe AI. This sounds to me like there are multiple levels of inference: “MIRI’s theory” → “machine learning” → “AGI”. This isn’t exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.
You could imagine other stories of impact, and I’d have other questions about those, e.g. if the story was “MIRI’s theory will tell us how to build aligned AGI without machine learning”, I’d be asking when the theory was going to include computational complexity.
In contrast, I struggle to name a way that evolution affects an everyday person
I’m not sure what exactly you mean, but examples that come to mind:
Crops and domestic animals that have been artificially selected for various qualities.
The medical community encouraging people to not use antibiotics unnecessarily.
[Inheritance but not selection] The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.
Crops and domestic animals that have been artificially selected for various qualities.
I feel fairly confident this was done before we understood evolution.
The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.
Also seems like a thing we knew before we understood evolution.
The medical community encouraging people to not use antibiotics unnecessarily.
That one seems plausible; though I’d want to know more about the history of how this came up. It also seems like the sort of thing that we’d have figured out even if we didn’t understand evolution, though it would have taken longer, and would have involved more deaths.
Going back to the AI case, my takeaway from this example is that understanding non-real things can still help if you need to get everything right the first time. And in fact, I do think that if you posit a discontinuity, such that we have to get everything right before that discontinuity, then the non-MIRI strategy looks worse because you can’t gather as much empirical evidence (though I still wouldn’t be convinced that the MIRI strategy is the right one).
Ah, I didn’t quite realise you meant to talk about “human understanding of the theory of evolution” rather than evolution itself. I still suspect that the theory of evolution is so fundamental to our understanding of biology, and our understanding of biology so useful to humanity, that if human understanding of evolution doesn’t contribute much to human welfare it’s just because most applications deal with pretty long time-scales.
(Also I don’t get why this discussion is treating evolution as ‘non-real’: stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)
(Also I don’t get why this discussion is treating evolution as ‘non-real’: stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)
Yeah, I agree, see my edits to the original comment and also my reply to Ben. Abram’s comment was talking about reproductive fitness the entire time and then suddenly switched to evolution at the end; I didn’t notice this and kept thinking of evolution as reproductive fitness in my head, and then wrote a comment based on that where I used the word evolution despite thinking about reproductive fitness and the general idea of “there is a local hill-climbing search on reproductive fitness” while ignoring the hard math.
How does evolutionary psychology help us during our everyday life? We already know that people like having sex and that they execute all these sorts of weird social behaviors. Why does providing the ultimate explanation for our behavior provide more than a satisfaction of our curiosity?
I like this review and think it was very helpful in understanding your (Abram’s) perspective, as well as highlighting some flaws in the original post, and ways that I’d been unclear in communicating my intuitions. In the rest of my comment I’ll try write a synthesis of my intentions for the original post with your comments; I’d be interested in the extent to which you agree or disagree.
We can distinguish between two ways to understand a concept X. For lack of better terminology, I’ll call them “understanding how X functions” and “understanding the nature of X”. I conflated these in the original post in a confusing way.
For example, I’d say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don’t yet exist, or even prove things about those components (e.g. there’s probably useful maths connecting graph theory with optimal nerve wiring), but it’s still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.
By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like “the number and quality of an organism’s offspring”. Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.
Momentum isn’t really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don’t know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature—but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).
I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.
However, I still think that there are different ways you might go about understanding the nature of intelligence, and that “something kind of like rationality realism” might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein’s thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I’m not sure. There’s an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery of evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don’t think that the extreme cases of fitness make sense. So I would say that I am not a realist about “perfect fitness”, even though the concept of fitness itself seems fine.
So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like “if we succeed in finding a theory that tells us the nature of intelligence, it still won’t make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions.”
The reason I called it “rationality realism” not “intelligence realism” is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn’t. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there’s usually an assumption that “perfect rationality” exists. I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations like “perfect fitness” which “aren’t real” (I agree that this was quite unclear in the original post).
My proposed redefinition:
The “intelligence is intelligible” hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
The “realism about rationality” hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as “perfect rationality”, and “well-defined” with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we’ll ever discover).
So, yeah, one thing that’s going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)
But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.
The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather is directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way, as opposed to other ways.
So to a large extent I think my recent direction can be seen as continuing a theme already present—perhaps you might say I’m trying to properly learn the lesson of logical induction.
But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.
So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.
Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)
(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)
I’ll try to respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is “to start with idealized rationality and try to drag it down to Earth rather than the other way around”. If the starting point is incoherent, then this approach doesn’t seem like it’ll go far—if AIXI isn’t useful to study, then probably AIXItl isn’t either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).
I appreciate that this isn’t an argument that I’ve made in a thorough or compelling way yet—I’m working on a post which does so.
If the starting point is incoherent, then this approach doesn’t seem like it’ll go far—if AIXI isn’t useful to study, then probably AIXItl isn’t either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).
Hm. I already think the starting point of Bayesian decision theory (which is even “further up” than AIXI in how I am thinking about it) is fairly useful.
In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as ‘utility’ (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn’t always serve very well (e.g. one might prefer Kelly betting), but it was kind of the starting point (probability theory having its origins in gambling games), and the idea seems like a useful decision-making mechanism in a lot of situations.
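A quick toy sketch of that contrast, with made-up numbers (the parameters and helper function here are purely illustrative):

```python
import random

# A repeated gamble with win probability p and net odds b: you win b per unit
# staked with probability p, otherwise you lose the stake.
p, b = 0.6, 1.0

# Naive expected-value reasoning says stake everything, since EV per unit
# staked is positive; Kelly betting instead stakes the fraction p - (1-p)/b.
kelly_fraction = p - (1 - p) / b  # 0.2 of current wealth per round

def simulate(fraction, rounds=1000, wealth=1.0):
    """Stake `fraction` of current wealth each round; return final wealth."""
    for _ in range(rounds):
        stake = fraction * wealth
        wealth += stake * b if random.random() < p else -stake
    return wealth

# Betting everything is wiped out at the first loss; the Kelly bettor
# typically compounds wealth over many rounds.
print(simulate(1.0), simulate(kelly_fraction))
```

Which is just to say: the naive expected-value mechanism is a real and useful starting point, even where something like Kelly does better.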
Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.
AIXI adds to all this the idea of quantifying Occam’s razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we’re going to disagree on.
As for AIXItl, I think it’s sort of taking the wrong approach to “dragging things down to earth”. Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.
Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I’m not sure he qualifies as a “MIRI person”)
I believe in some form of rationality realism: that is, that there’s a neat mathematical theory of ideal rationality that’s in practice relevant for how to build rational agents and be rational. I expect there to be a theory of bounded rationality about as mathematically specifiable and neat as electromagnetism (which after all in the real world requires a bunch of materials science to tell you about the permittivity of things).
If I didn’t believe the above, I’d be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my ‘worldview’ related to AI.
Searching for beliefs I hold for which ‘rationality realism’ is crucial by imagining what I’d conclude if I learned that ‘rationality irrealism’ was more right:
I’d be more interested in empirical understanding of deep learning and less interested in an understanding of learning theory.
I’d be less interested in probabilistic forecasting of things.
I’d want to find some higher-level thing that was more ‘real’/mathematically characterisable, and study that instead.
I’d be less optimistic about the prospects for an ‘ideal’ decision and reasoning theory.
My research depends on the belief that rational agents in the real world are likely to have some kind of ordered internal structure that is comprehensible to people. This belief is informed by rationality realism but distinct from it.
How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don’t see why the distinction should be so cruxy.
My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren’t “momentum vs reproductive fitness”, but rather, “momentum vs the bystander effect” (ie, physics vs social psychology). Reproductive fitness implies something that’s quite mathematizable, but with relatively “fake” models—e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it’s possible to get closer and closer.
I think that’s more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI has a heavy dependence on the choice of programming language. Any theory of bounded rationality which doesn’t ignore poly-time differences (ie, anything “closer to the ground” than logical induction) has to be hardware-dependent as well.
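(To gesture at the kind of ‘fake but mathematisable’ evolutionary model I have in mind above, here is the textbook haploid selection recursion, with standard symbols rather than anything from this thread:

$$p_{t+1} \;=\; \frac{p_t\, w_A}{p_t\, w_A + (1 - p_t)\, w_a}$$

where $p_t$ is the frequency of type $A$ and $w_A, w_a$ are fixed fitnesses. The math is clean, but it assumes non-overlapping generations, constant fitnesses, an effectively infinite population, and perfect mixing.)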
Meta/summary: I think we’re talking past each other, and hope that this comment clarifies things.
How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don’t see why the distinction should be so cruxy...
Reproductive fitness implies something that’s quite mathematizable, but with relatively “fake” models
I was thinking of the difference between the theory of electromagnetism vs the idea that there’s a reproductive fitness function, but that it’s very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with ‘fake’ models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I’m unsure which theory rationality will end up closer to.
Separately, I feel weird having people ask me about why things are ‘cruxy’ when I didn’t initially say that they were and without the context of an underlying disagreement that we’re hashing out. Like, either there’s some misunderstanding going on, or you’re asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.
I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI describes a heavy dependence on programming language. Any theory of bounded rationality which doesn’t ignore poly-time differences (ie, anything “closer to the ground” than logical induction) has to be hardware-dependent as well.
I confess to being quite troubled by AIXI’s language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than “polynomial in some input”, which should be some input to a good theory of bounded rationality.
If I didn’t believe the above,
What alternative world are you imagining, though?
I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
I was thinking of the difference between the theory of electromagnetism vs the idea that there’s a reproductive fitness function, but that it’s very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with ‘fake’ models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I’m unsure which theory rationality will end up closer to.
[Spoiler-boxing the following response not because it’s a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn’t want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]
I’m having trouble here because yes, the theory of population genetics factors in heavily to what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very different to me. The “hard to determine what it is” from the first seems to lead directly to the “fake inputs” from the second.
So possibly you’re gesturing at a level of realness which is “how real fitness functions would be if there were not a theory of population genetics”? But I’m not sure exactly what to imagine there, so could you give a different example (maybe a few) of something which is that level of real?
Separately, I feel weird having people ask me about why things are ‘cruxy’ when I didn’t initially say that they were and without the context of an underlying disagreement that we’re hashing out. Like, either there’s some misunderstanding going on, or you’re asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.
Ah, well. I interpreted this earlier statement from you as a statement of cruxiness:
If I didn’t believe the above, I’d be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my ‘worldview’ related to AI.
And furthermore the list following this:
Searching for beliefs I hold for which ‘rationality realism’ is crucial by imagining what I’d conclude if I learned that ‘rationality irrealism’ was more right:
So, yeah, I’m asking you about something which you haven’t claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.
I confess to being quite troubled by AIXI’s language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than “polynomial in some input”, which should be some input to a good theory of bounded rationality.
Ah, so this points to a real and large disagreement between us about how subjective a theory of rationality should be (which may be somewhat independent of just how real rationality is, but is related).
I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
Ok. Taking this as the rationality irrealism position, I would disagree with it, and also agree that it would make a big difference for the things you said rationality-irrealism would make a big difference for.
So I now think we have a big disagreement around point “a” (just how real rationality is), but maybe not so much around “b” (what the consequences are for the various bullet points you listed).
So, yeah, I’m asking you about something which you haven’t claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.
For what it’s worth, from my perspective, two months ago I said I fell into a certain pattern of thinking, then raemon put me in the position of saying what that was a crux for, then I was asked to elaborate about why a specific facet of the distinction was cruxy, and also the pattern of thinking morphed into something more analogous to a proposition. So I’m happy to elaborate on consequences of ‘rationality realism’ in my mind (such as they are—the term seems vague enough that I’m a ‘rationality realism’ anti-realist and so don’t want to lean too heavily on the concept) in order to further a discussion, but in the context of an exchange that was initially framed as a debate I’d like to be clear about what commitments I am and am not making.
Anyway, glad to clarify that we have a big disagreement about how ‘real’ a theory of rationality should be, which probably resolves to a medium-sized disagreement about how ‘real’ rationality and/or its best theory actually is.
To answer the easy part of this question/remark, I don’t work at MIRI and don’t research agent foundations, so I think I shouldn’t count as a “MIRI person”, despite having good friends at MIRI and having interned there.
(On a related note, it seems to me that the terminology “MIRI person”/”MIRI cluster” obscures intellectual positions and highlights social connections, which makes me wish that it was less prominent.)
I guess the main thing I want is an actual tally on “how many people definitively found this post to represent their crux”, vs “how many people think that this represented other people’s cruxes”
If I believed realism about rationality, I’d be closer to buying what I see as the MIRI story for impact. It’s hard to say whether I’d actually change my mind without knowing the details of what exactly I’m updating to.
I think that ricraz claims that it’s impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the “momentum vs. fitness” comparison doesn’t make sense to me. Specifically, a concept doesn’t have to be crisply well-defined in order to use it in mathematical models. Even momentum, which is truly one of the “crisper” concepts in science, is no longer well-defined when spacetime is not asymptotically flat (which it isn’t). Much less so are concepts such as “atom”, “fitness” or “demand”. Nevertheless, physicists, biologists and economists continue to successfully construct and apply mathematical models grounded in such fuzzy concepts. Although in some sense I also endorse the “strawman” that rationality is more like momentum than like fitness (at least some aspects of rationality).
Although in some sense I also endorse the “strawman” that rationality is more like momentum than like fitness (at least some aspects of rationality).
How so?
I think that ricraz claims that it’s impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the “momentum vs. fitness” comparison doesn’t make sense to me.
Well, it’s not entirely clear. First there is the “realism” claim, which might even be taken in contrast to mathematical abstraction; EG, “is IQ real, or is it just a mathematical abstraction”? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where “accurate” means, at least in part, helpfulness in making real predictions).
So the idea seems to be that there’s a spectrum with physics at one extreme end. I’m not quite sure what goes at the other extreme end. Here’s one possibility:
Physics
Chemistry
Biology
Psychology
Social Sciences
Humanities
A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, “realness” vs “mathematical modelability”. Well, it’s not clear exactly what that second axis should be.
Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect “reproductive fitness” levels rather than “momentum” levels.
Hmm, actually, I guess there’s a tricky interpretational issue here, which is what it means to model agency exactly.
On the one hand, I fully believe in Eliezer’s idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.
I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.
Yeah, I should have been much more careful before throwing around words like “real”. See the long comment I just posted for more clarification, and in particular this paragraph:
I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations which “aren’t real” (I agree that this was quite unclear in the original post).
It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.
What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that’s closer to “momentum” or “fitness”, I’m not sure the question is even meaningful.
I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithmic design. Knowing the complexity class of a problem doesn’t tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem is of exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you just whether these cases need to be very special or not. If the problem is in NC then you know that it’s possible to gain a lot from parallelization. If the problem is in NP then at least you can test solutions, et cetera.)
And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is engineering work finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)
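As a toy concretisation of the “if the problem is in NP then at least you can test solutions” point above (the formula and code are my own illustration, not anything from the thread):

```python
# Verifying a candidate assignment for a CNF formula takes polynomial time,
# even though finding one may require exponential search in the worst case.
from itertools import product

# Each clause is a list of (variable_index, is_positive) literals.
formula = [[(0, True), (1, False)],   # (x0 OR NOT x1)
           [(1, True), (2, True)],    # (x1 OR x2)
           [(0, False), (2, False)]]  # (NOT x0 OR NOT x2)

def satisfies(assignment, formula):
    # Cheap check: every clause has at least one satisfied literal.
    return all(any(assignment[var] == positive for var, positive in clause)
               for clause in formula)

def brute_force_solve(formula, num_vars):
    # Worst-case exponential: try all 2^n assignments.
    for bits in product([False, True], repeat=num_vars):
        if satisfies(bits, formula):
            return bits
    return None

print(satisfies((True, True, False), formula))  # True: verification is easy
print(brute_force_solve(formula, 3))            # (False, False, True): search is the hard part
```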
It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.
That seems manifestly false. You can figure out whether an algorithm halts or not without being accidentally stuck in an infinite loop. You can look at the recursive Fibonacci algorithm and figure out what it would do without ever running it. So there is a clear distinction between analyzing an algorithm and executing it. If anything, one would know more about the agent by using techniques from the analysis of algorithms than the agent would ever know about itself.
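To make the Fibonacci point concrete, here is a small sketch (my own example, not from the thread) of predicting a property of the algorithm by analysis rather than execution; the call-count formula used is a standard result about the naive recursion:

```python
# Predict how many calls the naive recursive Fibonacci makes, then check it.

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def predicted_calls(n):
    # Standard analysis result: the naive recursion makes 2*F(n+1) - 1 calls,
    # where F is the Fibonacci sequence itself. Computed here without ever
    # executing fib.
    a, b = 0, 1
    for _ in range(n + 1):
        a, b = b, a + b
    return 2 * a - 1

calls = 0
def fib_counted(n):
    global calls
    calls += 1
    return n if n < 2 else fib_counted(n - 1) + fib_counted(n - 2)

fib_counted(10)
print(fib(10), predicted_calls(10), calls)  # 55 177 177
```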
Of course you can predict some properties of what an agent will do. In particular, I hope that we will eventually have AGI algorithms that satisfy provable safety guarantees. But, you can’t make exact predictions. In fact, there probably is a mathematical law that limits how accurate your predictions can be.
An optimization algorithm is, by definition, something that transforms computational resources into utility. So, if your prediction is so close to the real output that it has similar utility, then it means the way you produced this prediction involved the same product of “optimization power per unit of resources” and “amount of resources invested” (roughly speaking, I don’t claim to already know the correct formalism for this). So you would need to either (i) run a similar algorithm with similar resources or (ii) run a dumber algorithm but with more resources or (iii) use less resources but an even smarter algorithm.
So, if you want to accurately predict the output of a powerful optimization algorithm, your prediction algorithm would usually have to be either a powerful optimization algorithm in itself (cases i and iii) or prohibitively costly to run (case ii). The exception is cases when the optimization problem is easy, so a dumb algorithm can solve it without much resources (or a human can figure out the answer by emself).
It seems to me like my position, and the MIRI-cluster position, is (1) closer to “rationality is like fitness” than “rationality is like momentum”
Eliezer is a fan of law thinking, right? Doesn’t the law thinker position imply that intelligence can be characterized in a “lawful” way like momentum?
Whereas the non-MIRI cluster is saying “biologists don’t need to know about evolution.”
As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we’re confused), but I’m skeptical of MIRI because they seem more confused than average to me.
Doesn’t the law thinker position imply that intelligence can be characterized in a “lawful” way like momentum?
It depends on what you mean by “lawful”. Right now, the word “lawful” in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it’s not clear to me why “law thinking” is relevant in the first place—it seems as though it simply muddies the discussion by introducing additional concepts.
In my experience, if there are several concepts that seem similar, understanding how they relate to one another usually helps with clarity rather than hurting.
That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.
In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.
MIRI in general and you in particular seem unusually (to me) confident that:
1. We can learn more than we already know about rationality of “ideal” agents (or perhaps arbitrary agents?).
2. This understanding will allow us to build AI systems that we understand better than the ones we build today.
3. We will be able to do this in time for it to affect real AI systems. (This could be either because it is unusually tractable and can be solved very quickly, or because timelines are very long.)
This is primarily based on what research you and MIRI do, some of MIRI’s strategy writing, writing like the Rocket Alignment problem and law thinking, and an assumption that you are choosing to do this research because you think it is an effective way to reduce AI risk (given your skills).
(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)
My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy). I’d be interested in an argument for the three points listed above without realism about rationality (I agree with 1, somewhat agree with 2, and don’t agree with 3).
If you don’t have realism about rationality, then I basically agree with this sentence, though I’d rephrase it:
(ETA: In my head I was replacing “evolution” with “reproductive fitness”; I don’t agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don’t know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)
To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a “real” thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover “real” things that would then be important, but I don’t think that’s the claim.) My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things. So I’d rephrase the sentence as: (ETA: changed the sentence a bit to talk about fitness instead of evolution)
Which you could round off to “biologists don’t need to know about reproductive fitness”, in the sense that it is not the best use of their time.
ETA: I also have a model of you being less convinced by realism about rationality than others in the “MIRI crowd”; in particular, selection vs. control seems decidedly less “realist” than mesa-optimizers (which didn’t have to be “realist”, but was quite “realist” the way it was written, especially in its focus on search).
Huh? A lot of these points about evolution register to me as straightforwardly false. Understanding the theory of evolution moved us from “Why are there all these weird living things? Why do they exist? What is going on?” to “Each part of these organisms has been designed by a local hill-climbing process to maximise reproduction.” If I looked into it, I expect I’d find out that early medicine found it very helpful to understand how the system was built. This is like the difference between me handing you a massive amount of code that has a bunch of weird outputs and telling you to make it work better and more efficiently, versus the same thing but where I also tell you what company made the code, why they made it, and how they made it, and give you loads of examples of other pieces of code they made in this fashion.
If I knew how to operationalise it I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.
I don’t know which particular points you mean. The only one that it sounds like you’re arguing against is
Were there others?
I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are “real”, in the sense that “real” is meant in the OP.) I don’t think that an improved mathematical understanding of what makes particular animals more fit has had that much of an impact on anything.
Separately, I also think the general insight of “each part of these organisms has been designed by a local hill-climbing process to maximise reproduction” would not have been very influential in either medicine or biology, had it not been accompanied by the math (and assuming no one ever developed the math).
On reflection, my original comment was quite unclear about this, I’ll add a note to it to clarify.
I do still stand by the thing that I meant in my original comment, which is that, to the extent that rationality is like reproductive fitness (the claim made in the OP that Abram seems to agree with), i.e. a very complicated mess of a function that we don’t hope to capture in a simple equation, I don’t think that improved understanding of that sort of thing has made much of an impact on our ability to do “big things” (as a proxy, things that affect normal people).
Within evolution, the claim would be that there has not been much impact from gaining an improved mathematical understanding of the reproductive fitness of some organism, or the “reproductive fitness” of some meme for memetic evolution.
But surely you wouldn’t get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up with the notion of ‘reproductive fitness’.
Here is my understanding of what Abram thinks:
Rationality is like “reproductive fitness”, in that it is hard to formalize and turn into hard math. Regardless of how much theoretical progress we make on understanding rationality, it is never going to turn into something that can make very precise, accurate predictions about real systems. Nonetheless, qualitative understanding of rationality, of the sort that can make rough predictions about real systems, is useful for AI safety.
Hopefully that makes it clear why I’m trying to imagine a counterfactual where the math was never developed.
It’s possible that I’m misunderstanding Abram and he actually thinks that we will be able to make precise, accurate predictions about real systems; but if that’s the case I think he in fact is “realist about rationality” and this post is in fact pointing at a crux between him and Richard (or him and me), though not as well as he would like.
This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.
It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?
I guess my position is something like this. I think it may be quite possible to make capabilities “blindly”—basically the processing-power heavy type of AI progress (applying enough tricks so you’re not literally recapitulating evolution, but you’re sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.
So I believe in some kind of knowledge to be had (ie, point #1).
Yeah, so, taking stock of the discussion again, it seems like:
There’s a thing-I-believe-which-is-kind-of-like-rationality-realism.
Points 1 and 2 together seem more in line with that thing than “rationality realism” as I understood it from the OP.
You already believe #1, and somewhat believe #2.
We are both pessimistic about #3, but I’m so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
We probably do have some disagreement about something like “how real is rationality?”—but I continue to strongly suspect it isn’t that cruxy.
I checked whether I thought the analogy was right with “reproductive fitness” and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
Sorry it resulted in a confusing mixed metaphor overall.
But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct it. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it’s all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.
Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn’t understand how organisms seeded on those planets would likely evolve.)
So—it seems to me—the question should not be whether an abstract theory of rationality is the sort of thing which, on an outside view, has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!
Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.
The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
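As a toy version of the arithmetic behind this (the numbers are invented purely for illustration):

```python
# An argument resting on a few individually very likely assumptions can give
# higher confidence than one big model that is merely "usually right".
p_big_model_correct = 0.90             # a monolithic predictive model (made-up number)
assumption_probs = [0.99, 0.99, 0.99]  # a few simple, separately-checked assumptions (made-up)

p_argument_holds = 1.0
for p in assumption_probs:
    p_argument_holds *= p   # treats the assumptions as failing independently

print(round(p_argument_holds, 3))  # 0.97, versus 0.90 for the monolithic model
```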
I think we disagree primarily on 2 (and also how doomy the default case is, but let’s set that aside).
I think that’s a crux between you and me. I’m no longer sure if it’s a crux between you and Richard. (ETA: I shouldn’t call this a crux, I wouldn’t change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)
Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the “unreal rationality” world to be similar to what Daniel mentions below:
Yeah, I’m going to try to give a different explanation that doesn’t involve “realness”.
When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are “levers”, “gears”, “nails”, etc.
A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write “x + y”, I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don’t have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don’t have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don’t need to communicate all the caveats and intuitions that would accompany a leaky abstraction.
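A minimal sketch of the overflow point (the C behaviour is simulated in Python here rather than written in actual C):

```python
def as_int32(n):
    # Two's-complement wraparound of a 32-bit signed integer, done by hand,
    # to mimic what fixed-width C arithmetic does silently.
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

x, y = 2**31 - 1, 1
print(x + y)            # 2147483648: Python's integers hide overflow entirely
print(as_int32(x + y))  # -2147483648: the fixed-width abstraction leaks
```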
One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.
It’s fine if there’s some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness—if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment where you set up the conditions that make something reproductively fit.)
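As a toy check of that Price-equation claim (the numbers are invented, and perfect inheritance is assumed so the transmission term drops out):

```python
# Price equation, selection term only: change in mean trait = Cov(w, z) / mean(w),
# where w_i is offspring count and z_i the trait value of parent i.
trait = [1.0, 2.0, 3.0, 4.0]  # z_i (made-up)
offspring = [1, 1, 2, 4]      # w_i (made-up)

n = len(trait)
mean_z = sum(trait) / n
mean_w = sum(offspring) / n
cov_wz = sum((w - mean_w) * (z - mean_z) for w, z in zip(offspring, trait)) / n

# Next generation: each offspring inherits its parent's trait exactly.
next_gen = [z for z, w in zip(trait, offspring) for _ in range(w)]
observed_change = sum(next_gen) / len(next_gen) - mean_z
predicted_change = cov_wz / mean_w

print(round(observed_change, 3), round(predicted_change, 3))  # 0.625 0.625
```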
If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can’t build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can’t generalize it to something more far-off. Some examples from these comment threads of what “inferences about directly related things” looks like:
Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g. You can say “overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic”.
In contrast, for abstractions like “logic gates”, “assembly language”, “levers”, etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you’d be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.
So now I’d go back and state our crux as:
I would guess not. It sounds like you would guess yes.
I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about “directly relevant things”, which will probably let you say some interesting things about AI systems, just not very much. I’d expect that, to really affect ML systems, given how far away from regular ML research MIRI research is, you would need a theory that’s precise enough to build hierarchies with.
(I think I’d also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)
(You might wonder why I’m optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is “directly relevant” to existing ML systems, and so you don’t need to build hierarchies of abstraction—just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)
Your few assumptions need to talk about the system you actually build. On the model I’m outlining, it’s hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.
I generally like the re-framing here, and agree with the proposed crux.
I may try to reply more at the object level later.
Abram, did you reply to that crux somewhere?
Just a quick reply to this part for now (but thanks for the extensive comment, I’ll try to get to it at some point).
It makes sense. My recent series on myopia also fits this theme. But I don’t get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of “agency” into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of “optimization”. (This applies to people at MIRI and also people outside MIRI.)
*The one person who has given me push-back is Scott.
For what it’s worth, I think I disagree with this even when “non-real” means “as real as the theory of liberalism”. One example is companies—my understanding is that people have fake theories about how companies should be arranged, that these theories can be better or worse (and evaluated as such without looking at how their implementations turn out), that one can maybe learn these theories in business school, and that implementing them creates more valuable companies (at least in expectation). At the very least, my understanding is that providing management advice to companies in developing countries significantly raises their productivity, and I found this study to support this half-baked memory.
(next paragraph is super political, but it’s important to my point)
I live in what I honestly, straightforwardly believe is the greatest country in the world (where greatness doesn’t exactly mean ‘moral goodness’ but does imply the ability to support moral goodness—think some combination of wealth and geo-strategic dominance), whose government was founded after a long series of discussions about how best to use the state to secure individual liberty. If I think about other wealthy countries, it seems to me that ones whose governments built upon this tradition of the interaction between liberty and governance are over-represented (e.g. Switzerland, Singapore, Hong Kong). The theory of liberalism wasn’t complete or real enough to build a perfect government, or even a government reliable enough to keep to its founding principles (see complaints American constitutionalists have about how things are done today), but it was something that can be built upon.
At any rate, I think it’s the case that the things that can be built off of these fake theories aren’t reliable enough to satisfy a strict Yudkowsky-style security mindset. But I do think it’s possible to productively build off of them.
On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are “directly relevant”.
If I agreed with the political example (and while I wouldn’t say that myself, it’s within the realm of plausibility), I’d consider that a particularly impressive version of this.
I’m confused how my examples don’t count as ‘building on’ the relevant theories—it sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that’s true (and if the things in the real world actually successfully fulfilled their purpose), then I’d think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.
Agreed. I’d say they built things in the real world that were “one level above” their theories.
Agreed.
Agreed.
Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
Separately, I claim that:
“real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”.
(All of this with weak confidence.)
I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don’t know what you’d think of the first.
OK, I think I understand you now.
I think that I sort of agree if ‘levels above’ means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you’re imposing, each of which can independently break, which exponentially reduces the chance that you’ll have no failure.
But also, to the extent that your theory is mathematisable and comes with ‘error bars’, you have a shot at coming up with a theory of abstractions that is robust to failure of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some amount of failure of logic gates. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I’d say that the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you’d prefer the initial levels be ‘exact’.
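A tiny numerical sketch of that robustness point (failure rates invented, failures assumed independent):

```python
# Triple redundancy with a majority vote: the unit only fails when at least
# two of the three copies fail at once, so unreliable parts can still support
# a more reliable abstraction one level up.
p_fail = 0.01  # made-up per-component failure rate

p_majority_fails = 3 * p_fail**2 * (1 - p_fail) + p_fail**3

print(p_fail, round(p_majority_fails, 6))  # 0.01 vs 0.000298
```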
I’d say that they’re some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor by some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating point arithmetic and such (second and third (?) levels of implementation), which depends on computing hardware and firmware (I don’t know how many levels of implementation that is).
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised). They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness. There are some exceptions (e.g. the mesa-optimisers paper), but they seem like they’re on the path to greater mathematisability.
I’m not sure about this, but I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’, basically for the reasons that Vanessa discusses.
Yeah, I think this is the sense in which realism about rationality is an important disagreement.
Yeah, I agree that this would make it easier to build multiple levels of abstractions “on top”. I also would be surprised if mathematical theories of embedded rationality came with tight error bounds (where “tight” means “not so wide as to be useless”). For example, current theories of generalization in deep learning do not provide tight error bounds to my knowledge, except in special cases that don’t apply to the main successes of deep learning.
Agreed.
I am basically only concerned about machine learning, when I say that you can’t build on the theories. My understanding of MIRI’s mainline story of impact is that they develop some theory that AI researchers use to change the way they do machine learning that leads to safe AI. This sounds to me like there are multiple levels of inference: “MIRI’s theory” → “machine learning” → “AGI”. This isn’t exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.
You could imagine other stories of impact, and I’d have other questions about those, e.g. if the story was “MIRI’s theory will tell us how to build aligned AGI without machine learning”, I’d be asking when the theory was going to include computational complexity.
I’m not sure what exactly you mean, but examples that come to mind:
Crops and domestic animals that have been artificially selected for various qualities.
The medical community encouraging people to not use antibiotics unnecessarily.
[Inheritance but not selection] The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.
I feel fairly confident this was done before we understood evolution.
Also seems like a thing we knew before we understood evolution.
That one seems plausible; though I’d want to know more about the history of how this came up. It also seems like the sort of thing that we’d have figured out even if we didn’t understand evolution, though it would have taken longer, and would have involved more deaths.
Going back to the AI case, my takeaway from this example is that understanding non-real things can still help if you need to get everything right the first time. And in fact, I do think that if you posit a discontinuity, such that we have to get everything right before that discontinuity, then the non-MIRI strategy looks worse because you can’t gather as much empirical evidence (though I still wouldn’t be convinced that the MIRI strategy is the right one).
Ah, I didn’t quite realise you meant to talk about “human understanding of the theory of evolution” rather than evolution itself. I still suspect that the theory of evolution is so fundamental to our understanding of biology, and our understanding of biology so useful to humanity, that if human understanding of evolution doesn’t contribute much to human welfare it’s just because most applications deal with pretty long time-scales.
(Also I don’t get why this discussion is treating evolution as ‘non-real’: stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)
Yeah, I agree, see my edits to the original comment and also my reply to Ben. Abram’s comment was talking about reproductive fitness the entire time and then suddenly switched to evolution at the end; I didn’t notice this and kept thinking of evolution as reproductive fitness in my head, and then wrote a comment based on that where I used the word evolution despite thinking about reproductive fitness and the general idea of “there is a local hill-climbing search on reproductive fitness” while ignoring the hard math.
The most obvious thing is understanding why overuse of antibiotics might weaken the effect of antibiotics.
See response to Daniel below; I find this one a little compelling (but not that much).
Evolutionary psychology?
How does evolutionary psychology help us during our everyday life? We already know that people like having sex and that they execute all these sorts of weird social behaviors. Why does providing the ultimate explanation for our behavior provide more than a satisfaction of our curiosity?
+1, it seems like some people with direct knowledge of evolutionary psychology get something out of it, but not everyone else.
Sorry, how is this not saying “people who don’t know evo-psych don’t get anything out of knowing evo-psych”?
I like this review and think it was very helpful in understanding your (Abram’s) perspective, as well as highlighting some flaws in the original post, and ways that I’d been unclear in communicating my intuitions. In the rest of my comment I’ll try write a synthesis of my intentions for the original post with your comments; I’d be interested in the extent to which you agree or disagree.
We can distinguish between two ways to understand a concept X. For lack of better terminology, I’ll call them “understanding how X functions” and “understanding the nature of X”. I conflated these in the original post in a confusing way.
For example, I’d say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don’t yet exist, or even prove things about those components (e.g. there’s probably useful maths connecting graph theory with optimal nerve wiring), but it’s still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.
By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like “the number and quality of an organism’s offspring”. Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.
Momentum isn’t really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don’t know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature—but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).
I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.
However, I still think that there are different ways you might go about understanding the nature of intelligence, and that “something kind of like rationality realism” might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein’s thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I’m not sure. There’s an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery of evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don’t think that the extreme cases of fitness even make sense. So I would say that I am not a realist about “perfect fitness”, even though the concept of fitness itself seems fine.
So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like “if we succeed in finding a theory that tells us the nature of intelligence, it still won’t make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions.”
The reason I called it “rationality realism” not “intelligence realism” is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn’t. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there’s usually an assumption that “perfect rationality” exists. I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations like “perfect fitness” which “aren’t real” (I agree that this was quite unclear in the original post).
My proposed redefinition:
The “intelligence is intelligible” hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
The “realism about rationality” hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as “perfect rationality”, and “well-defined” with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we’ll ever discover).
So, yeah, one thing that’s going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)
But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.
The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather is directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way as opposed to other ways.
So to a large extent I think my recent direction can be seen as continuing a theme already present—perhaps you might say I’m trying to properly learn the lesson of logical induction.
But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.
So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.
Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)
(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)
I’ll try to respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is “to start with idealized rationality and try to drag it down to Earth rather than the other way around”. If the starting point is incoherent, then this approach doesn’t seem like it’ll go far—if AIXI isn’t useful to study, then probably AIXItl isn’t either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).
I appreciate that this isn’t an argument that I’ve made in a thorough or compelling way yet—I’m working on a post which does so.
Hm. I already think the starting point of Bayesian decision theory (which is even “further up” than AIXI in how I am thinking about it) is fairly useful.
In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as ‘utility’ (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn’t always serve very well (e.g. one might prefer Kelly betting), but it was kind of the historical starting point (probability theory itself grew out of gambling games), and the idea seems like a useful decision-making mechanism in a lot of situations.
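To make that contrast concrete, here is a minimal sketch (a toy example, not anything from the discussion; the even-money payoff and the win probability are assumptions) of naive expected-value maximization versus Kelly betting on a repeated gamble:

```python
# Toy comparison (illustrative assumptions only): what fraction of a bankroll to
# stake on a repeated even-money bet that wins with probability p.

def ev_maximizing_fraction(p_win: float) -> float:
    """Naive expected-value maximization: if the bet has positive expectation,
    stake the entire bankroll (which courts near-certain ruin in the long run)."""
    return 1.0 if p_win > 0.5 else 0.0

def kelly_fraction(p_win: float) -> float:
    """Kelly criterion for an even-money bet: f* = 2p - 1, floored at zero.
    This maximizes the long-run growth rate of the bankroll instead."""
    return max(0.0, 2.0 * p_win - 1.0)

if __name__ == "__main__":
    p = 0.6  # assumed win probability, purely for illustration
    print(ev_maximizing_fraction(p))  # 1.0
    print(kelly_fraction(p))          # 0.2 (up to floating-point error)
```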
Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.
AIXI adds to all this the idea of quantifying Occam’s razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we’re going to disagree on.
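As a toy illustration of what “quantifying Occam’s razor” can look like (a sketch only; the hypotheses and description lengths below are invented, and in Solomonoff induction the lengths would be those of programs in a fixed universal language), one can weight hypotheses by 2^(-description length) and normalize:

```python
# Illustrative only: an Occam-style prior that weights each hypothesis by
# 2^(-L), where L is its description length in bits. The lengths below are
# made up; in Solomonoff induction they would be lengths of programs in some
# fixed universal language, which is where the language-dependence comes from.

hypotheses = {
    "constant sequence": 10,
    "linear trend": 25,
    "lookup table of all observations so far": 120,
}

weights = {name: 2.0 ** -bits for name, bits in hypotheses.items()}
total = sum(weights.values())
prior = {name: w / total for name, w in weights.items()}

for name, p in prior.items():
    print(f"{name}: {p:.3g}")
```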
As for AIXItl, I think it’s sort of taking the wrong approach to “dragging things down to earth”. Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.
Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I’m not sure he qualifies as a “MIRI person”).
I believe in some form of rationality realism: that is, that there’s a neat mathematical theory of ideal rationality that’s in practice relevant for how to build rational agents and be rational. I expect there to be a theory of bounded rationality about as mathematically specifiable and neat as electromagnetism (which after all in the real world requires a bunch of materials science to tell you about the permittivity of things).
If I didn’t believe the above, I’d be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my ‘worldview’ related to AI.
Searching for beliefs I hold for which ‘rationality realism’ is crucial by imagining what I’d conclude if I learned that ‘rationality irrealism’ was more right:
I’d be more interested in empirical understanding of deep learning and less interested in an understanding of learning theory.
I’d be less interested in probabilistic forecasting of things.
I’d want to find some higher-level thing that was more ‘real’/mathematically characterisable, and study that instead.
I’d be less optimistic about the prospects for an ‘ideal’ decision and reasoning theory.
My research depends on the belief that rational agents in the real world are likely to have some kind of ordered internal structure that is comprehensible to people. This belief is informed by rationality realism but distinct from it.
How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don’t see why the distinction should be so cruxy.
My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren’t “momentum vs reproductive fitness”, but rather, “momentum vs the bystander effect” (ie, physics vs social psychology). Reproductive fitness implies something that’s quite mathematizable, but with relatively “fake” models—e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it’s possible to get closer and closer.
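Here is a minimal Wright-Fisher-style sketch (a toy example with invented fitness numbers, just to make the “quite mathematizable, but with relatively fake models” point concrete) that has exactly those simplifications: non-overlapping generations, perfect mixing, constant population size, and a single fitness value per type.

```python
import random

# Minimal Wright-Fisher-style model (illustrative only): two types, A and B,
# with non-overlapping generations, constant population size, perfect mixing,
# and one relative-fitness number per type; simplifications that no real
# population satisfies exactly.

def next_generation(pop, fitness, rng):
    """Each offspring picks a parent with probability proportional to fitness."""
    weights = [fitness[individual] for individual in pop]
    return rng.choices(pop, weights=weights, k=len(pop))

rng = random.Random(0)
population = ["A"] * 50 + ["B"] * 50
fitness = {"A": 1.05, "B": 1.00}  # assumed 5% selective advantage for A

for _ in range(200):
    population = next_generation(population, fitness, rng)

print(population.count("A"), "of", len(population), "individuals are type A")
```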
I think that’s more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI has a heavy dependence on the choice of programming language. Any theory of bounded rationality which doesn’t ignore poly-time differences (ie, anything “closer to the ground” than logical induction) has to be hardware-dependent as well.
What alternative world are you imagining, though?
Meta/summary: I think we’re talking past each other, and hope that this comment clarifies things.
I was thinking of the difference between the theory of electromagnetism vs the idea that there’s a reproductive fitness function, but that it’s very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with ‘fake’ models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I’m unsure which theory rationality will end up closer to.
Separately, I feel weird having people ask me about why things are ‘cruxy’ when I didn’t initially say that they were and without the context of an underlying disagreement that we’re hashing out. Like, either there’s some misunderstanding going on, or you’re asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.
I confess to being quite troubled by AIXI’s language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than “polynomial in some input”, which should be an input to a good theory of bounded rationality.
I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or as current theories of why England, rather than any other country, had an industrial revolution when it did.
[Spoiler-boxing the following response not because it’s a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn’t want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]
I’m having trouble here because yes, the theory of population genetics factors heavily into what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very different to me. The “hard to determine what it is” from the first seems to lead directly to the “fake inputs” from the second.
So possibly you’re gesturing at a level of realness which is “how real fitness functions would be if there were not a theory of population genetics”? But I’m not sure exactly what to imagine there, so could you give a different example (maybe a few) of something which is that level of real?
Ah, well. I interpreted this earlier statement from you as a statement of cruxiness:
And furthermore the list following this:
So, yeah, I’m asking you about something which you haven’t claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.
Ah, so this points to a real and large disagreement between us about how subjective a theory of rationality should be (which may be somewhat independent of just how real rationality is, but is related).
Ok. Taking this as the rationality irrealism position, I would disagree with it, and also agree that it would make a big difference for the things you said rationality-irrealism would make a big difference for.
So I now think we have a big disagreement around point “a” (just how real rationality is), but maybe not so much around “b” (what the consequences are for the various bullet points you listed).
For what it’s worth, from my perspective: two months ago I said I fell into a certain pattern of thinking; then raemon put me in the position of saying what that was a crux for; then I was asked to elaborate on why a specific facet of the distinction was cruxy, and along the way the pattern of thinking morphed into something more analogous to a proposition. So I’m happy to elaborate on the consequences of ‘rationality realism’ in my mind (such as they are; the term seems vague enough that I’m a ‘rationality realism’ anti-realist, and so don’t want to lean too heavily on the concept) in order to further a discussion. But in the context of an exchange that was initially framed as a debate, I’d like to be clear about what commitments I am and am not making.
Anyway, glad to clarify that we have a big disagreement about how ‘real’ a theory of rationality should be, which probably resolves to a medium-sized disagreement about how ‘real’ rationality and/or its best theory actually is.
This is such an interesting use of spoiler tags. I might try it myself sometime.
To answer the easy part of this question/remark, I don’t work at MIRI and don’t research agent foundations, so I think I shouldn’t count as a “MIRI person”, despite having good friends at MIRI and having interned there.
(On a related note, it seems to me that the terminology “MIRI person”/”MIRI cluster” obscures intellectual positions and highlights social connections, which makes me wish that it was less prominent.)
I guess the main thing I want is an actual tally of “how many people definitively found this post to represent their crux” vs “how many people think that this represented other people’s cruxes”.
If I believed realism about rationality, I’d be closer to buying what I see as the MIRI story for impact. It’s hard to say whether I’d actually change my mind without knowing the details of what exactly I’m updating to.
I think that ricraz claims that it’s impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the “momentum vs. fitness” comparison doesn’t make sense to me. Specifically, a concept doesn’t have to be crisply well-defined in order to use it in mathematical models. Even momentum, which is truly one of the “crisper” concepts in science, is no longer well-defined when spacetime is not asymptotically flat (which it isn’t). Much less so are concepts such as “atom”, “fitness” or “demand”. Nevertheless, physicists, biologists and economists continue to successfully construct and apply mathematical models grounded in such fuzzy concepts. Although in some sense I also endorse the “strawman” that rationality is more like momentum than like fitness (at least for some aspects of rationality).
How so?
Well, it’s not entirely clear. First there is the “realism” claim, which might even be taken in contrast to mathematical abstraction; EG, “is IQ real, or is it just a mathematical abstraction”? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where “accurate” means, at least in part, helpfulness in making real predictions).
So the idea seems to be that there’s a spectrum with physics at one extreme end. I’m not quite sure what goes at the other extreme end. Here’s one possibility:
Physics
Chemistry
Biology
Psychology
Social Sciences
Humanities
A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, “realness” vs “mathematical modelability”. Well, it’s not clear exactly what that second axis should be.
Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect “reproductive fitness” levels rather than “momentum” levels.
Hmm, actually, I guess there’s a tricky interpretational issue here, which is what it means to model agency exactly.
On the one hand, I fully believe in Eliezer’s idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
But the important thing is a theoretical understanding deep enough to grasp the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.
I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.
Yeah, I should have been much more careful before throwing around words like “real”. See the long comment I just posted for more clarification, and in particular this paragraph:
It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.
What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that’s closer to “momentum” or “fitness”, I’m not sure the question is even meaningful.
I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithmic design. Knowing the complexity class of a problem doesn’t tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem is of exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you just whether these cases need to be very special or not. If the problem is in NC then you know that it’s possible to gain a lot from parallelization. If the problem is in NP then at least you can test solutions, et cetera.)
And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is still engineering work in finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)
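To make the “if the problem is in NP then at least you can test solutions” point above concrete, here is a toy sketch (with an invented formula): checking a candidate assignment against a CNF formula takes linear time, even though finding a satisfying assignment may, as far as anyone knows, take exponential time in the worst case.

```python
# Toy illustration of "in NP means you can at least test solutions": verifying
# a candidate assignment for a CNF formula is fast, even when finding one may not be.
# Encoding: each clause is a list of literals; a positive integer means the
# variable itself, a negative integer means its negation.

# Example formula (made up): (x1 OR NOT x2) AND (x2 OR x3)
formula = [[1, -2], [2, 3]]

def verify(formula, assignment):
    """Return True iff every clause has at least one literal made true by the assignment."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

print(verify(formula, {1: True, 2: False, 3: True}))   # True: both clauses satisfied
print(verify(formula, {1: False, 2: True, 3: False}))  # False: the first clause fails
```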
That seems manifestly false. You can figure out whether an algorithm halts or not without being accidentally stuck in an infinite loop. You can look at the recursive Fibonacci algorithm and figure out what it would do without ever running it. So there is a clear distinction between analyzing an algorithm and executing it. If anything, one would know more about the agent by using techniques from the analysis of algorithms than the agent would ever know about itself.
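For instance (a standard worked example, not anything from the thread), both the value the naive recursive Fibonacci returns and its running cost can be read off from the recurrence without executing the code:

```latex
% The definition the recursive program implements:
F(n) = F(n-1) + F(n-2), \qquad F(0) = 0,\ F(1) = 1
% Solving the recurrence gives a closed form (Binet's formula), so the output
% is known without running the recursion:
F(n) = \frac{\varphi^{n} - \psi^{n}}{\sqrt{5}}, \qquad
\varphi = \frac{1+\sqrt{5}}{2}, \quad \psi = \frac{1-\sqrt{5}}{2}
% The number of calls C(n) satisfies C(n) = C(n-1) + C(n-2) + 1, which grows as
% \Theta(\varphi^{n}); so the running time is also predictable in advance.
```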
In special cases, not in the general case.
Of course you can predict some properties of what an agent will do. In particular, I hope that we will eventually have AGI algorithms that satisfy provable safety guarantees. But you can’t make exact predictions. In fact, there probably is a mathematical law that limits how accurate your predictions can be.
An optimization algorithm is, by definition, something that transforms computational resources into utility. So, if your prediction is so close to the real output that it has similar utility, then the way you produced this prediction must have involved a similar product of “optimization power per unit of resources” and “amount of resources invested” (roughly speaking; I don’t claim to already know the correct formalism for this). So you would need to either (i) run a similar algorithm with similar resources, (ii) run a dumber algorithm but with more resources, or (iii) use fewer resources but an even smarter algorithm.
So, if you want to accurately predict the output of a powerful optimization algorithm, your prediction algorithm would usually have to be either a powerful optimization algorithm in itself (cases i and iii) or prohibitively costly to run (case ii). The exception is when the optimization problem is easy, so that a dumb algorithm can solve it without many resources (or a human can figure out the answer by emself).
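As a crude toy illustration of this (not the formalism being gestured at): if the “optimizer” is simply argmax over a large random table, then any predictor whose guess scores nearly as well as the optimizer’s output must itself have searched most of the table.

```python
import random

# Crude toy illustration (not a formalization of the argument above): the
# "optimizer" returns the argmax of a structureless random table. A predictor
# that does no search ("lazy") almost always predicts an output with much lower
# utility; matching the optimizer's utility essentially requires redoing its search.

rng = random.Random(0)
table = [rng.random() for _ in range(10_000)]  # an arbitrary, structureless objective

def optimizer():
    """Scans the whole table: expensive, but achieves (near-)maximal utility."""
    return max(range(len(table)), key=lambda i: table[i])

def lazy_predictor():
    """Guesses an index without any search."""
    return rng.randrange(len(table))

print("optimizer's utility:", table[optimizer()])             # very close to 1.0
print("lazy prediction's utility:", table[lazy_predictor()])  # typically around 0.5
```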
Eliezer is a fan of law thinking, right? Doesn’t the law thinker position imply that intelligence can be characterized in a “lawful” way like momentum?
As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we’re confused), but I’m skeptical of MIRI because they seem more confused than average to me.
It depends on what you mean by “lawful”. Right now, the word “lawful” in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it’s not clear to me why “law thinking” is relevant in the first place—it seems as though it simply muddies the discussion by introducing additional concepts.
In my experience, if there are several concepts that seem similar, understanding how they relate to one another usually helps with clarity rather than hurting.
That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.
In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.
“Fundamentalism” would be a better term for the cluster of problems—dogmatism, literalism and epistemic over-confidence.