On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are “directly relevant”.
If I agreed with the political example (and while I wouldn’t say that myself, it’s within the realm of plausibility), I’d consider that a particularly impressive version of this.
I’m confused how my examples don’t count as ‘building on’ the relevant theories—it sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that’s true (and if the things in the real world actually successfully fulfilled their purpose), then I’d think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.
people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning
Agreed. I’d say they built things in the real world that were “one level above” their theories.
if that’s true, [...] then I’d think that spending time and effort developing the relevant theories was worth it
Agreed.
you seem to be pointing at something else
Agreed.
Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
Separately, I claim that:
1. “real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
2. MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”.
(All of this with weak confidence.)
I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don’t know what you’d think of the first.
OK, I think I understand you now.
Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
I think that I sort of agree, if ‘levels above’ means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you’re demanding, each of which can independently break, and that exponentially reduces the chance that you’ll have no failure at all.
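(For concreteness, a toy calculation of that compounding effect, with made-up numbers rather than anything about a real system: if each layer’s requirements hold independently with probability p, the chance that all n layers hold is p^n, which shrinks exponentially in n.)

```python
# Toy calculation, not a model of any real system: if each layer's
# requirements hold independently with probability p, the probability
# that *all* n layers hold is p ** n, which decays exponentially in n.
for p in (0.99, 0.9):
    for n_layers in (1, 2, 5, 10):
        print(f"p = {p}, layers = {n_layers}: P(no failure) = {p ** n_layers:.3f}")
```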
But also, to the extent that your theory is mathematisable and comes with ‘error bars’, you have a shot at coming up with a theory of abstractions that is robust to failure of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some amount of failure of logic gates. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I’d say that the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you’d prefer the initial levels be ‘exact’.
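A minimal sketch of that transistor point; the error rate and the majority-vote scheme here are toy choices of mine, not a claim about real hardware. The idea is just that once the base level’s failure rate is quantified, the level above can budget for it, e.g. by majority-voting redundant components.

```python
import random

def noisy_and(x, y, q=0.05):
    """AND gate that returns the wrong answer with probability q (toy error model)."""
    out = x and y
    return (not out) if random.random() < q else out

def redundant_and(x, y, q=0.05):
    """Majority vote over three independent noisy gates, robust to any single failure."""
    votes = [noisy_and(x, y, q) for _ in range(3)]
    return sum(votes) >= 2

trials = 100_000
single = sum(not noisy_and(True, True) for _ in range(trials)) / trials
majority = sum(not redundant_and(True, True) for _ in range(trials)) / trials
print(f"failure rate: single gate ~ {single:.4f}, majority-of-3 ~ {majority:.4f}")
```

With q = 0.05 the majority-of-3 failure rate is roughly 3q^2(1 − q) + q^3 ≈ 0.007, so the level built on top is more reliable than its parts precisely because the base theory’s imprecision was quantified.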
“real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
I’d say that they’re some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor by some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating point arithmetic and such (second and third (?) levels of implementation), which depends on computing hardware and firmware (I don’t know how many levels of implementation that is).
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised). They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness. There are some exceptions (e.g. the mesa-optimisers paper), but they seem like they’re on the path to greater mathematisability.
MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”
I’m not sure about this, but I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’, basically for the reasons that Vanessa discusses.
I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’
Yeah, I think this is the sense in which realism about rationality is an important disagreement.
But also, to the extent that your theory is mathematisable and comes with ‘error bars’
Yeah, I agree that this would make it easier to build multiple levels of abstraction “on top”. I would also be surprised if mathematical theories of embedded rationality came with tight error bounds (where “tight” means “not so wide as to be useless”). For example, current theories of generalization in deep learning do not, to my knowledge, provide tight error bounds except in special cases that don’t apply to the main successes of deep learning.
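To give a sense of what a non-tight bound looks like, here is a standard finite-class uniform-convergence bound; this is textbook material I’m importing, not something from this thread. For a finite hypothesis class $\mathcal{H}$, true error $\mathrm{err}(h)$, and training error $\widehat{\mathrm{err}}_n(h)$ on $n$ i.i.d. samples, with probability at least $1 - \delta$:

$$\sup_{h \in \mathcal{H}} \left| \mathrm{err}(h) - \widehat{\mathrm{err}}_n(h) \right| \le \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}}$$

For a modern deep net, the effective $\ln|\mathcal{H}|$ term is so large relative to $n$ that the right-hand side exceeds 1, which is exactly the “so wide as to be useless” case.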
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. [...] They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness.
Agreed.
The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised).
When I say that you can’t build on the theories, I am basically only concerned about machine learning. My understanding of MIRI’s mainline story of impact is that they develop some theory that AI researchers then use to change the way they do machine learning, in a way that leads to safe AI. This sounds to me like there are multiple levels of inference: “MIRI’s theory” → “machine learning” → “AGI”. This isn’t exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.
You could imagine other stories of impact, and I’d have other questions about those, e.g. if the story was “MIRI’s theory will tell us how to build aligned AGI without machine learning”, I’d be asking when the theory was going to include computational complexity.