My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
For what it’s worth, I think I disagree with this even when “non-real” means “as real as the theory of liberalism”. One example is companies—my understanding is that people have fake theories about how companies should be arranged, that these theories can be better or worse (and can be evaluated as such without looking at how their implementations turn out), that one can maybe learn these theories in business school, and that implementing them creates more valuable companies (at least in expectation). At the very least, my understanding is that providing management advice to companies in developing countries significantly raises their productivity, and I found this study to support this half-baked memory.
(next paragraph is super political, but it’s important to my point)
I live in what I honestly, straightforwardly believe is the greatest country in the world (where greatness doesn’t exactly mean ‘moral goodness’ but does imply the ability to support moral goodness—think some combination of wealth and geo-strategic dominance), whose government was founded after a long series of discussions about how best to use the state to secure individual liberty. If I think about other wealthy countries, it seems to me that ones whose governments built upon this tradition of thinking about the interaction between liberty and governance are over-represented (e.g. Switzerland, Singapore, Hong Kong). The theory of liberalism wasn’t complete or real enough to build a perfect government, or even a government reliable enough to keep to its founding principles (see the complaints American constitutionalists have about how things are done today), but it was something that could be built upon.
At any rate, I think the things built off of these fake theories aren’t reliable enough to satisfy a strict Yudkowsky-style security mindset. But I do think it’s possible to productively build off of them.
On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are “directly relevant”.
If I agreed with the political example (and while I wouldn’t say that myself, it’s within the realm of plausibility), I’d consider that a particularly impressive version of this.
I’m confused about how my examples don’t count as ‘building on’ the relevant theories. It sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that’s true (and if the things in the real world actually fulfilled their purpose), then I’d think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, and the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.
people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning
Agreed. I’d say they built things in the real world that were “one level above” their theories.
if that’s true, [...] then I’d think that spending time and effort developing the relevant theories was worth it
Agreed.
you seem to be pointing at something else
Agreed.
Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
Separately, I claim that:
1. “real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
2. MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”.
(All of this with weak confidence.)
I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don’t know what you’d think of the first.
OK, I think I understand you now.

Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
I think that I sort of agree, if ‘levels above’ means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you’re demanding, each of which can independently break, which exponentially reduces the chance that nothing fails.
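To spell out the arithmetic behind “exponentially reduces” (a toy calculation with made-up numbers, just to show the shape of the claim): if each layer’s requirements hold independently with probability $p$, then

$$P(\text{no failure across } n \text{ layers}) = \prod_{i=1}^{n} p_i = p^n \quad \text{(when each } p_i = p\text{)},$$

so even fairly reliable layers, say $p = 0.9$, leave only about a $0.9^{10} \approx 0.35$ chance that a 10-layer stack has no failure anywhere.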
But also, to the extent that your theory is mathematisable and comes with ‘error bars’, you have a shot at coming up with a theory of abstractions that is robust to failures of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some rate of gate failure. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I’d say the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you’d prefer the initial levels be ‘exact’.
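As a concrete illustration of that kind of robustness (my own toy example, with a perfect majority vote assumed for simplicity): if each gate fails independently with probability $\varepsilon$ and you triplicate it behind a majority vote, the voted output is wrong only when at least two of the three copies fail, so

$$P(\text{error}) = 3\varepsilon^2(1-\varepsilon) + \varepsilon^3 = 3\varepsilon^2 - 2\varepsilon^3,$$

which is far smaller than $\varepsilon$ when $\varepsilon$ is small (e.g. $\varepsilon = 10^{-3}$ gives roughly $3 \times 10^{-6}$), so the level above can treat the gate layer as more reliable than its individual components.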
“real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
I’d say that they’re some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor with some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating-point arithmetic and such (second and third (?) levels of implementation), which in turn depend on computing hardware and firmware (I don’t know how many levels of implementation that is).
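A minimal sketch of that dependency stack, in code (the names and logic are entirely hypothetical, and nothing here resembles a real logical inductor; the point is only that each layer touches nothing but the interface of the layer below it):

```python
import random

class ToyInductor:
    """Base level: a stand-in for a logical inductor. It just returns an
    arbitrary but deterministic 'price' in [0, 1] for each sentence."""
    def price(self, sentence: str) -> float:
        rng = random.Random(sentence)  # seeded by the sentence, so prices are stable
        return rng.random()

class ToyLIDTAgent:
    """First level of abstraction: a decision theory that only uses the
    inductor's price() interface to evaluate its actions."""
    def __init__(self, name: str, inductor: ToyInductor):
        self.name = name
        self.inductor = inductor

    def choose(self, actions):
        # Pick the action whose "my utility is high" sentence gets the highest price.
        return max(actions, key=lambda a: self.inductor.price(
            f"{self.name} does {a} and utility is high"))

def play_game(agents, actions):
    """Second level of abstraction: multi-agent interaction that only uses
    each agent's choose() interface."""
    return {agent.name: agent.choose(actions) for agent in agents}

if __name__ == "__main__":
    inductor = ToyInductor()
    agents = [ToyLIDTAgent("A", inductor), ToyLIDTAgent("B", inductor)]
    print(play_game(agents, ["cooperate", "defect"]))
```

If the price() layer behaves differently from what the decision-theory layer assumed, the breakage propagates upward through every level, which is the sense in which each added level compounds the requirements on the ones below.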
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised). They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness. There are some exceptions (e.g. the mesa-optimisers paper), but they seem like they’re on the path to greater mathematisability.
MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”
I’m not sure about this, but I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’, basically for the reasons that Vanessa discusses.
I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’
Yeah, I think this is the sense in which realism about rationality is an important disagreement.
But also, to the extent that your theory is mathematisable and comes with ‘error bars’
Yeah, I agree that this would make it easier to build multiple levels of abstractions “on top”. That said, I would be surprised if mathematical theories of embedded rationality came with tight error bounds (where “tight” means “not so wide as to be useless”). For example, current theories of generalization in deep learning do not, to my knowledge, provide tight error bounds, except in special cases that don’t apply to the main successes of deep learning.
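To give one concrete instance of what I mean (a rough statement of a standard result, with constants omitted): a typical uniform-convergence bound says that with probability at least $1 - \delta$, every hypothesis $h$ in a class of VC dimension $d$ satisfies

$$\text{test error}(h) \;\le\; \text{training error}(h) + O\!\left(\sqrt{\frac{d \log(n/d) + \log(1/\delta)}{n}}\right),$$

and for modern networks, where the effective $d$ is far larger than the number of training examples $n$, the square-root term exceeds 1 and the bound says nothing useful.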
When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. [...] They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness.
Agreed.
The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised).
When I say that you can’t build on the theories, I am basically only concerned about machine learning. My understanding of MIRI’s mainline story of impact is that they develop some theory that AI researchers then use to change the way they do machine learning, in a way that leads to safe AI. This sounds to me like there are multiple levels of inference: “MIRI’s theory” → “machine learning” → “AGI”. This isn’t exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.
You could imagine other stories of impact, and I’d have other questions about those, e.g. if the story was “MIRI’s theory will tell us how to build aligned AGI without machine learning”, I’d be asking when the theory was going to include computational complexity.