Heartened by a strong-upvote for my attempt at condensing Eliezer’s object-level claim about timeline estimates, I shall now attempt condensing Eliezer’s meta-level “core thing”.
Certain epistemic approaches to arriving at object-level knowledge consistently look like a good source of grounding in reality, especially to people who are trying to be careful about epistemics, and yet such approaches’ grounding in reality is consistently illusory.
Specific examples mentioned in the post are “the outside view”, “reference class forecasting”, “maximum entropy”, “the median of what I remember credible people saying”, and, most importantly for the object-level but least importantly for the Core Thing, “Drake-equation-style approaches that cleanly represent the unknown of interest as a deterministic function of simpler-seeming variables”.
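To make the Drake-equation-style case concrete, here is a minimal sketch in Python, with entirely made-up factor distributions and growth assumptions (not the post’s numbers and not any published model): the output median looks like an objective fact about the world, but it is just a function of whichever subjective priors were typed in for the simpler-seeming variables.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Hypothetical Drake-equation-style decomposition of "compute needed for AGI".
# Every range below is a made-up subjective prior, chosen only for illustration.
log10_anchor_flops = rng.uniform(28, 38, N)      # some "biological anchor" for required compute
log10_algo_advantage = rng.uniform(0, 6, N)      # how much cleverer-than-the-anchor our methods are
log10_required = log10_anchor_flops - log10_algo_advantage

# Hypothetical supply-side assumption: affordable training compute starts at 1e24 FLOPs
# and grows by half an order of magnitude per year.
years_until_affordable = (log10_required - 24) / 0.5

print("median estimate (years):", round(np.median(years_until_affordable), 1))
print("10th/90th percentiles:", np.percentile(years_until_affordable, [10, 90]).round(1))
```

Shift either uniform range by an order of magnitude, or swap it for a log-normal, and the “median” obligingly moves with it; that sensitivity is the sense in which the apparent grounding is supplied by the modeler rather than by reality.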
Specific non-examples are concrete experimental observations (falling objects, celestial motion). These have grounding in reality, but they don’t tend to feel like they’re “objective” in the same way, like a nexus that my beliefs are epistemically obligated to move toward—they just feel like a part of my map that isn’t confused. (If experiments do start to feel “objective”, one is then liable to mistake empirical frequencies for probabilities.)
The illusion of grounding in reality doesn’t have to be absolute (i.e. “this method reliably arrives at optimal beliefs”) to be absolutely corrupting (e.g. “this prior, which is not relative to anything in particular, is uniquely privileged and central, even if the optimal beliefs are somewhere nearby-but-different”).
The “trick that never works”, in general form, is to go looking in epistemology-space for some grounding in objective reality, which will systematically tend to lead you into these illusory traps.
Instead of trying to repress your subjective ignorance by shackling it to something objective, you should:
sublimate your subjective ignorance into quantitative probability measures,
use those to make predictions and design experiments, and finally
freely and openly absorb observations into your subjective mind and make subjective updates.
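A toy numerical version of that loop (my own sketch, not anything from the post): the ignorance lives in a subjective probability, the probability cashes out as a prediction, and the observation gets absorbed as an ordinary Bayesian update.

```python
# Toy loop: subjective prior -> prediction -> observation -> subjective posterior.
# All numbers are made up for illustration.
prior_h = 0.30               # subjective credence in some hypothesis H
p_obs_given_h = 0.80         # how strongly H predicts the observation we go looking for
p_obs_given_not_h = 0.20

# Prediction step: how likely do we expect the observation to be?
p_obs = prior_h * p_obs_given_h + (1 - prior_h) * p_obs_given_not_h
print(f"predicted P(observation) = {p_obs:.2f}")       # 0.38

# Suppose the experiment is run and the observation happens; absorb it and update.
posterior_h = prior_h * p_obs_given_h / p_obs
print(f"P(H | observation) = {posterior_h:.2f}")       # ~0.63: still subjective, just less ignorant
```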
Eliezer doesn’t seem to be saying the following [edit: at least not in my reading of this specific post], but I would like to add:
Even just trying to make your updates objective (e.g. by using a computer to perform an exact Bayesian update) tends to go subtly wrong, because it can encourage you to replace your actual map with your map of your map, which is predictably less informative. Making a map of your map is another one of those techniques that seem to provide more grounding but do not actually.
Calibration training is useful because your actual map is also predictably systematically bad at updating by default, and calibration training makes it better at doing this. Teaching the low-description-length principles of probability to your actual map-updating system is much more feasible (or at least more cost-effective) than emitting your actual map into a computationally realizable statistical model.
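For concreteness, a minimal sketch of the feedback that calibration training provides (my own illustration, with an invented prediction log): state probabilities on questions that resolve soon, then check how often each stated level actually came true, plus an overall Brier score.

```python
from collections import defaultdict

# Hypothetical log of (stated probability, did it happen?) for already-resolved predictions.
predictions = [(0.9, True), (0.9, True), (0.9, False), (0.8, True), (0.8, True),
               (0.7, True), (0.7, False), (0.7, True), (0.6, True), (0.6, False)]

by_level = defaultdict(list)
for p, outcome in predictions:
    by_level[p].append(outcome)

for p in sorted(by_level):
    outcomes = by_level[p]
    print(f"said {p:.0%}: came true {sum(outcomes)}/{len(outcomes)} times")

# Brier score: mean squared error of the stated probabilities
# (lower is better; always saying 50% scores 0.25).
brier = sum((p - o) ** 2 for p, o in predictions) / len(predictions)
print(f"Brier score: {brier:.3f}")
```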
Techniques that give the illusion of objectivity are usually not useless. But to use them effectively, you have to see through the illusion of objectivity, and treat their outputs as observations of what those techniques output, rather than as glimpses at the light of objective reasonableness.
In the particular example of forecasting AGI with biological anchors, Eliezer does this when he predicts (correctly, at least in the fictional dialogue) that the technique (perhaps especially when operated by people who are trying to be careful and objective) will output a median of 30 years from the present.
It’s only because Eliezer can predict the outcome, and finds it, in his map, almost uncorrelated with AGI’s actual arrival, that he dismisses the estimate as useless.
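One way to see why a predictable-in-advance output licenses almost no update (a toy model of my own, with made-up numbers): if the technique would report “median ~30 years” at nearly the same rate whether AGI is actually near or far, then its saying so carries almost no evidence either way.

```python
# Toy model: H = "AGI arrives within 15 years". All numbers invented for illustration.
prior_near = 0.50

# If the output is driven by the technique's structure rather than by reality,
# it reports "median ~30 years" at nearly the same rate in both worlds:
p_report_if_near = 0.85
p_report_if_far = 0.80

p_report = prior_near * p_report_if_near + (1 - prior_near) * p_report_if_far
posterior_near = prior_near * p_report_if_near / p_report
print(f"P(near | technique says ~30y) = {posterior_near:.3f}")   # ~0.515: barely moved from 0.5
```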
This particular example as a vehicle for the Core Thing (if I’m right about what that is) has the advantage that the illusion of objectivity is especially illusory (at least from Eliezer’s perspective), but the disadvantage that one can almost read Eliezer as condemning ever using Drake-equation-style approaches, or reference-class forecasting, or the principle of maximum entropy. But I think the general point is about how to view the role of these kinds of things in one’s epistemic journey without distortion, which in most cases doesn’t actually exclude using them.
Making a map of your map is another one of those techniques that seem to provide more grounding but do not actually.
Sounds to me like one of the things Eliezer is pointing at in Hero Licensing:
Look, thinking things like that is just not how the inside of my head is organized. There’s just the book I have in my head and the question of whether I can translate that image into reality. My mental world is about the book, not about me.
You do want to train your brain, and you want to understand your strengths and weaknesses. But dwelling on your biases at the expense of the object level isn’t actually usually the best way to give your brain training data and tweak its performance.
I think there’s a lesson here that, e.g., Scott Alexander hadn’t fully internalized as of his 2017 Inadequate Equilibria review. There’s a temptation to “go meta” and find some cleaner, more principled, more objective-sounding algorithm to follow than just “learn lots and lots of object-level facts so you can keep refining your model, learn some facts about your brain too so you can know how much to trust it in different domains, and just keep doing that”.
But in fact there’s no a priori reason to expect there to be a shortcut that lets you skip the messy unprincipled your-own-perspective-privileging Bayesian Updating thing. Going meta is just a tool in the toolbox, and it’s risky to privilege it on ‘sounds more objective/principled’ grounds when there’s neither a theoretical argument nor an empirical-track-record argument for expecting that approach to actually work.
Teaching the low-description-length principles of probability to your actual map-updating system is much more feasible (or at least more cost-effective) than emitting your actual map into a computationally realizable statistical model.
I think this is a good distillation of Eliezer’s view (though I know you’re just espousing your own view here). And of mine, for that matter. Quoting Hero Licensing again:
STRANGER: I believe the technical term for the methodology is “pulling numbers out of your ass.” It’s important to practice calibrating your ass numbers on cases where you’ll learn the correct answer shortly afterward. It’s also important that you learn the limits of ass numbers, and don’t make unrealistic demands on them by assigning multiple ass numbers to complicated conditional events.
ELIEZER: I’d say I reached the estimate… by thinking about the object-level problem? By using my domain knowledge? By having already thought a lot about the problem so as to load many relevant aspects into my mind, then consulting my mind’s native-format probability judgment—with some prior practice at betting having already taught me a little about how to translate those native representations of uncertainty into 9:1 betting odds.
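For concreteness on the “9:1 betting odds” translation in that quote (just the standard arithmetic, nothing specific to the dialogue): odds in favor are p : (1 - p).

```python
def prob_to_odds(p: float) -> str:
    """Express a probability as 'for : against' betting odds."""
    return f"{p / (1 - p):g} : 1"

def odds_to_prob(for_: float, against: float) -> float:
    return for_ / (for_ + against)

print(prob_to_odds(0.9))    # 9 : 1
print(odds_to_prob(9, 1))   # 0.9
```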
One framing I use is that there are two basic perspectives on rationality:
Prosthesis: Human brains are naturally bad at rationality, so we can identify external tools (and cognitive tech that’s too simple and straightforward for us to misuse) and try to offload as much of our reasoning as possible onto those tools, so as to not have to put weight down (beyond the bare minimum necessary) on our own fallible judgment.
Strength training: There’s a sense in which every human has a small AGI (or a bunch of AGIs) inside their brain. If we didn’t have access to such capabilities, we wouldn’t be able to do complicated ‘planning and steering of the world into future states’ at all.
It’s true that humans often behave ‘irrationally’, in the sense that we output actions based on simpler algorithms (e.g., reinforced habits and reflex behavior) that aren’t doing the world-modeling or future-steering thing. But if we want to do better, we mostly shouldn’t be leaning on weak reasoning tools like pocket calculators; we should be focusing our efforts on more reliably using (and providing better training data to) the AGI inside our brains. Nearly all of the action (especially in hard foresight-demanding domains like AI alignment) is in improving your inner AGI’s judgment, intuitions, etc., not in outsourcing to things that are way less smart than an AGI.
In practice, of course, you should do some combination of the two. But I think a lot of the disagreements MIRI folks have with other people in the existential risk ecosystem are related to us falling on different parts of the prosthesis-to-strength-training spectrum.
Strong agreement. I think this is very well-put.
This is good enough that it should be a top level post.
The prosthesis vs. strength training dichotomy in particular seems extremely important.
Techniques that give the illusion of objectivity are usually not useless. But to use them effectively, you have to see through the illusion of objectivity, and treat their outputs as observations of what those techniques output, rather than as glimpses at the light of objective reasonableness.
(My comment is quite critical, but I want to make it clear that I think doing this exercise is great and important, despite my disagreement with the result of the exercise ;) )
So, having done the same exercise myself, I feel that you go far too meta here, and that by doing so you’re losing most of the actually valuable meta insights of the post. I’m not necessarily saying that your interpretation doesn’t fit what Yudkowsky says, but if the goal is to distill where Yudkowsky is coming from in this specific post, I feel like this comment fails.
The “trick that never works”, in general form, is to go looking in epistemology-space for some grounding in objective reality, which will systematically tend to lead you into these illusory traps.
AFAIU, Yudkowsky is not at all arguing against searching for grounding in reality; he’s arguing for a very specific kind of grounding in reality that I’ve been calling deep knowledge in my post interpreting him on the topic. He’s arguing that there are ways to go beyond the agnosticism of Science (which is very similar to the agnosticism of the outside view and reference class forecasting) between hypotheses that haven’t been falsified yet, ways that let you move towards the true answer despite the search space being far too large to tractably explore. (See that section of my post in particular, where I go into a lot of detail about Yudkowsky’s writing on this in the Sequences.)
I also feel like your interpretation conflates the errors that Humbali makes with the ones Simulated-OpenPhil makes, but they’re different in my understanding:
Humbali keeps criticising Yudkowsky’s confidence, and is the representative of the bad uses of the outside view and reference class forecasting. That is why a lot of the answers to Humbali focus on deep knowledge (which Yudkowsky refers to here with the extended metaphor of the rails), where it comes from, and why it lets you discard some hypotheses (which is the whole point).
Simulated-OpenPhil mostly defends their own approach and the claim that you can use biological anchors to reason about timelines if you do it carefully. The answer Yudkowsky gives is, IMO, that they don’t have (or give) a way of linking the path of evolution in search space to the path of human research in search space, and as such more work and more uncertainty handling on evolution and the other biological anchors don’t give you more information about AGI timelines. The only thing you get out of evolution and biological anchors is the few bits that Yudkowsky already integrates into his model (like the fact that humans will need less optimization power because they’re smarter than evolution), which are not enough to predict timelines.
If I had to state it (and I will probably go into more detail in the post I’m currently writing), my interpretation is that the trick that never works is “using a biological analogy that isn’t closely connected to how human research is optimizing for AGI”. So the way of making a “perpetual motion machine” would be to explain why the specific path of evolution (or other anchors) is related to the path of human optimization, and derive stuff from this.