Rob Bensinger comments on Biology-Inspired AGI Timelines: The Trick That Never Works

Rob Bensinger 10 Dec 2021 11:41 UTC
> Making a map of your map is another one of those techniques that seem to provide more grounding but do not actually.
Sounds to me like one of the things Eliezer is pointing at in Hero Licensing:
> Look, thinking things like that is just not how the inside of my head is organized. There’s just the book I have in my head and the question of whether I can translate that image into reality. My mental world is about the book, not about me.
You do want to train your brain, and you want to understand your strengths and weaknesses. But dwelling on your biases at the expense of the object level isn’t actually usually the best way to give your brain training data and tweak its performance.
I think there’s a lesson here that, e.g., Scott Alexander hadn’t fully internalized as of his 2017 Inadequate Equilibria review. There’s a temptation to “go meta” and find some cleaner, more principled, more objective-sounding algorithm to follow than just “learn lots and lots of object-level facts so you can keep refining your model, learn some facts about your brain too so you can know how much to trust it in different domains, and just keep doing that”.
But in fact there’s no a priori reason to expect there to be a shortcut that lets you skip the messy unprincipled your-own-perspective-privileging Bayesian Updating thing. Going meta is just a tool in the toolbox, and it’s risky to privilege it on ‘sounds more objective/principled’ grounds when there’s neither a theoretical argument nor an empirical-track-record argument for expecting that approach to actually work.
> Teaching the low-description-length principles of probability to your actual map-updating system is much more feasible (or at least more cost-effective) than emitting your actual map into a computationally realizable statistical model.
I think this is a good distillation of Eliezer’s view (though I know you’re just espousing your own view here). And of mine, for that matter. Quoting Hero Licensing again:
> STRANGER: I believe the technical term for the methodology is “pulling numbers out of your ass.” It’s important to practice calibrating your ass numbers on cases where you’ll learn the correct answer shortly afterward. It’s also important that you learn the limits of ass numbers, and don’t make unrealistic demands on them by assigning multiple ass numbers to complicated conditional events.
>
> ELIEZER: I’d say I reached the estimate… by thinking about the object-level problem? By using my domain knowledge? By having already thought a lot about the problem so as to load many relevant aspects into my mind, then consulting my mind’s native-format probability judgment, with some prior practice at betting having already taught me a little about how to translate those native representations of uncertainty into 9:1 betting odds.
One framing I use is that there are two basic perspectives on rationality:
- Prosthesis: Human brains are naturally bad at rationality, so we can identify external tools (and cognitive tech that’s too simple and straightforward for us to misuse) and try to offload as much of our reasoning as possible onto those tools, so as to not have to put weight down (beyond the bare minimum necessary) on our own fallible judgment.
- Strength training: There’s a sense in which every human has a small AGI (or a bunch of AGIs) inside their brain. If we didn’t have access to such capabilities, we wouldn’t be able to do complicated ‘planning and steering of the world into future states’ at all.
  
  It’s true that humans often behave ‘irrationally’, in the sense that we output actions based on simpler algorithms (e.g., reinforced habits and reflex behavior) that aren’t doing the world-modeling or future-steering thing. But if we want to do better, we mostly shouldn’t be leaning on weak reasoning tools like pocket calculators; we should be focusing our efforts on more reliably using (and providing better training data to) the AGI inside our brains. Nearly all of the action (especially in hard foresight-demanding domains like AI alignment) is in improving your inner AGI’s judgment, intuitions, etc., not in outsourcing to things that are way less smart than an AGI.
In practice, of course, you should do some combination of the two. But I think a lot of the disagreements MIRI folks have with other people in the existential risk ecosystem are related to us falling on different parts of the prosthesis-to-strength-training spectrum.
> Techniques that give the illusion of objectivity are usually not useless. But to use them effectively, you have to see through the illusion of objectivity, and treat their outputs as observations of what those techniques output, rather than as glimpses at the light of objective reasonableness.
Strong agreement. I think this is very well-put.
Eli Tyre 12 May 2022 2:24 UTC
This is good enough that it should be a top-level post.

The prosthesis vs. strength training dichotomy in particular seems extremely important.