I like the edits!
One thing I think might be worth doing is linking to the post on Realism about Rationality, and explicitly listing it as a potential crux for this post.
I’m pretty on board theoretically with the idea of being a robust agent, but I don’t actually endorse it as a goal because I tend to be a rationality anti-realist.
I actually don’t consider Realism about Rationality cruxy for this (I tried to lay out my own cruxes in this version). Part of what seemed important here is that I think Coherent Agency is only useful in some cases for some people, and I wanted to be clear about when that was.
I think each of the individual properties (gears-level understanding, coherence, game-theoretic soundness) is just sort of obviously useful in some ways. There are particular failure modes to get trapped in if you’ve only made some incremental progress, but generally I think you can make incremental improvements in each domain and get improvements-in-life-outcome.
I do think that the sort of person who naturally gravitates towards this probably has something like ‘rationality realism’ going on, but I suspect it’s not cruxy, and in particular I suspect it shouldn’t be cruxy for people who aren’t naturally oriented that way.
Some people are aspiring directly to be a fully coherent, legible, sound agent. And that might be possible or desirable, and it might be possible to reach a variation of that which is cleanly mathematically describable. But I don’t think that has to be true for the concept to be useful.
“generally I think you can make incremental improvements in each domain and get improvements-in-life-outcome.”
To me this implies some level on the continuum of realism about rationality. For instance, I often think that to make improvements in life outcomes I have to purposefully go off of Pareto improvements in these domains, and sometimes sacrifice them, because I don’t think my brain runs that code natively, and sometimes efficient native code is in direct opposition to naive rationality.
Relatedly:
I’ve been watching the discussion on Realism About Rationality with some interest and surprise. I had thought of ‘something like realism about rationality’ as more cruxy for alignment work, because the inspectability of the AI matters a lot more than the inspectability of your own mind – mostly because you’re going to scale up the AI a lot more than your own mind is likely to scale up. The amount of disagreement that’s come out more recently about that has been interesting.
Some of the people who seem most invested in the Coherent Agency thing are specifically trying to operate on cosmic scales (i.e. part of their goal is to capture value in other universes and simulations, and to be the sort of person you could safely upload).
Upon reflection though, I guess it’s not surprising that people don’t consider realism “cruxy” for alignment, and also not “cruxy” for personal agency. I think it’s more like an aesthetic input than a crux: agency doesn’t need to be mathematically simple or formalized for incremental legibility and coherence to be useful for avoiding wasted motion.