I’m interested in having in-depth dialogues to find cruxes. Message me if you’re interested.
I do alignment research, mostly stuff in the vicinity of agent foundations. Currently I’m doing independent alignment research on ontology identification. Formerly I was on Vivek’s team at MIRI. Most of my writing before mid-2023 is not representative of my current views about alignment difficulty.
I like this reason to be unsatisfied with the expected utility maximisation (EUM) theory of agency.
One of the difficulties in theorising about agency is that all the theories are flexible enough to explain almost any behaviour. The fact that each theory is incomplete and vague in some way makes this worse, but even when you work out a detailed model of e.g. active inference, it ends up being pretty much formally equivalent to EUM.
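To illustrate the flexibility point (this is a standard observation, not something from the linked post): any policy at all can be rationalised as expected utility maximisation if you’re allowed to pick the utility function after the fact, e.g. by rewarding exactly the actions the policy takes:

$$
u_\pi(s, a) =
\begin{cases}
1 & \text{if } a = \pi(s)\\
0 & \text{otherwise}
\end{cases}
\qquad \Rightarrow \qquad
\pi(s) \in \operatorname*{arg\,max}_a \; \mathbb{E}\left[u_\pi(s, a)\right]
$$

So EUM only starts to have predictive content once you put constraints on what the utility function is allowed to look like.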
I think the solution to this is to compare theories using engineering desiderata. Our goal is ultimately to build a safe AGI, so we want a theory that helps us reason about safety desiderata.
One of the really important safety desiderata is some kind of goal stability. When we build a powerful agent, we don’t want it to change its mind about what’s important. It should act to achieve known, predictable outcomes, even when it discovers facts and concepts we don’t know about.
So my criticism of this research direction is that I don’t think it’ll be a good framework for making goal-stable agents. You want a framework that naturally models internal conflict between goals, and in particular you want to model this as conflict between agents. But conflict and cooperation between bounded, not-quite-rational agents are messy and hard to predict, and multi-agent systems are complex and detail-dependent. So it seems difficult to show that the overall agent will be stable.
(A reasonable response would be “but none of the proposed vague theories of bounded agency have this goal-stability property; maybe this coalitional approach will turn out to help us come up with a solution”. That’s true and fair enough, but I think research directions like this seem more promising.)