And then finally there are actually some formal results where we try to formalize a notion of power-seeking in terms of the number of options that a given state allows a system. This is work [...] which I’d encourage folks to check out. And basically you can show that, for a large class of objectives defined relative to an environment, there’s a strong reason for a system optimizing those objectives to get to the states that give it many more options.
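For intuition only (this is a toy sketch, not the paper's formal setup or theorem statement), here is a minimal simulation of the informal claim above: if rewards over terminal options are sampled i.i.d., an optimal agent choosing between a successor state with one reachable option and one with three will prefer the three-option state for most sampled objectives. The state names, option counts, and uniform reward distribution are all assumptions made for illustration.

```python
import random

# Toy illustration (not the paper's formal construction): from a start state the
# agent chooses between successor state A (1 reachable terminal option) and
# successor state B (3 reachable terminal options). Terminal rewards are drawn
# i.i.d. from Uniform[0, 1]; we count how often the optimal choice is B.

NUM_SAMPLES = 100_000
OPTIONS_A = 1   # options reachable from state A (assumed for illustration)
OPTIONS_B = 3   # options reachable from state B (assumed for illustration)

prefers_b = 0
for _ in range(NUM_SAMPLES):
    rewards_a = [random.random() for _ in range(OPTIONS_A)]
    rewards_b = [random.random() for _ in range(OPTIONS_B)]
    # An optimal policy moves toward whichever successor leads to the best option.
    if max(rewards_b) > max(rewards_a):
        prefers_b += 1

print(f"Fraction of sampled objectives preferring the 3-option state: "
      f"{prefers_b / NUM_SAMPLES:.3f}")  # ~0.75, since all 4 rewards are i.i.d.
```

The ~3/4 figure just reflects that, with four i.i.d. rewards, the overall best option lies behind state B three times out of four; the paper's actual results are about optimal-policy tendencies under much more general conditions.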
Do you understand the main theorems in that paper and for what environments they are applicable? (My impression is that very few people do, even though the work has been highly praised within the AI alignment community.)
[EDIT: for more context see this comment.]