I wonder if the confusion isn’t about the implications of consequentialism, but about the implications of independent agents. This relates to the (often mentioned, but never really addressed) problem that humans don’t have a CEV, and that we have competition built into our (inconsistent) utility functions.
I have yet to see a model of multiple agents with respect to “alignment”. The ONLY reason power, resources, and self-preservation are instrumentally valuable is that there are unaligned agents in competition. If multiple agents agree on the best outcomes and the best way to achieve them, then it doesn’t matter which agent does what, or even which agent(s) exist.
Fully aligned agents are really just multiple processing cores of one agent.
It’s when we talk about partial alignment that we go off the rails. In that case, we should treat competition and tradeoffs as real considerations the agent(s) have to weigh.
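To make that concrete, here’s a minimal toy sketch (my own illustration, not anything from the comment above; the utility functions, the production rule, and the names `shared_utility`, `selfish_utility`, and `RESOURCE` are all invented for the example). Two agents divide a fixed resource. When both score outcomes with one shared utility function, any optimal split is equally acceptable to both, so it doesn’t matter which agent ends up holding what. When each agent values only its own share, their preferred outcomes diverge, and grabbing resources suddenly becomes instrumentally useful:

```python
RESOURCE = 10  # units of a shared resource to divide between two agents

def shared_utility(split):
    """Both agents value total output, not who produces it.
    Complementary production: output is maximized by an even split."""
    a, b = split
    return (a * b) ** 0.5

def selfish_utility(own_share):
    """An unaligned agent values only its own share."""
    return own_share

# All ways to split the resource between agent A and agent B.
splits = [(a, RESOURCE - a) for a in range(RESOURCE + 1)]

# Fully aligned: both agents rank splits by the same function, so any
# split maximizing it is acceptable to both -- swapping which agent
# holds which share changes nothing either cares about.
best_shared = max(splits, key=shared_utility)
print("aligned optimum:", best_shared)  # (5, 5)

# Unaligned: each agent's preferred outcome grabs everything, so the
# agents are in direct competition over who controls the resource.
best_for_a = max(splits, key=lambda s: selfish_utility(s[0]))
best_for_b = max(splits, key=lambda s: selfish_utility(s[1]))
print("agent A prefers:", best_for_a)  # (10, 0)
print("agent B prefers:", best_for_b)  # (0, 10)
```

Nothing deep in the code itself; the point is that the instrumental value of holding resources only shows up in the second case, where the agents’ utility functions disagree.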