I expect you could build a system like this that reliably runs around and tidies your house, say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn't have any reflexes that lead to pondering self-improvement in this way).
I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.
I think that overcoming this Molochian dynamic is the alignment problem: how do you build a powerful system that carefully balances itself and the whole world in such a way that it does not slip down the evolutionary slope toward pursuing psychopathic goals by any means necessary?
I think this balancing is possible; it's just not the default attractor, and the default attractor seems to have a huge basin.