I expect you could build a system like this that reliably runs around and tidies your house, say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn't have any reflexes that lead to pondering self-improvement in this way).
I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.
I think that overcoming this Molochian dynamic is the alignment problem: how do you build a powerful system that carefully balances itself and the whole world in such a way that it does not slip down the evolutionary slope toward pursuing psychopathic goals by any means necessary?
I think this balancing is possible; it's just not the default attractor, and the default attractor seems to have a huge basin.