I don’t disagree, but we still need to deeply understand agency. Superintelligent systems will have bubbles of agency arise in them, because everything acquires self-preservation as a goal to some degree, especially systems exposed to human culture. Of course, it’s probably a bad idea to create superintelligent hyper-targeted optimizers, as those would be incredibly overconfident about their objective, and overconfidence about your objective looks to be a key kind of failure that defines strong unsafety.
E.g., see: https://causalincentives.com/
I’m not criticising agent foundations work. I just don’t really like the prospect of building superhuman agents.