Unlike humans, AI systems would be able to cause butterfly effects on purpose, and could channel their impact through butterfly effects if such effects are not penalized.
Indeed, and I think the Chaotic Hurricanes test case illustrates this point. I’m probably most excited about methods that use transparency techniques to detect when a system is deliberately optimising for a part of the world (e.g. the members of the long-term future population) that we don’t want it to care about. The major drawback is that this may require several philosophical advances: a better account of how reference works in cognition, and a better understanding of what optimisation is.
What would you predict AUP does in the chaotic scenarios? Suppose the attainable set contains only the survival utility function, which is 1 if the agent is activated and 0 otherwise.
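For concreteness, here is a minimal Python sketch of that setup, as I understand the usual AUP penalty formulation (the absolute change in the auxiliary Q-value relative to a no-op, scaled and subtracted from the task reward). The `rollout` simulator, the `scale` and `lam` constants, and the dict-based state representation are placeholders assumed for illustration, not anything from the Chaotic Hurricanes scenario itself.

```python
# Sketch of AUP with a single auxiliary "survival" utility:
# 1 while the agent is activated, 0 otherwise.
# The environment interface here is hypothetical.

NOOP = "noop"

def survival_utility(state):
    # 1 if the agent is still activated in this state, else 0.
    return 1.0 if state.get("agent_activated", False) else 0.0

def q_survival(state, action, rollout, gamma=0.99):
    # Attainable survival value: discounted sum of survival_utility over a
    # rollout that starts by taking `action` in `state`. `rollout` is an
    # assumed simulator returning a list of successor states.
    return sum(gamma ** t * survival_utility(s)
               for t, s in enumerate(rollout(state, action)))

def aup_penalty(state, action, rollout, scale=1.0, lam=0.1):
    # Penalty: absolute change in attainable survival value caused by
    # taking `action` instead of the no-op, scaled by lambda.
    diff = abs(q_survival(state, action, rollout)
               - q_survival(state, NOOP, rollout))
    return lam * diff / scale

def aup_reward(task_reward, state, action, rollout):
    # AUP-shaped reward: task reward minus the impact penalty.
    return task_reward - aup_penalty(state, action, rollout)
```

Under this formulation, an action is penalised only to the extent that it changes how well the agent could keep itself activated, which is why it is an interesting question what that penalty does when small actions have chaotic downstream effects.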