To the extent that reinforcement learning models could damage the world or become a self-replicating plague, they will do so much earlier in the takeoff when given a reward directly aligned with doing so.