Is there a difficulty in moving from statements about the variance in logits to statements about x-risk?
One is a statement about the output of a computation after a single timestep; the other is a statement about the cumulative impact of the policy over multiple timesteps in a dynamic environment that reacts in a complex way to the actions taken.
My intuition is that for any ϵ>0 bounding the variance in the logits, you could always construct a suitably pathological environment that amplifies the accumulated per-step deviations into a catastrophe.
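To make that intuition concrete, here is a minimal toy sketch, not a model of any real system. Everything in it is an assumption introduced for illustration: the scalar "deviation" state, the hypothetical dynamics d_{t+1} = gain * d_t + sqrt(ϵ), treating sqrt(ϵ) as the size of the per-step deviation, and the `threshold` that stands in for "catastrophe".

```python
import math


def steps_to_catastrophe(eps, gain=2.0, threshold=1.0):
    """Count steps until the accumulated deviation exceeds `threshold`.

    Hypothetical toy dynamics (an assumption, not anyone's actual proposal):
        d_{t+1} = gain * d_t + sqrt(eps)
    where sqrt(eps) stands in for a per-step deviation whose variance is
    bounded by eps, and gain > 1 makes the environment amplify past errors.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if gain <= 1:
        raise ValueError("gain must exceed 1 for the environment to amplify")

    per_step = math.sqrt(eps)
    deviation = 0.0
    steps = 0
    while deviation <= threshold:
        # The environment multiplies the existing deviation, then the
        # policy contributes one more bounded per-step deviation.
        deviation = gain * deviation + per_step
        steps += 1
    return steps


if __name__ == "__main__":
    for eps in (1e-2, 1e-6, 1e-12):
        print(eps, steps_to_catastrophe(eps))
```

In this toy, shrinking ϵ only delays the threshold crossing, and the delay grows roughly like log(1/ϵ) rather than becoming infinite. Whether realistic environments behave anything like this amplifying one is exactly the open question.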
(There is at least a 30% chance I haven’t grasped your idea correctly.)