Stephen Fowler comments on TurnTrout’s shortform feed

Stephen Fowler 15 Aug 2024 3:19 UTC
3 points
Is there a difficulty in moving from statements about the variance in logits to statements about x-risk?

One is a statement about the output of a computation after a single timestep; the other is a statement about the cumulative impact of the policy over multiple timesteps in a dynamic environment that reacts in a complex way to the actions taken.
My intuition is that for any $\epsilon > 0$ bounding the variance in the logits, you could always construct a suitably pathological environment that amplifies the resulting deviations, accumulated over many timesteps, into a catastrophe.
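To make that intuition concrete, here is a toy sketch, not a claim about any real training setup. It assumes the logit variance bound shows up as a per-step scalar deviation of size at most `eps`, and that the environment multiplies accumulated deviation by a constant `gain` each step. The names `pathological_gain`, `rollout`, `threshold` and `horizon` are all made up for illustration.

```python
def pathological_gain(eps: float, threshold: float, horizon: int) -> float:
    """Gain g chosen so that eps * g**horizon equals threshold (hypothetical construction)."""
    return (threshold / eps) ** (1.0 / horizon)


def rollout(eps: float, gain: float, horizon: int) -> float:
    """Accumulated deviation after `horizon` steps of a toy amplifying environment.

    Assumption: each step the policy deviates by exactly eps (the worst case
    allowed by the bound), and the environment scales the running total by gain.
    """
    state = 0.0
    for _ in range(horizon):
        state = gain * (state + eps)
    return state


if __name__ == "__main__":
    threshold = 1.0  # stand-in for "catastrophe"; purely illustrative
    horizon = 50
    for eps in (1e-1, 1e-6, 1e-12):
        gain = pathological_gain(eps, threshold, horizon)
        final = rollout(eps, gain, horizon)
        print(f"eps={eps:g} gain={gain:.3f} crosses threshold: {final >= threshold}")
```

However small `eps` is, the adversary just picks a larger `gain`. The flip side is that in a contractive environment (`gain` below 1) the same deviations stay bounded, so the worry is specifically about environments chosen after the bound is fixed.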
(There is at least a 30% chance I haven’t grasped your idea correctly.)