I agree that if you had a handle on accessing average optimal value then you’d be making headway.
I don’t think it covers everything, since safety, integrity of deliberation, etc. are also important, and because instrumental values aren’t quite clean enough (e.g. even if AI safety were super easy, these agents would only work on the version that was useful for optimizing values from the mixture used).
But my bigger Q is how to make headway on accessing average optimal value, and whether we’re able to make the problem easier by focusing on average optimal value.