I’m confused about how to do that because I tend to think of self-modification as happening when the agent is limited and can’t foresee all the consequences of a policy, especially policies that involve making itself smarter. But I suspect that even if you figure out a non-confusing way to talk about risk aversion for limited agents that doesn’t look like actions on some level, you’ll get weird behavior under self-modification, like an update rule that privileges the probability distribution you had at the time you decided to self-modify.
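To make that last worry concrete, here is a minimal sketch of the kind of update rule I mean, in my own notation (this is just an illustration, not an existing formalism): write $P_{t_0}$ for the agent's probability distribution at the moment $t_0$ it decided to self-modify, $e_{t_0+1:t}$ for the evidence it observes afterward, and $U$ for its utility function. A well-behaved successor would evaluate a policy $\pi$ at a later time $t$ by

$$V_t(\pi) \;=\; \mathbb{E}_{P_t}\!\left[\,U \mid \pi\,\right], \qquad P_t \;=\; P_{t_0}(\,\cdot \mid e_{t_0+1:t}\,),$$

whereas the weird behavior I'm gesturing at is a successor that keeps scoring policies by

$$V_t(\pi) \;=\; \mathbb{E}_{P_{t_0}}\!\left[\,U \mid \pi\,\right],$$

so evidence gathered after the modification never fully propagates into how it weighs risky policies; its risk attitude stays anchored to the predecessor's beliefs.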