The main thing I want to point out is that this is an idealized notion of non-idealized decision theory—in other words, it’s still pretty useless to me as a bounded agent, without some advice about how to approximate it. I can’t very well turn into this max-expected-value bounded policy.
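(Spelling out the ideal I have in mind, with notation of my own choosing: something like

$$\pi^* = \operatorname*{arg\,max}_{\pi \in \Pi_{\text{bounded}}} \mathbb{E}\left[U \mid \pi\right],$$

where $\Pi_{\text{bounded}}$ is the set of policies an agent like me could actually implement and $U$ is the utility function I'd endorse. The trouble is that evaluating this argmax is itself exactly the kind of computation a bounded agent can't do.)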
But there are other barriers, too. Figuring out which utility function I endorse is a hard problem. And we face the challenges of embedded decision theory: how do we reason about the counterfactuals involved in changing our policy to a better one?
Modulo those concerns, I do think your description is roughly right, and carries some important information about what it means to self-modify in a justified way rather than cargo-culting.