Incidentally, this is an increasingly dubious objective. But to see why it’s a bad idea in practice, it’s helpful to be aware of the way it looks like a very good idea. (Regardless, it’s obviously relevant for this post.)
OK, I read the last one (again, after all these years), and I have no idea how it is applicable. It seems to be about the definition of probability, Dutch-booking and such… nothing to do with the question at hand. The one before that is about how a “wrapper-mind”, i.e. a fixed-goal AGI, is bad… Which is indeed correct, but… irrelevant? It has the best EV by its own metric?
(The second paragraph was irrelevant to the comment I was replying to; I thought the “incidentally”, and the inverted-in-context “it’s obviously relevant” (it’s maximization of EV that’s obviously relevant, unlike the objections to it I’m voicing; maybe this was misleading), made that framing clear?)
I was commenting on how “having the best EV”, the classical dream of decision theory, has recently come into question because of the Goodhart’s Curse issue, and that it might be good to look for decision theories that do something else. The wrapper-minds post is pointing at the same problem from a very different framing. Mild optimization is a sketch of the kind of thing that might make it better, and includes more specific suggestions like quantilization. (I currently like “moral updatelessness” for this role: a variant of UDT that bargains from a position of moral ignorance, not just epistemic ignorance, among its more morally competent successors, whose moralities/values/goals are mutually counterfactual, that is discordant, but more developed.) The “coherent decisions” post is just a handy reference for why EV maximization is the standard go-to thing, and might still remain so in the limit of reflection (time), but possibly not even then.
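(As a rough illustration of the quantilization idea, not something taken from the linked posts: instead of taking the argmax of a proxy utility, a quantilizer samples from the top q-fraction of actions under some base distribution, which bounds how hard the proxy gets pushed into the region where its errors dominate. A minimal sketch, with a uniform base distribution and a toy proxy of my own invention:)

```python
import random

def quantilize(actions, proxy_utility, q=0.1, rng=random):
    """Pick an action from the top q-fraction by proxy utility, sampled
    uniformly (the base distribution here is uniform for simplicity),
    rather than taking the single argmax.

    This limits how hard the proxy gets optimized, which is the point of
    "mild optimization": the argmax is exactly where Goodhart's Curse
    bites hardest.
    """
    ranked = sorted(actions, key=proxy_utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:cutoff])

# Toy usage: the proxy over-rewards a few "extreme" actions.
actions = list(range(100))
proxy = lambda a: a + (50 if a > 95 else 0)  # proxy error concentrated at the extreme
print(quantilize(actions, proxy, q=0.1))     # rarely lands on the over-rewarded tail
```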
The relevant part (to the “saner CDT” point) is the first paragraph, which is mostly about Troll Bridge and logical decision theory. The last post of the sequence has a summary/retrospective. Personally, I mostly like CDT for introducing surgery; fictional, laws-of-physics-defying counterfactuals seem inescapable in some framings that are not just being dumb like vanilla CDT. In particular, when considering interventions through approximate predictions of the agent. (How do you set all of these to some possible decision, when all you know is the real world, which might already contain the actual decision you haven’t yet made in its approximate models of you? You might need to “lie” in the counterfactual with fictional details to make models of your behavior created by others predict what you are considering doing, instead of what you actually do and can’t predict or infer from the actual models they’ve already made of you. Similarly to how you know a Chess AI will win without knowing how, you know that models of your behavior will predict it, without knowing how. So you are not inferring their predictions from their details; you are just editing them into the counterfactual.) This might even be relevant to CEV in that moral updatelessness setting I’ve mentioned, though that’s pure speculation at this point.
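(To make the “editing them in” move concrete, here’s a toy sketch in a Newcomb-like setting; the setup and names are mine, not from the sequence. When evaluating a candidate action, the predictor’s prediction node is surgically set to agree with that action, rather than inferred by running the predictor’s internals:)

```python
def evaluate_counterfactual(action, payoff):
    """Surgery in the spirit described above: when considering `action`,
    we don't try to run the predictor's internals to see what it would
    predict; we just edit its prediction node to agree with the action
    under consideration.  Vanilla-CDT surgery would instead hold the
    prediction fixed at whatever it "already is".
    """
    prediction = action          # edited in, not inferred
    return payoff(action, prediction)

def newcomb_payoff(action, prediction):
    # Standard Newcomb: the big box is full iff one-boxing was predicted.
    big = 1_000_000 if prediction == "one-box" else 0
    small = 1_000 if action == "two-box" else 0
    return big + small

best = max(["one-box", "two-box"],
           key=lambda a: evaluate_counterfactual(a, newcomb_payoff))
print(best)  # "one-box" under this surgery; CDT-style surgery would say "two-box"
```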
a fixed-goal AGI is bad… Which is indeed correct, but… irrelevant? It has the best EV by its own metric?
Nobody knows how to formulate it like that! EV maximization is so entrenched as obviously the thing to do that the “obviously, it’s just EV maximization for something else” response is instinctual, but that doesn’t seem to be the case.
And if maximization is always cursed (goals are always proxy goals, even as they become increasingly accurate, particularly around the actual environment), it’s not maximization that decision theory should be concerned with.
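(A minimal simulation of that curse, under the simplifying assumption that the proxy is the true value plus independent noise: the argmax of the proxy is selected as much for a lucky error as for genuine value, so its proxy score systematically overshoots its true value, and the overshoot grows with the number of options considered.)

```python
import random

def goodhart_gap(n_options=1000, noise=1.0, trials=200, seed=0):
    """Average overestimate when maximizing a noisy proxy.

    true value ~ N(0, 1), proxy = true + N(0, noise).  The proxy-argmax is
    selected for having a large error term as much as for having a large
    true value, so its proxy score exceeds its true value on average --
    the optimizer's / Goodhart's curse in miniature.
    """
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        true = [rng.gauss(0, 1) for _ in range(n_options)]
        proxy = [t + rng.gauss(0, noise) for t in true]
        best = max(range(n_options), key=lambda i: proxy[i])
        gap += proxy[best] - true[best]
    return gap / trials

print(goodhart_gap(n_options=10))    # small positive gap
print(goodhart_gap(n_options=1000))  # noticeably larger gap
```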
Thanks. I will give them a read. After all, smarter people than me spent more time than I did thinking about this. There is a fair chance that I am missing something.
That’s not clear until you develop them.