Vladimir_Nesov comments on How would Logical Decision Theories address the Psychopath Button?

Vladimir_Nesov 8 Aug 2022 8:59 UTC
(The second paragraph was irrelevant to the comment I was replying to. I thought the “incidentally” and the “it’s obviously relevant”, which is inverted in context, made that framing clear? It’s maximization of EV that’s obviously relevant, unlike the objections to it that I’m voicing; maybe this was misleading.)

I was commenting on how “having the best EV”, the classical dream of decision theory, has recently come into question because of Goodhart’s Curse, and on how it might be good to look for decision theories that do something else. The wrapper-minds post points at the same problem from a very different framing. Mild optimization is a sketch of the kind of thing that might do better, and it includes more specific suggestions like quantilization. (I currently like “moral updatelessness” for this role: a variant of UDT that bargains from a position of moral ignorance, not just epistemic ignorance, among its more morally competent successors, which have mutually counterfactual (that is, discordant) but more developed moralities/values/goals.) The “coherent decisions” post is just a handy reference for why EV maximization is the standard go-to approach, and why it might remain so in the limit of reflection (time), though possibly not even then.
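To make the contrast between EV maximization and mild optimization concrete, here is a minimal sketch of quantilization as it is usually defined. Instead of taking the argmax of estimated utility, it samples from a trusted base distribution over actions, conditioned on landing in the top q fraction of that distribution's mass as ranked by utility. The sketch assumes a finite action set with a known base distribution and utility estimate; the function names and example numbers are hypothetical.

```python
from typing import Dict, Hashable


def quantilizer_distribution(
    base: Dict[Hashable, float],
    utility: Dict[Hashable, float],
    q: float,
) -> Dict[Hashable, float]:
    """Return the action distribution of a q-quantilizer (hypothetical helper).

    Ranks actions by estimated utility, keeps the top q fraction of the base
    distribution's probability mass (splitting the action that straddles the
    cutoff), and renormalizes what was kept.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    total = sum(base.values())
    ranked = sorted(base, key=lambda a: utility[a], reverse=True)
    kept: Dict[Hashable, float] = {}
    remaining = q  # base-probability mass still allowed through the cutoff
    for action in ranked:
        if remaining <= 1e-12:  # tolerance for floating-point leftovers
            break
        take = min(base[action] / total, remaining)
        kept[action] = take
        remaining -= take
    # The kept mass sums to q (up to rounding), so divide by q to renormalize.
    return {a: m / q for a, m in kept.items()}


def argmax_action(utility: Dict[Hashable, float]) -> Hashable:
    """Plain maximization over the same utility estimate, for comparison."""
    return max(utility, key=utility.get)


# Hypothetical example: "weird" has an inflated proxy utility (a Goodhart's Curse outlier).
base = {"safe_a": 0.5, "safe_b": 0.375, "weird": 0.125}
utility = {"safe_a": 1.0, "safe_b": 2.0, "weird": 100.0}
print(argmax_action(utility))                           # weird
print(quantilizer_distribution(base, utility, q=0.5))   # {'weird': 0.25, 'safe_b': 0.75}
```

The maximizer bets everything on the outlier, while the quantilizer gives no action more than base(a)/q probability. With q = 1 it reproduces the base distribution, and as q shrinks toward 0 it approaches maximization.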

The relevant part (to the “saner CDT” point) is the first paragraph, which is mostly about Troll Bridge and logical decision theory. The last post of the sequence has a summary/retrospective. Personally, I mostly like CDT for introducing surgery: fictional, laws-of-physics-defying counterfactuals seem inescapable in some framings that aren’t just being dumb the way vanilla CDT is. In particular, this comes up when considering interventions through approximate predictions of the agent. (How do you set all of these to some possible decision, when all you know is the real world, which might already contain, in its approximate models of you, the actual decision you haven’t made yet? You might need to “lie” in the counterfactual, with fictional details, so that the models of your behavior that others have created predict what you are considering doing, instead of what you actually do, which you can’t predict or infer from the actual models they’ve already made of you. Just as you know a chess AI will win without knowing how, you know that models of your behavior will predict your decision without knowing how. So you are not inferring their predictions from their details; you are just editing them into a counterfactual.) This might even be relevant to CEV in the moral updatelessness setting I mentioned, though that’s pure speculation at this point.
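A toy sketch of the difference between surgery on the action alone and a counterfactual that also edits others' predictions of you to match the decision being considered. It assumes a Newcomb-style payoff that is not part of the comment, and it does not model how approximate or opaque the predictors are; all names and numbers are hypothetical.

```python
from typing import Callable

BIG = 1_000_000  # hypothetical: opaque box contents if one-boxing was predicted
SMALL = 1_000    # hypothetical: transparent box, always available
ACTIONS = ["one", "two"]


def payoff(action: str, prediction: str) -> int:
    """Toy Newcomb-style payoff: the opaque box is filled iff 'one' was predicted."""
    opaque = BIG if prediction == "one" else 0
    return opaque + (SMALL if action == "two" else 0)


def surgery_on_action_only(action: str, fixed_prediction: str) -> int:
    """Intervene on the action node alone; the predictor's output keeps
    whatever value it has in the (assumed) actual world."""
    return payoff(action, fixed_prediction)


def edited_counterfactual(action: str) -> int:
    """Also edit the prediction so that models of the agent predict the
    decision being considered, without deriving it from the models' details."""
    return payoff(action, prediction=action)


def best(evaluate: Callable[[str], int]) -> str:
    """Pick the action with the highest value under a given counterfactual."""
    return max(ACTIONS, key=evaluate)


# Whatever prediction is held fixed, action-only surgery favors two-boxing.
print(best(lambda a: surgery_on_action_only(a, "one")))  # two
print(best(lambda a: surgery_on_action_only(a, "two")))  # two
# Editing the prediction into the counterfactual favors one-boxing.
print(best(edited_counterfactual))                       # one
```

The edit is the “lie” in the counterfactual: the prediction is set by fiat rather than inferred from how the predictor works.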