Is this relevant for my variant of “indirect normativity” (i.e. allowing human-designed WBEs, no conceptually confusing tricks in constructing the goal definition)?
(I’m generally skeptical of anything that depends on a concept of “blackmail” distinct from general bargaining, since the distinction seems hard to formulate. “Blackmail” seems to be mostly the affect of being bargained against very unfairly, which can happen with a smart opponent if you bargain badly and are vulnerable to accepting unfair deals. So the solution appears to be to figure out how to bargain well, and to avoid bargaining against strong opponents until you have figured that out.)
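As a toy illustration of the “just bargain well” point, here is a minimal ultimatum-game sketch (purely hypothetical, not from the original discussion; the function name and the threshold acceptance policy are illustrative assumptions): a responder who credibly rejects splits below some threshold forces a payoff-maximizing proposer to offer at least that much, while a responder who accepts anything gets the minimal offer.

```python
# Hypothetical toy ultimatum game, assuming a responder with a known
# threshold policy: accept an offer iff it is at least `accept_threshold`.
# Against such a policy, a payoff-maximizing proposer offers the smallest
# acceptable amount.

def proposer_best_offer(pie, accept_threshold, step=1):
    """Smallest offer (in units of `step`) the responder will accept,
    i.e. the proposer's payoff-maximizing offer against a known policy."""
    offer = 0
    while offer <= pie:
        if offer >= accept_threshold:  # responder's policy: accept iff offer >= threshold
            return offer
        offer += step
    return None  # no acceptable offer exists; bargaining breaks down

pie = 100
print(proposer_best_offer(pie, accept_threshold=0))   # 0  -- exploitable responder gets nothing
print(proposer_best_offer(pie, accept_threshold=40))  # 40 -- committed responder gets its threshold
```

On this picture, “resisting blackmail” is just having (and being known to have) a good acceptance policy, rather than a separate concept from bargaining.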
Yeah, I think it’s relevant for your variant as well.
I don’t see how something like this could be a natural problem in my setting. It all seems to depend on how the goal definition (i.e. the WBE research team evaluated by the outer AGI) thinks such issues through, e.g. whether they make sure not to participate in any bargaining while they are not ready, at human level or using poorly-understood tools. Whatever problem you can notice now, they could also notice while being evaluated by the outer AGI (if evaluation of the goal definition does happen), so the critical issues for the goal definition are those that can’t be solved at all. It’s more plausible to get the decision theory in the goal-evaluating AGI itself wrong, so that it loses bargaining games and ends up effectively abandoning goal evaluation; that’s a clue in favor of it being important to understand bargaining/blackmail pre-AGI. PM/email me?
I agree that if the WBE team (who already know that a powerful AI exists and that they’re in a simulation) can resist all blackmail, then the problem goes away.