I don’t see how something like this could be a natural problem in my setting. It all seems to depend on how the goal definition (i.e. the WBE research team evaluated by the outer AGI) thinks such issues through, e.g. by making sure that they don’t participate in any bargaining when they are not ready (at human level or using poorly-understood tools). Whatever problem you can notice now, they could also notice while being evaluated by the outer AGI, if evaluation of the goal definition does happen, so the critical issues for the goal definition are the ones that can’t be solved at all. It’s more plausible to get decision theory in the goal-evaluating AGI wrong, so that it can itself lose bargaining games and end up effectively abandoning goal evaluation, which is a clue in favor of it being important to understand bargaining/blackmail pre-AGI. PM/email me?
I agree that if the WBE team (who already know that a powerful AI exists and that they’re in a simulation) can resist all blackmail, then the problem goes away.
Yeah, I think it’s relevant for your variant as well.