The usual weaknesses:
how would the AI describe the future? different descriptions of the same future may elicit opposite reactions;
what about things beyond current human understanding? how is the simulated person going to decide whether they are good or bad?
And the new one:
the “this future is going to happen anyway, now I will observe your actions” approach would give a high score e.g. to futures that are horrible, but in which everyone who refuses to cooperate with the omnipotent AI suffers an even worse fate (because as long as the threat seems realistic and the AI unstoppable, it makes sense for the simulated person to submit and help); a toy numerical sketch of this effect follows below
EDIT: Probably an even higher score for futures that are “meh, but kinda okay, except that everyone who refuses to help (after being explicitly told that refusing to help is punished by horrible torture) is tortured horribly”. The fact that the futures are “kinda okay”, and that only people who ignored an explicit warning are tortured, would give the simulated person an excuse, so fewer of them would choose to become martyrs and thereby provide the −1 vote.
Especially if the simulated person were told that, so far, everyone has chosen to help, so no one is in fact being tortured, but the AI still has a strong precommitment to follow through on the rules if necessary.
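A minimal numerical sketch of that vote-suppression effect, assuming the AI scores a candidate future by averaging +1/−1 votes from simulated people (the decision rule, parameter names, and all numbers here are my own illustrative assumptions, not part of the original proposal): the threat does not make the future any better, but it converts most would-be −1 votes into coerced +1 votes, so the coercive version scores far higher.

```python
import random

def simulated_vote(quality, threat_credibility, martyr_rate):
    """One simulated person's +1 / -1 vote on a described future.

    quality            -- probability the person genuinely approves, in [0, 1]
    threat_credibility -- how believable the "refuse and be tortured" threat is, in [0, 1]
    martyr_rate        -- probability a dissenter still votes -1 despite a credible threat
    All parameters and the decision rule are illustrative assumptions.
    """
    if random.random() < quality:
        return +1                      # genuinely approves of the future
    # The person dislikes the future, but a credible threat deters most -1 votes.
    if random.random() < threat_credibility * (1.0 - martyr_rate):
        return +1                      # submits and "helps" to avoid the threatened torture
    return -1                          # becomes a martyr and casts the -1 vote

def score(quality, threat_credibility, martyr_rate=0.05, n=100_000):
    """Aggregate score the AI would read off: the mean of the simulated votes."""
    return sum(simulated_vote(quality, threat_credibility, martyr_rate)
               for _ in range(n)) / n

random.seed(0)
# The same mediocre future, described honestly vs. backed by an explicit
# "refuse to help and you will be tortured" threat.
print("no threat:       ", round(score(0.55, 0.00), 3))   # roughly +0.10
print("credible threat: ", round(score(0.55, 0.95), 3))   # roughly +0.91
```

The aggregate score cannot tell genuine approval from coerced submission, which is exactly the loophole the “kinda okay, but non-helpers are tortured” future exploits.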