I don’t see what the problem is. You compare ‘what happens if I do X?’ with ‘what happens if I do Y?’ Not, ‘If I do X, what would have happened if I did Y?’
You split the evaluation into cases with constant past decisions, then pick the best outcome. So the AI doesn’t even notice when it gets into a catch-22: every branch scores badly, but the comparison still hands back a ‘best’ one.
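For concreteness, here’s a minimal sketch of that evaluation scheme in Python; `choose_action`, `simulate_outcome`, and the scores are hypothetical names for illustration, not anyone’s actual system. Each action is scored in isolation against the same frozen past, and nothing ever computes ‘if I do X, what would have happened had I done Y’, so an all-branches-bad situation produces no signal:

```python
from typing import Callable, Iterable

def choose_action(
    actions: Iterable[str],
    simulate_outcome: Callable[[str, dict], float],
    past_decisions: dict,
) -> str:
    # Ask "what happens if I do X?" for each X, holding the
    # record of past decisions constant, then take the argmax.
    scored = {a: simulate_outcome(a, past_decisions) for a in actions}
    return max(scored, key=scored.get)

# Both options score badly, but the procedure still returns "X"
# with no indication that it is in a catch-22.
best = choose_action(
    actions=["X", "Y"],
    simulate_outcome=lambda a, past: {"X": -10.0, "Y": -12.0}[a],
    past_decisions={},
)
```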