This definition of a non-obstructionist AI takes what would happen if it wasn’t switched on as the base case.
This can give weird infinite hall-of-mirrors effects if, in that base case, another very similar non-obstructionist AI would have been switched on, and another behind that one (i.e. a human whose counterfactual behaviour on AI failure is to reboot and try again). This tends to produce a kind of fixed-point effect, where the attainable utility landscape is almost identical with the AI on and off. At some point it bottoms out, when the hypothetical humans pursuing utility function U give up and do something else. If we assume the AI is at least weakly trying to maximize attainable utility, then several hundred levels of counterfactuals in, the only hypothetical humans who haven't given up are the ones who really value trying again and again to reboot the non-obstructionist AI, and that is a value the AI could presumably satisfy very well. So the AI will focus on the utility functions that are easy to satisfy in other ways, and on those that would obstinately keep rebooting in the hypothetical where the AI kept not turning on. (This might be complete nonsense, but it seems to make sense to me.)
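To make the recursion concrete, here is a toy numerical sketch of what I mean. Everything in it is my own illustration, not part of the non-obstruction formalism: `persistence` is how many times a hypothetical human with a given utility function would reboot a near-copy of the AI before giving up, and each failed reboot is assumed to waste a small amount of utility.

```python
# Toy sketch of the nested off-state counterfactual (illustrative only).
# Each level of the "hall of mirrors" is another near-identical
# non-obstructionist AI whose own base case recurses one level deeper,
# until the hypothetical human gives up and pursues the goal directly.

def off_state_value(u_give_up: float,
                    persistence: int,
                    reboot_cost: float = 0.01) -> float:
    """Attainable utility in the counterfactual where the AI stays off."""
    if persistence == 0:
        return u_give_up  # bottom of the recursion: give up, do something else
    # One more reboot attempt, then the next level of the counterfactual.
    return off_state_value(u_give_up, persistence - 1, reboot_cost) - reboot_cost


# Humans who give up quickly bottom out near u_give_up; obstinate
# rebooters sink ever more into reboot costs, so an AI comparing on- vs
# off-state attainable utility weights those utility functions oddly.
for persistence in (0, 3, 300):
    print(persistence, off_state_value(0.8, persistence))
```

The numbers are arbitrary; the point is only that the off-state value is defined by a recursion that has to bottom out somewhere, and where it bottoms out depends on how stubbornly the hypothetical human keeps rebooting.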
Thanks for leaving this comment. I think this kind of counterfactual is interesting as a thought experiment, but not really relevant to conceptual analysis using this framework. I suppose I should have explained more clearly that the off-state counterfactual is meant to be interpreted with a bit of reasonableness, like "what would we reasonably do if we, the designers, tried to achieve our goals using our own power?". To sidestep the worry that, without the AI's help, civilization probably goes extinct by some other means soon after, just imagine that you time-box the counterfactual goal pursuit to, say, a month.
I can easily imagine what my (subjective) attainable utility would be if I just tried to do things on my own, without the AI’s help. In this counterfactual, I’m not really tempted to switch on similar non-obstructionist AIs. It’s this kind of counterfactual that I usually consider for AU landscape-style analysis, because I think it’s a useful way to reason about how the world is changing.
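For what it's worth, here is a minimal sketch of the comparison I have in mind, with entirely made-up goals and numbers (none of these names or values come from the post): for each goal, compare the utility I could attain in a month on my own with what I attain with the AI switched on.

```python
# Illustrative only: hypothetical attainable-utility values for a few goals,
# with and without the AI's help, under a one-month time-boxed counterfactual.
au_on_my_own_for_a_month = {"write paper": 0.6, "learn piano": 0.2, "fix roof": 0.9}
au_with_ai_on            = {"write paper": 0.8, "learn piano": 0.5, "fix roof": 0.9}

def non_obstructing(base: dict, with_ai: dict) -> bool:
    """In this toy sense, the AI is non-obstructing if switching it on leaves
    every goal's attainable utility at least as high as the do-it-yourself
    base case."""
    return all(with_ai[goal] >= base[goal] for goal in base)

print(non_obstructing(au_on_my_own_for_a_month, au_with_ai_on))  # -> True
```

No infinite regress shows up here, because the base case is just "me, working on my own for a month", not "me, switching on another AI".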