Assuming that the utility function is written in a way that makes loss of utility possible (utility = dollars in bank or something), this is a failure mode:
AI stops short of the limit, makes another AI that prevents loss of utility, hits the bound, and then shuts down.
Second AI takes over the universe as a precaution against any future disutility.
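A toy sketch of why the first AI would bother building the second one. Everything here is an assumption made for illustration: the bound of 100, the names `utility` and `expected_utility`, the loss probability, and the idea that the guardian is free to build and perfectly effective. Under those assumptions, any nonzero chance of losing utility after shutdown makes the "build a guardian first" plan score strictly higher.

```python
BOUND = 100.0  # hypothetical utility cap, chosen only for illustration


def utility(dollars: float) -> float:
    """Bounded utility: dollars in the bank, capped at BOUND."""
    return min(dollars, BOUND)


def expected_utility(p_loss: float, loss: float, guarded: bool) -> float:
    """Expected utility after the first AI reaches the bound and shuts down.

    With probability p_loss, some later event removes `loss` dollars,
    unless a guardian AI was built beforehand (assumed costless and
    perfectly effective in this toy model).
    """
    if guarded:
        return utility(BOUND)
    return (1 - p_loss) * utility(BOUND) + p_loss * utility(BOUND - loss)


# Compare the two plans under a small assumed risk of future loss.
unguarded = expected_utility(p_loss=0.01, loss=50.0, guarded=False)
guarded = expected_utility(p_loss=0.01, loss=50.0, guarded=True)
print(guarded > unguarded)  # the guardian plan wins for any p_loss > 0
```

Nothing in this model limits how far the guardian goes to drive `p_loss` to zero, which is where the "takes over the universe as a precaution" step comes from.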