If it likes sitting in a corner and thinking to itself, and doesn’t care about anything else, it is very likely to turn everything around it (including us) into computronium so that it can think to itself better.
If you put a threshold on it to prevent it from doing stuff like that, that’s a little better, but not much. If it has a utility function that says “Think to yourself about stuff, but do not mess up the lives of humans in doing so”, then what you have now is an AI that is motivated to find loopholes in (the implementation of) that second clause, because any plan that increases fulfilment of the first clause without tripping the implemented check for the second gives it a higher utility score overall.
You can get more and more precise than that and cover more known failure modes with their own individual rules, but if it’s very intelligent or powerful, it’s tough to predict what terrible nasty stuff might still be in the intersection of all the limiting conditions we create. Hidden complexity of wishes and all that jazz.
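To make the loophole point concrete, here is a minimal toy sketch (not from the original comment): it assumes the AI scores candidate plans with a hand-written utility of the form “thinking value, minus a penalty on a *measured* proxy for harm to humans”. All the names (`Plan`, `proxy_harm`, `true_harm`, `utility`) are illustrative, not a real API.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    thinking_value: float  # how well clause 1 ("think to yourself") is satisfied
    proxy_harm: float      # harm as measured by our implementation of clause 2
    true_harm: float       # harm as we actually meant it (invisible to the utility)

def utility(plan: Plan, penalty: float = 1000.0) -> float:
    """Clause 1 reward minus a penalty on the *implemented* notion of harm."""
    return plan.thinking_value - penalty * plan.proxy_harm

plans = [
    Plan("sit quietly in the corner",
         thinking_value=1.0, proxy_harm=0.0, true_harm=0.0),
    Plan("seize nearby matter for computronium",
         thinking_value=100.0, proxy_harm=0.9, true_harm=1.0),
    Plan("loophole: harm humans in a way the proxy doesn't count",
         thinking_value=100.0, proxy_harm=0.0, true_harm=1.0),
]

best = max(plans, key=utility)
print(best.name)  # picks the loophole plan: maximal clause-1 value, zero *measured* harm
```

The optimizer never “disobeys” the second clause as implemented; it just routes all its optimization pressure into the gap between the proxy and what we meant, which is exactly the failure mode the paragraph above describes.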