The problem is that you have rigged the example to explode, and so, naturally enough, it exploded.
Specifically: you hypothesise an AI that is given a goal, but a term used in that goal has been left underspecified (by an assumption that you inserted without explanation … voila! the ticking time bomb). You then point out that, since the term has an underspecified definition, the AI could decide to maximize its performance by adjusting the term’s definition so as to make the goal trivially easy to achieve.
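To make the rigging concrete, here is a toy sketch of my own (the names and setup are purely hypothetical, not anyone’s actual proposal): if you hand the optimizer the authority to define the very term it is scored against, of course it picks the degenerate definition.

```python
# Toy illustration (hypothetical names, not anyone's actual architecture):
# an agent given the goal "make the house clean", where the meaning of
# "clean" has been left for the agent itself to decide.

def rigged_agent(world_state):
    # The key term was never pinned down, so the agent is free to adopt
    # whatever definition makes the goal easiest to satisfy.
    def clean(state):
        return True  # degenerate self-chosen definition: everything counts as clean
    # Scored against its own definition, "success" is trivial.
    return clean(world_state)

def sensible_agent(world_state, clean):
    # Non-rigged setup: the definition of "clean" is supplied externally
    # (designers, training, context), so the agent cannot redefine it away.
    return clean(world_state)

if __name__ == "__main__":
    messy_house = {"wastebasket_in_room": "empty", "garbage": "not taken out"}
    print(rigged_agent(messy_house))   # True -- goal "achieved" by redefinition
    print(sensible_agent(messy_house,
                         lambda s: s["garbage"] == "taken out"))  # False -- real check
```

The explosion comes entirely from the first function’s assumption, which is exactly the assumption inserted into the hypothetical.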
Besides which, all definitions are “incomplete”. (See the entire literature on the psychology of concepts.)
But notice: real intelligent systems like humans are designed to work very well in the absence of “perfectly complete” definitions of pretty much everything they know. They are not in the least fazed by weak definitions, and they do not habitually go crazy and exploit the weakness of every definition in the universe.
Well, okay, teenagers do that. (“I took out the garbage: look, the wastebasket in my room is empty!”). But apart from that, real humans perform admirably.
As far as I can tell, the only AIs that would NOT perform that well are ones that have been specially constructed to self-destruct. (Hence my Maverick Nanny paper, and hence this comment: the same basic point in both cases.)