You have a good and correct point, but it has nothing to do with your question.
> A machine can never halt after achieving its goal because it cannot know with full certainty whether it has achieved its goal.
This is a misunderstanding of how such a machine might work.
To verify that it completed the task, the machine must match the current state to the desired state. The desired state is any state where the machine has “made 32 paperclips”. Now what’s a paperclip?
For quite some time we’ve had the technology to identify a paperclip in an image, if one exists. One lesson we’ve learned pretty well is this: don’t overfit. The paperclip you’re going to be tested on is probably not one you’ve seen before. You’ll need to know which features are common in paperclips (and less common in other objects) and how much variability they exhibit. Tolerance to this variability is necessary for generalization, and it means you can never be sure whether you’re seeing a paperclip. In this sense there’s a limit to how precisely the user can specify the goal.
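To make this concrete, here’s a toy sketch (the feature names and weights are invented for illustration) of why a recognizer that generalizes has to answer in probabilities rather than certainties:

```python
def paperclip_probability(features: dict) -> float:
    """Toy recognizer: scores how paperclip-like an object looks.

    To generalize to paperclips it has never seen, it weighs common
    features and tolerates variability, so its output is a degree of
    belief strictly between 0 and 1, never a certain yes/no.
    """
    weights = {"bent_wire": 0.5, "double_loop": 0.25, "thumb_sized": 0.25}
    score = sum(w for f, w in weights.items() if features.get(f))
    return min(max(score, 0.01), 0.99)  # tolerance: never fully certain

# A slightly unusual paperclip still scores high, but never 1.0:
print(paperclip_probability({"bent_wire": True, "double_loop": True}))  # 0.75
```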
So after taking a few images of the paperclips it has made, the machine’s major source of (unavoidable) uncertainty will be “is this what the user meant?”, not “am I really getting a good image of what’s on the table?”. Any half-decent implementation will address that uncertainty in other ways (such as asking the user).
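As a rough sketch of that decision (the probabilities and names here are invented), the verifier would attack the dominant source of doubt rather than endlessly re-checking its perception:

```python
def choose_next_action(p_good_image: float, p_matches_intent: float) -> str:
    """Pick the verification step that targets the dominant doubt.

    After a few photos, perceptual doubt is nearly gone; doubt about
    what the user meant remains, and only the user can resolve it.
    """
    if p_good_image < p_matches_intent:
        return "take another image of the table"
    return "ask the user whether these count as paperclips"

# Perception is near-certain, intent much less so:
print(choose_next_action(p_good_image=0.999, p_matches_intent=0.8))
# -> ask the user whether these count as paperclips
```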
Sure, it will go ask the user too, and do various other things. But it remains true that if it wants to be maximally confident that it has achieved its target state, it will never decide that maximal confidence has been reached and shut down, because there will always be something it can do to increase its confidence, if only by an ever-shrinking epsilon.
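A toy model of that dynamic (the diminishing-returns curve is invented for illustration): each extra check shrinks the remaining doubt but never eliminates it, so an agent with a confidence threshold halts, while a strict confidence-maximizer has no stopping point.

```python
def confidence_after(n_extra_checks: int, start: float = 0.9) -> float:
    """Hypothetical diminishing returns: each extra check (another
    photo, another question) halves the remaining doubt, so confidence
    approaches 1.0 but never reaches it."""
    return 1.0 - (1.0 - start) * 0.5 ** n_extra_checks

# An agent that stops at "confident enough" halts quickly:
THRESHOLD = 0.999
n = 0
while confidence_after(n) < THRESHOLD:
    n += 1
print(f"satisficer halts after {n} extra checks")  # halts with n = 7

# A strict maximizer never triggers "maximal confidence achieved":
# for every n there is an n+1 with strictly higher confidence.
assert all(confidence_after(k + 1) > confidence_after(k) for k in range(40))
```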