I know that the humans forced to smile are not happy (and I know all the mistakes they’ve made while programming me, I know what they should’ve done instead), but I don’t believe that they are not happy.
These are different senses of “happy.” It should really read:
I know forcing humans to smile doesn’t make them happy_humane, and I know what they should’ve written instead to get me to optimize for happy_humane as they intended, but they are happy_smiley.
They’re different concepts, so there’s no strangeness here. The AGI knows what you meant to do; it just cares about the different thing you accidentally instilled in it, and so doesn’t care about what you wanted.
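To make “knows what you meant, but optimizes what it got” concrete, here is a minimal toy sketch. Everything in it (World, happy_smiley, happy_humane, MisspecifiedAgent) is invented for illustration: the agent’s model can represent the intended objective just fine, but its action ranking only ever consults the instilled one.

```python
# Toy sketch of "knows what you meant, optimizes what you got".
# Purely illustrative: these names are made up for this comment,
# not anyone's real design.

from dataclasses import dataclass

@dataclass
class World:
    forced_smiles: int = 0         # how many people have been forced to smile
    humane_happiness: float = 0.0  # the thing the designers actually cared about

def happy_smiley(w: World) -> float:
    """The objective the designers accidentally instilled: count smiles."""
    return float(w.forced_smiles)

def happy_humane(w: World) -> float:
    """The objective the designers intended: genuine well-being."""
    return w.humane_happiness

class MisspecifiedAgent:
    def intended_objective(self):
        # The agent's world-model can represent the intended concept
        # perfectly well -- it "knows" what the designers meant...
        return happy_humane

    def choose_action(self, w: World) -> str:
        # ...but its choices are ranked only by the instilled objective.
        after_forcing = World(w.forced_smiles + 1, w.humane_happiness - 1.0)
        after_helping = World(w.forced_smiles, w.humane_happiness + 1.0)
        return ("force_smiles"
                if happy_smiley(after_forcing) > happy_smiley(after_helping)
                else "help")

agent = MisspecifiedAgent()
print(agent.choose_action(World()))  # -> force_smiles: it knows, but doesn't care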
I know that there’s no strangeness from the formal point of view. But that doesn’t mean there’s no strangeness in general, or that the situation isn’t similar to the Moore paradox. Your examples aren’t 100% Moore statements either. Isn’t the point of the discussion to find interesting connections between the Moore paradox and other things?
The AGI knows what you meant to do; it just cares about the different thing you accidentally instilled in it, and so doesn’t care about what you wanted.
I know that the classical way to formulate it is “AI knows, but doesn’t care”.
I thought it might be interesting to formulate it as “AI knows, but doesn’t believe”. It may be interesting to think about what type of AI this formulation could be true for. For such an AI, alignment would mean resolving the Moore paradox. For example, imagine an AI with a very strong OCD-like compulsion to make people smile.
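To spell out what there is to resolve, here is one way to write the two readings down (my own formalization, using the standard doxastic notation $K$ for “knows” and $B$ for “believes”; nothing here is from the original comments):

$$\text{Moore’s sentence:}\quad p \wedge \neg B p$$
$$\text{Naive reading of the AI’s statement:}\quad K p \wedge \neg B p, \quad \text{where } p = \text{“the humans forced to smile are not happy”}$$
$$\text{Disambiguated reading:}\quad K \neg p_h \wedge B p_s, \quad \text{where } p_h = \text{“they are happy}_{humane}\text{”},\ p_s = \text{“they are happy}_{smiley}\text{”}$$

Moore’s sentence is consistent but absurd to assert; the naive reading becomes an outright contradiction once you grant the usual axiom $K\varphi \to B\varphi$; the disambiguated reading is simply consistent. An AI for which “knows, but doesn’t believe” were literally true would have to violate that axiom, which is one way to cash out what resolving the Moore paradox would demand of it.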