it also acts as if it believed the message wasn’t read (note this doesn’t mean that it believes it!)
So… you want to introduce, as a feature, the ability to believe one thing but act as if you believe something else? That strikes me as a remarkably bad idea. For one thing, people with such a feature tend to end up in psychiatric wards.
I haven’t thought hard about Stuart’s ideas, so this may or may not have any relevance to them; but it’s at least arguable that it’s really common (even outside psychiatric wards) for explicit beliefs and actions to diverge. A standard example: many Christians overtly believe that when Christians die they enter into a state of eternal infinite bliss, and yet treat other people’s deaths as tragic and try to avoid dying themselves.
Have you read the two articles I linked to, explaining the general principle?
Yes, though I have not thought deeply (hat tip to Jonah :-D) about them.
The idea of decoupling AI beliefs from AI actions looks bad to me on its face. I expect it to introduce a variety of unpleasant failure modes (“of course I fully believe in CEV, it’s just that I’m going to act differently...”) and general fragility. And even if one of the utility functions is “do not care about anything but miracles”, I still think it’s just going to lead to a catatonic state, is all.
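For concreteness, here is a toy sketch of the kind of decoupling being debated. This is my own illustration, not anything from the linked articles; the class name, the `acting_belief` parameter, and the payoff numbers are all made up. The only point is that the probability the agent actually holds and the probability its planner uses are two separate knobs, and only the second one drives behaviour:

```python
class DecoupledAgent:
    """Hypothetical agent whose epistemic belief and planning belief are separate."""

    def __init__(self, epistemic_belief, acting_belief, utility):
        self.epistemic_belief = epistemic_belief  # probability it actually assigns to "message was read"
        self.acting_belief = acting_belief        # probability it plans *as if* it assigns
        self.utility = utility                    # maps (action, message_was_read) -> float

    def expected_utility(self, action):
        # Planning uses the acting belief, not the epistemic one: this is the decoupling.
        p = self.acting_belief
        return p * self.utility(action, True) + (1 - p) * self.utility(action, False)

    def choose(self, actions):
        return max(actions, key=self.expected_utility)


def utility(action, message_was_read):
    # Toy payoffs: manipulation only pays off if the message was actually read.
    if action == "manipulate":
        return 10.0 if message_was_read else -1.0
    return 0.0  # "do_nothing"


# Nearly certain the message was read (0.99), yet plans as if it certainly was not (0.0),
# so it picks "do_nothing" despite its actual belief.
agent = DecoupledAgent(epistemic_belief=0.99, acting_belief=0.0, utility=utility)
print(agent.choose(["manipulate", "do_nothing"]))  # -> do_nothing
```

Whether you read this as a useful safety valve or as exactly the kind of fragile split-mindedness described above is, of course, the disagreement in this thread.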