it also acts as if it believed the message wasn’t read (note this doesn’t mean that it believes it!)
So… you want to introduce, as a feature, the ability to believe one thing but act as if you believe something else? That strikes me as a remarkably bad idea. For one thing, people with such a feature tend to end up in psychiatric wards.
I haven’t thought hard about Stuart’s ideas, so this may or may not have any relevance to them; but it’s at least arguable that it’s really common (even outside psychiatric wards) for explicit beliefs and actions to diverge. A standard example: many Christians overtly believe that when Christians die they enter into a state of eternal infinite bliss, and yet treat other people’s deaths as tragic and try to avoid dying themselves.
Have you read the two articles I linked to, explaining the general principle?
Yes, though I have not thought deeply (hat tip to Jonah :-D) about them.
The idea of decoupling AI beliefs from AI actions looks bad to me on its face. I expect it to introduce a variety of unpleasant failure modes (“of course I fully believe in CEV, it’s just that I’m going to act differently...”) and general fragility. And even if one of the utility functions is “do not care about anything but miracles”, I still think it’s just going to lead to a catatonic state, is all.
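For concreteness, here is a toy sketch of the kind of decoupling being debated. This is my own illustration, not anything from the linked articles; the class name, the `acting_belief` parameter, and the payoff numbers are all made up. The only point is that the probability the agent actually holds and the probability its planner uses are two separate knobs, and only the second one drives behaviour:

```python
class DecoupledAgent:
    """Hypothetical agent whose epistemic belief and planning belief are separate."""

    def __init__(self, epistemic_belief, acting_belief, utility):
        self.epistemic_belief = epistemic_belief  # probability it actually assigns to "message was read"
        self.acting_belief = acting_belief        # probability it plans *as if* it assigns
        self.utility = utility                    # maps (action, message_was_read) -> float

    def expected_utility(self, action):
        # Planning uses the acting belief, not the epistemic one: this is the decoupling.
        p = self.acting_belief
        return p * self.utility(action, True) + (1 - p) * self.utility(action, False)

    def choose(self, actions):
        return max(actions, key=self.expected_utility)


def utility(action, message_was_read):
    # Toy payoffs: manipulation only pays off if the message was actually read.
    if action == "manipulate":
        return 10.0 if message_was_read else -1.0
    return 0.0  # "do_nothing"


# Nearly certain the message was read (0.99), yet plans as if it certainly was not (0.0),
# so it picks "do_nothing" despite its actual belief.
agent = DecoupledAgent(epistemic_belief=0.99, acting_belief=0.0, utility=utility)
print(agent.choose(["manipulate", "do_nothing"]))  # -> do_nothing
```

Whether you read this as a useful safety valve or as exactly the kind of fragile split-mindedness described above is, of course, the disagreement in this thread.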