> Presumably, if you asked such an agent to reflect on its own purposes, it would claim that they related to the external world (unless its aim was to deceive you about its purposes for signalling reasons, of course).
Actually, if signaling was its true purpose, it would claim the same thing. And if it were hacked together by evolution to be convincing, it might even do so by genuinely believing that its reflections were accurate. ;-)
> For example, it might claim that its aim was to save the whales—rather than to feel good about saving the whales. It could do the latter by taking drugs or via hypnotherapy—and that is not how it actually acts.
Indeed. But in the case of humans, note first that many people do in fact take drugs to feel good, and second, that we tend to dislike being deceived. When we try to imagine getting hypnotized into believing the whales are safe, we react as we would to being deceived, not as we would if we truly believed the whales were safe. It is this error in the map that gives us a degree of feed-forward consistency, in that it prevents us from certain classes of wireheading.
However, it’s also a source of other errors, because in the case of self-fulfilling beliefs, it leads to erroneous conclusions about our need for the belief. For example, if you think your fear of being fired is the only thing getting you to work at all, then you will be reluctant to give up that fear, even if it’s really the existence of the fear that is suppressing, say, the creativity or ambition that would replace the fear.
In each case, the error is the same: System 2 projection of the future implicitly relies on the current contents of System 1's map, and does not take into account how that map would be different in the projected future.
(This is why, by the way, The Work's fourth question is "Who would you be without that thought?" The question is a trick to force System 1 to do a projection using the presupposition that the belief is already gone.)