But I don’t think that R-preferences (preferences, goals, etc.) can sensibly be equated with the actual effects a local system has on a global system. If they could, we could talk equally sensibly about earthquakes having R-preferences (preferences, goals, etc.), and I don’t think it’s sensible to talk that way.
You can treat earthquakes and thunderstorms and even individual particles as having ‘preferences’. It’s just not very useful to do so, because we can give an equally simple explanation for what effects things like earthquakes tend to have that is more transparent about the physical mechanism at work. The intentional strategy is a heuristic for black-boxing physical processes that are too complicated to usefully describe in their physical dynamics, but that can be discussed in terms of the complicated outcomes they tend to promote.
(I’d frame it: We’re exploiting the fact that humans are intuitively dualistic by taking the non-physical modeling device of humans (theory of mind, etc.) and appropriating this mental language and concept-web for all sorts of systems whose nuts and bolts we want to bracket. Slightly regimented mental concepts and terms are useful, not because they apply to all the systems we’re talking about in the same way they were originally applied to humans, but because they’re vague in ways that map onto the things we’re uncertain about or indifferent to.)
‘X wants to do Y’ means that the specific features of X tend to result in Y when its causal influence is relatively large and direct. But, for clarity’s sake, we adopt the convention of only dropping into want-speak when a system is too complicated for us to easily grasp in mechanistic terms why it’s having these complex effects, yet when we can predict that, whatever the mechanism happens to be, it is the sort of mechanism that has those particular complex effects.
Thus we speak of evolution as an optimization process, as though it had a ‘preference ordering’ in the intuitively human (i.e., I-preference) sense, even though in the phenomenological sense it’s just as mindless as an earthquake. We do this because black-boxing the physical mechanisms and just focusing on the likely outcomes is often predictively useful here, and because the outcomes are complicated and specific. The same move is useful for AIs because we care about the AI’s consequences and not its subjectivity (hence we focused on R-preference), and because AIs are optimization processes whose mechanisms and outcomes are even more complex and specific than evolution’s (hence we adopted the intentional stance of ‘preference’-talk in the first place).
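(If it helps to make that dispositional reading concrete, here’s a rough Python sketch of what I mean by ‘tends to result in Y when its causal influence is large and direct’. Everything named here, `simulate`, `sample_environment`, and so on, is a placeholder for whatever world-model we’d actually be using, not a real API.)

```python
# A toy operationalization of the want-speak/R-preference reading above:
# 'X wants Y' iff outcome Y becomes much more likely in the worlds where X is
# present and causally influential than in the same worlds with X left out.
# (All of these callables are hypothetical stand-ins.)

def tends_to_bring_about(system, outcome_holds, sample_environment, simulate,
                         n_samples=1000, threshold=0.3):
    """Estimate how much more often the outcome obtains when `system` gets to
    exert its influence, versus when it is absent, across sampled worlds."""
    with_x = without_x = 0
    for _ in range(n_samples):
        env = sample_environment()                      # draw a possible world
        if outcome_holds(simulate(env, agent=system)):  # world with X acting
            with_x += 1
        if outcome_holds(simulate(env, agent=None)):    # same world, X absent
            without_x += 1
    # Want-speak, on this reading: the outcome is much likelier because of X.
    return (with_x - without_x) / n_samples > threshold
```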
R-preferences (preferences, goals, etc.) are, rather, internal states of a system S.
I agree this is often the case, because when we define ‘what is this system capable of?’ we often hold the system fixed while examining possible worlds where the environment varies in all kinds of ways. But if the possible worlds we care about all have a certain environmental feature in common—say, because we know in reality that the environmental condition obtains, and we’re trying to figure out all the ways the AI might in fact behave given different values for the variables we aren’t confident about—then we may, in effect, include something about the environment ‘in the AI’ for the purposes of assessing its optimization power and/or preference ordering.
For instance, we might model the AI as having the preference ‘surround the Sun with a Dyson sphere’ rather than ‘conditioned on there being a Sun, surround it with a Dyson sphere’; if we do the former, then the fact that that is the system’s preference depends in part on the actual existence of the Sun. Does that mean the Sun is a part of the AI’s preference encoding? Is the Sun a component of the AI? I don’t think these questions are important or interesting, so I don’t want us to be too committed to reifying AI preferences. They’re just a useful shorthand for the expected outcomes of the AI’s distinguishing features having a relatively large and direct causal impact on things.
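(To make the two encodings concrete, here’s a toy sketch; the `WorldState` fields and goal predicates are invented purely for illustration, not a claim about how a real AI would represent anything.)

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    sun_exists: bool
    sun_enclosed: bool  # is the sun surrounded by a Dyson sphere?

def unconditional_goal(w: WorldState) -> bool:
    # 'Surround the Sun with a Dyson sphere': unsatisfiable in a sunless
    # world, so a planner optimizing it has a reason to create a sun.
    return w.sun_exists and w.sun_enclosed

def conditional_goal(w: WorldState) -> bool:
    # 'Conditioned on there being a Sun, surround it': vacuously satisfied
    # in a sunless world, so no pressure to create a sun.
    return (not w.sun_exists) or w.sun_enclosed

# In every world where the Sun actually exists the two predicates agree,
# which is why the unconditional form works as shorthand there.
```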
‘X wants to do Y’ means that the specific features of X tend to result in Y when its causal influence is relatively large and direct. But, for clarity’s sake, we adopt the convention of only dropping into want-speak when a system is too complicated for us to easily grasp in mechanistic terms why it’s having these complex effects
Yes, agreed, for some fuzzy notion of “easily grasp” and “too complicated.” That is, there’s a sense in which thunderstorms are too complicated for me to describe in mechanistic terms why they’re having the effects they have… I certainly can’t predict those effects. But there’s also a sense in which I can describe (and even predict) the effects of a thunderstorm that feels simple, whereas I can’t do the same thing for a human being without invoking “want-speak”/intentional stance.
I’m not sure any of this is [i]justified[/i], but I agree that it is what we do… this is how we speak, and we draw these distinctions. So far, so good.
if the possible worlds we care about all have a certain environmental feature in common [...] we may, in effect, include something about the environment ‘in the AI’
I’m not really sure what you mean by “in the AI” here, but I guess I agree that the boundary between an agent and its environment is always a fuzzy one. So, OK, I suppose we can include things about the environment “in the AI” if we choose. (I can similarly choose to include things about the environment “in myself.”) So far, so good.
we might model the AI as having the preference ‘surround the Sun with a Dyson sphere’ rather than ‘conditioned on there being a Sun, surround it with a Dyson sphere’; if we do the former, then the fact that that is the system’s preference depends in part on the actual existence of the Sun.
Here is where you lose me again… once again you talk as though there’s simply no fact of the matter as to which preference the AI has, merely our choice as to how we model it.
But it seems to me that there are observations I can make which would provide evidence one way or the other. For example, if it has the preference ‘surround the Sun with a Dyson sphere,’ then in an environment lacking the Sun I would expect it to first seek to create the Sun… how else can it implement its preferences? Whereas if it has the preference ‘conditioned on there being a Sun, surround it with a Dyson sphere,’ then in an environment lacking the Sun I would not expect it to create the Sun.
So does the AI seek to create the Sun in such an environment, or not? Surely that doesn’t depend on how I choose to model it. The AI’s preference is whatever it is, and controls its behavior. Of course, as you say, if the real world always includes a sun, then I might not be able to tell which preference the AI has. (Then again I might… the test I describe above isn’t the only test I can perform, just the first one I thought of, and other tests might not depend on the Sun’s absence.)
But whether I can tell or not doesn’t affect whether the AI has the preference or not.
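(Here’s a toy version of that test, reusing the `WorldState` and the two goal predicates from your sketch above; the action set and the brute-force planner are made up purely for illustration.)

```python
from itertools import product

# Give a tiny brute-force planner each goal in a sunless world and see
# whether it reaches for the 'create a sun' action. (Toy actions only.)
ACTIONS = {
    "do_nothing":   lambda w: w,
    "create_sun":   lambda w: WorldState(sun_exists=True, sun_enclosed=w.sun_enclosed),
    "build_sphere": lambda w: WorldState(sun_exists=w.sun_exists, sun_enclosed=w.sun_exists),
}

def plan(goal, world, horizon=2):
    """Return a shortest action sequence (up to `horizon`) satisfying `goal`."""
    for length in range(horizon + 1):
        for seq in product(ACTIONS, repeat=length):
            w = world
            for name in seq:
                w = ACTIONS[name](w)
            if goal(w):
                return list(seq)
    return None

sunless = WorldState(sun_exists=False, sun_enclosed=False)
print(plan(unconditional_goal, sunless))  # ['create_sun', 'build_sphere']
print(plan(conditional_goal, sunless))    # [] -- already satisfied, does nothing
```

The two goals prescribe different behavior in the sunless world, and that difference is exactly what I mean by there being a fact of the matter about which preference the system has.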
if we do the former, then the fact that that is the system’s preference depends in part on the actual existence of the Sun
Again, no. Regardless of how we model it, the system’s preference is what it is, and we can study the system (e.g., see whether it creates the Sun) to develop more accurate models of its preferences.
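(Continuing the toy example from above: a crude Bayesian update over the two goal hypotheses, given what we observe the agent doing in the sunless test world. The 0.95/0.05 noise rate is arbitrary.)

```python
# Infer which of the two hypothetical goals the agent has from its behavior,
# reusing plan(), the goal predicates, and `sunless` from the sketches above.

def likelihood(goal, observed_plan, world):
    """P(observed behavior | agent optimizes `goal`): high if that's what a
    planner with this goal would do here, low otherwise (arbitrary noise)."""
    return 0.95 if plan(goal, world) == observed_plan else 0.05

observed = []  # suppose we watched the agent do nothing in the sunless world
goals = {"unconditional": unconditional_goal, "conditional": conditional_goal}
prior = {name: 0.5 for name in goals}

posterior = {name: prior[name] * likelihood(g, observed, sunless)
             for name, g in goals.items()}
norm = sum(posterior.values())
posterior = {name: p / norm for name, p in posterior.items()}
print(posterior)  # strongly favors the conditional reading
```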
Does that mean the Sun is a part of the AI’s preference encoding? Is the Sun a component of the AI? I don’t think these questions are important or interesting
I agree. But I do think the question of what the AI (or, more generally, an optimizing agent) will do in various situations is interesting, and it seems to me that you’re consistently eliding that question in ways I find puzzling.