‘X wants to do Y’ means that the specific features of X tend to result in Y when its causal influence is relatively large and direct. But, for clarity’s sake, we adopt the convention of only dropping into want-speak when a system is too complicated for us to easily grasp in mechanistic terms why it’s having these complex effects
Yes, agreed, for some fuzzy notion of “easily grasp” and “too complicated.” That is, there’s a sense in which thunderstorms are too complicated for me to explain in mechanistic terms… I certainly can’t predict their effects. But there’s also a sense in which I can describe (and even predict) the effects of a thunderstorm that feels simple, whereas I can’t do the same for a human being without invoking “want-speak”/the intentional stance.
I’m not sure any of this is [i]justified[/i], but I agree that it is what we do… this is how we speak, and we draw these distinctions. So far, so good.
if the possible worlds we care about all have a certain environmental feature in common [..] we may, in effect, include something about the environment ‘in the AI’
I’m not really sure what you mean by “in the AI” here, but I guess I agree that the boundary between an agent and its environment is always a fuzzy one. So, OK, I suppose we can include things about the environment “in the AI” if we choose. (I can similarly choose to include things about the environment “in myself.”) So far, so good.
we might model the AI as having the preference ‘surround the Sun with a dyson sphere’ rather than ‘conditioned on there being a Sun, surround it with a dyson sphere’; if we do the former, then the fact that that is the system’s preference depends in part on the actual existence of the Sun.
Here is where you lose me again… once again you talk as though there’s simply no fact of the matter as to which preference the AI has, merely our choice as to how we model it.
But it seems to me that there are observations I can make which would provide evidence one way or the other. For example, if it has the preference ‘surround the Sun with a Dyson sphere,’ then in an environment lacking the Sun I would expect it to first seek to create the Sun… how else could it implement its preference? Whereas if it has the preference ‘conditioned on there being a Sun, surround it with a Dyson sphere,’ then in an environment lacking the Sun I would not expect it to create the Sun.
So does the AI seek to create the Sun in such an environment, or not? Surely that doesn’t depend on how I choose to model it. The AI’s preference is whatever it is, and controls its behavior. Of course, as you say, if the real world always includes a Sun, then I might not be able to tell which preference the AI has. (Then again I might… the test I describe above isn’t the only test I can perform, just the first one I thought of, and other tests might not depend on the Sun’s absence.)
But whether I can tell or not doesn’t affect whether the AI has the preference or not.
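The behavioral test described above can be sketched as a toy simulation. All names here (`unconditional_agent`, `conditional_agent`, the `sun_exists` flag) are hypothetical illustrations, not anything from the original discussion; the point is just that the two preference encodings prescribe different behavior in a sunless environment, and identical behavior in a sunny one.

```python
def unconditional_agent(env):
    """Prefers 'surround the Sun with a Dyson sphere' outright."""
    actions = []
    if not env["sun_exists"]:
        actions.append("create sun")  # must first bring the Sun about
    actions.append("build dyson sphere")
    return actions

def conditional_agent(env):
    """Prefers 'conditioned on there being a Sun, surround it'."""
    if env["sun_exists"]:
        return ["build dyson sphere"]
    return []  # condition unmet: nothing to do

# In a sunless environment the two encodings come apart:
sunless = {"sun_exists": False}
print(unconditional_agent(sunless))  # ['create sun', 'build dyson sphere']
print(conditional_agent(sunless))    # []

# With a Sun present they behave identically, which is why observing
# the AI only in a world that always contains a Sun may fail to
# distinguish them:
sunny = {"sun_exists": True}
print(unconditional_agent(sunny) == conditional_agent(sunny))  # True
```

This is just the first test that comes to mind; as noted above, other tests might discriminate between the encodings without requiring the Sun’s absence.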
if we do the former, then the fact that that is the system’s preference depends in part on the actual existence of the Sun
Again, no. Regardless of how we model it, the system’s preference is what it is, and we can study the system (e.g., see whether it creates the Sun) to develop more accurate models of its preferences.
Does that mean the Sun is a part of the AI’s preference encoding? Is the Sun a component of the AI? I don’t think these questions are important or interesting
I agree. But I do think the question of what the AI (or, more generally, an optimizing agent) will do in various situations is interesting, and it seems to me that you’re consistently eliding that question in ways I find puzzling.