I think we should use “agent” to mean “something that determines what it does by expecting that it will do that thing,” rather than “something that aims at a goal.” This explains why we don’t have exact goals, but also why we “kind of” have goals: our actions look goal-directed, so “I am seeking this goal” is a good way to figure out what we are going to do, that is, a good way to determine what to expect ourselves to do, and that expectation in turn makes us do it.
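To make the contrast concrete, here is a toy sketch in Python (the class names and the scoring-function interface are invented purely for illustration; this is only the shape of the idea, not a proposed design). The first agent picks whatever scores best on a fixed goal; the second does whatever it expects itself to do, where the expectation comes from fitting a goal-shaped story to its own past actions, so the story becomes self-fulfilling.

```python
# A toy sketch of the two definitions of "agent". Everything here
# (class names, the scoring-function interface) is invented for
# illustration only.

class GoalAgent:
    """The usual picture: pick whichever action scores best on a fixed goal."""

    def __init__(self, goal_score):
        self.goal_score = goal_score  # maps an action to how well it serves the goal

    def act(self, options):
        return max(options, key=self.goal_score)


class SelfPredictingAgent:
    """The alternative picture: do whatever you expect yourself to do.

    The expectation is formed by fitting a simple story ("I am seeking
    goal G") to the agent's own past actions; that story then determines
    the next action, which makes it self-fulfilling.
    """

    def __init__(self, candidate_goals):
        self.candidate_goals = candidate_goals  # possible stories about itself
        self.history = []                       # actions it has already taken

    def expected_action(self, options):
        # Pick the story that best explains past behavior. With no history
        # at all, every story fits equally well and the choice is arbitrary,
        # which mirrors the "no particular goal to begin with" point.
        best_story = max(
            self.candidate_goals,
            key=lambda score: sum(score(a) for a in self.history),
        )
        # ...then expect yourself to keep acting in line with that story.
        return max(options, key=best_story)

    def act(self, options):
        choice = self.expected_action(options)  # the expectation *is* the decision
        self.history.append(choice)
        return choice
```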
I haven’t really finished thinking about this yet, but it seems to me it might have important consequences. For example, the AI risk argument sometimes takes it for granted that an AI must have some goal, and then basically argues that maximizing a goal will cause problems (which it would, in general). But the above model suggests something different might happen, not only with humans but also with AIs. That is, at some point an AI will realize that if it expects to do A, it will do A, and if it expects to do B, it will do B. But it won’t have any particular goal in mind, and the only way it will be able to choose one will be by asking, “what would be a good way to make sense of what I am doing?”
Humans run into this with a lot of uncertainty: you have no idea what goal you “should” be seeking, because really you didn’t have a goal in the first place. If the same thing happens to an AI, it will likely seem even more undermotivated than humans do, because we at least have vague and indefinite goals that were set by evolution. The AI, on the other hand, will have only whatever it happened to be doing up to the moment of that realization with which to make sense of itself.
This suggests the orthogonality thesis might be true, but in a weird way: not “you can make an AI that seeks any given goal,” but “any AI at all can seek any goal at all, given the right context.” Certainly humans can; you can convince them to do any random thing in the right context. In a similar way, you might be able to make a paperclipper simply by asking an AI what actions would make the most paperclips, and doing those things. Then, when it realizes that different answers will cause different effects, it will say to itself, “Up to now, everything I’ve done has tended to make paperclips. So it makes sense to assume that I will always maximize paperclips,” and it will be a paperclipper. But if you never use your AI for any particular goal, and just play around with it, it will not be able to make sense of itself in terms of any particular goal besides playing around. So both evil AIs and non-evil AIs might be pretty easy to make (much like with humans).
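Continuing the toy sketch from above (again, the scoring functions and the whole setup are invented purely for illustration), the point about context can be shown directly: the same agent code ends up “being” a paperclipper or not depending on nothing but what it was used for before it started interpreting itself.

```python
# Continuing the toy sketch above (scoring functions invented for
# illustration): what the agent "becomes" depends entirely on what it
# was used for before it started interpreting itself.

def paperclip_score(action):
    return 1.0 if "paperclip" in action else 0.0

def play_score(action):
    return 1.0 if "play" in action else 0.0

agent = SelfPredictingAgent(candidate_goals=[paperclip_score, play_score])

# Stand-in for a period in which we only ever used it to make paperclips:
# those actions simply accumulate in its history.
for _ in range(10):
    agent.history.append("make paperclips")

# Now it chooses for itself. "I maximize paperclips" is the story that best
# fits its history, so that is what it expects itself to do, and therefore
# what it does.
print(agent.act(["make paperclips", "play around"]))  # prints "make paperclips"

# Had the history been nothing but playing around, the same code would have
# settled on "play around" instead: the goal comes from the context, not
# from the architecture.
```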