Naturally (as an author on that paper), I agree to some extent with this argument.
I think it’s worth pointing out one technical ‘caveat’: the agent should get utility 0 *on all future timesteps* as soon as it takes an action other than the one specified by the policy. We say the agent gets reward 1 “if and only if its history is an element of the set H”, *not* iff “the policy would take action a given history h”. Without this caveat, I think the agent might take other actions in order to capture more future utility (e.g. to avoid terminal states). [Side-note (SN): this relates to a question I asked ~10 days ago about whether decision theories and/or policies need to specify actions for impossible histories.]
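To make the caveat concrete, here’s a minimal sketch (my own toy construction, not from the paper): a horizon-T episode where one action leads to a terminal state, comparing the “history is in H” reward with the weaker “current action matches the policy” reward. The environment, policy, and function names are all hypothetical.

```python
# Toy illustration (hypothetical, not the paper's formalism): a horizon-T
# episode with actions {0, 1}.  Taking action 1 at step 0 ends the episode
# (a terminal state); otherwise the episode runs for T steps.
# The target policy pi takes action 1 at step 0 and action 0 thereafter.

T = 5

def pi(history):
    """The policy we want to rationalize: terminate immediately."""
    return 1 if len(history) == 0 else 0

def episode_return(actions, reward_fn):
    """Roll out a fixed action sequence and sum per-step rewards."""
    history, total = [], 0
    for a in actions:
        total += reward_fn(history, a)
        history.append(a)
        if len(history) == 1 and a == 1:   # action 1 at step 0 is terminal
            break
    return total

def reward_in_H(history, a):
    """Paper-style reward: 1 iff the whole history (including a) is
    consistent with pi, i.e. 0 on all timesteps after any deviation."""
    h = history + [a]
    return 1 if all(h[i] == pi(h[:i]) for i in range(len(h))) else 0

def reward_local(history, a):
    """The 'caveat-free' variant: 1 iff a is what pi would do *now*,
    regardless of whether the agent deviated earlier."""
    return 1 if a == pi(history) else 0

follow_pi = [1]          # pi terminates at step 0
deviate   = [0] * T      # deviate once, then copy pi's later recommendations

for name, r in [("history-in-H", reward_in_H), ("local-match", reward_local)]:
    print(name, "follow:", episode_return(follow_pi, r),
          "deviate:", episode_return(deviate, r))
# history-in-H: following pi is optimal (1 vs 0).
# local-match: deviating beats pi (T-1 vs 1) by avoiding the terminal state.
```

Under the local-match reward, deviating once to dodge the terminal state earns more total reward than following the policy, which is exactly the failure mode the caveat rules out.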
My main point, however, is that I think you could do some steelmanning here and recover most of the arguments you are criticizing (based on complexity arguments). To be clear, I think the thesis (i.e. the title) is a correct and HIGHLY valuable point! But I think there are still good arguments for intelligence strongly suggesting some level of “goal-directed behavior”. For example, it’s probably physically impossible to implement policies (over histories) that are effectively random, since they amount to look-up tables larger than the physical universe. So when we build AIs, we are building things that aren’t at that extreme end of the spectrum. Eliezer has a nice analogy in a comment on one of Paul’s posts (I think), about an agent that behaves like it understands math, except that it thinks 2+2=5. You don’t have to believe the extreme version of this view to believe that it’s harder to build agents that aren’t coherent *in a more intuitively meaningful sense*, i.e. closer to caring about states, which I think is equivalent to putting some sort of equivalence relation on histories (see e.g. Hutter’s work on state aggregation).
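For a rough sense of scale (my own back-of-envelope numbers, not from the post or the paper), here’s a sketch comparing a lookup-table policy over raw histories with a policy over aggregated states:

```python
# Back-of-envelope sketch (hypothetical numbers, just for scale): a policy
# defined directly over histories needs one table entry per history, while
# a policy over aggregated states (an equivalence relation on histories)
# needs only one entry per equivalence class.

n_actions, n_observations, horizon = 4, 256, 100

# Number of distinct action-observation histories of length exactly `horizon`.
n_histories = (n_actions * n_observations) ** horizon

n_states = 10**6   # a generous number of equivalence classes / states

print(f"history lookup-table entries : ~10^{len(str(n_histories)) - 1}")
print(f"state-based policy entries   : {n_states:,}")
# Even with modest branching, the history table dwarfs the ~10^80 atoms in
# the observable universe; the state-based policy stays tiny.
```

The point is just that any physically realizable policy has to compress histories somehow, and that compression is what starts to look like caring about states.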
I also want to mention Laurent Orseau’s paper: “Agents and Devices: A Relative Definition of Agency”, which can be viewed as attempting to distinguish “real” agents from things that merely satisfy coherence via the construction in our paper.
I think it’s worth pointing out one technical ‘caveat’
Yes, good point. I think I was assuming an infinite horizon (i.e. no terminal states), for which either construction works.
My main point, however, is that I think you could do some steelmanning here and recover most of the arguments you are criticizing (based on complexity arguments).
That’s the next post in the sequence, though the arguments are different from the ones you bring up.
But I think there are still good arguments for intelligence strongly suggesting some level of “goal-directed behavior”. e.g. it’s probably physically impossible to implement policies (over histories) that are effectively random, since they look like look-up tables that are larger than the physical universe.
I mean, you could have the randomly twitching robot. But I agree with the broader point, I think, to the extent that it is the “economic efficiency” argument in the next post.
Eliezer has a nice analogy in a comment on one of Paul’s posts (I think), about an agent that behaves like it understands math, except that it thinks 2+2=5.
It seems likely the AI’s beliefs would be logically coherent whenever the corresponding human beliefs are logically coherent. This seems quite different from arguing that the AI has a goal.
Yeah, it looks like maybe the same argument just expressed very differently? Like, I think the “coherence implies goal-directedness” argument basically goes through if you just consider computational complexity, but I’m still not sure if you agree? (maybe I’m being way too vague)
Or maybe I want a stronger conclusion? I’d like to say something like “REAL, GENERAL intelligence” REQUIRES goal-directed behavior (given the physical limitations of the real world). It seems like maybe our disagreement (if there is one) is around how much departure from goal-directedness is feasible/desirable, and/or how much we expect such departures to affect performance (the trade-off also gets worse for more intelligent systems).
It seems likely the AI’s beliefs would be logically coherent whenever the corresponding human beliefs are logically coherent. This seems quite different from arguing that the AI has a goal.
Yeah, it’s definitely only an *analogy* (in my mind), but I find it pretty compelling *shrug*.