I’m not sure that the agent that constantly twitches is going to be motivated by coherence theorems anyways. Is the class of agents that care about coherence identical to the class of potentially dangerous goal-directed/explicit-utility-maximizing/insert-euphemism-here agents?
In the setting where your outcomes are universe-histories, coherence is vacuous, so no agent can meaningfully be said to care (or not care) about that kind of coherence.
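To spell out why it’s vacuous, here is a minimal sketch of the standard construction (the notation π for the policy, h for a universe-history, and u for the constructed utility function is mine, introduced for illustration, not taken from the post):

```latex
% Claim: any policy whatsoever maximizes *some* utility function over
% universe-histories. Let \pi map each observation history to an action,
% and define, for a complete universe-history h,
\[
u(h) =
\begin{cases}
1 & \text{if every action in } h \text{ is the one } \pi \text{ outputs given the observations preceding it,} \\
0 & \text{otherwise.}
\end{cases}
\]
% Following \pi attains u(h) = 1 with probability 1, which is the maximum
% possible expected utility, so \pi is an expected-utility maximizer with
% respect to u. Coherence over universe-histories rules out no behavior.
```

So any behavior at all, including constant twitching, counts as coherent in this setting.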
In the setting where you have some sort of contradictory preferences, because your preferences are over higher-level concepts than particular universe-histories, you probably care about coherence theorems. It seems possible that this is the same as the class of goal-directed behaviors, but even if so, I’m not sure what implications that has? E.g. I don’t think it changes anything about the arguments I’m making in this post.
Sorry, this was a good response to my confused take—I promised myself I’d write a response but only ended up doing it now :)
I think the root of my disagreeing-feeling is that when I talk about things like “it cares” or “it values,” I’m in a context where the intentional stance is actually doing useful work—thinking of some system as an agent with wants, plans, goals, etc. is in some cases a useful simplification that helps me better predict the world. This is especially true when I’m just using the words informally—I can talk about the constantly-twitching agent wanting to constantly twitch, when using the words deliberately, but I wouldn’t use this language intuitively, because it doesn’t help me predict anything the physical stance wouldn’t. It might even mislead me, or dilute the usefulness of intentional-stance language. This conflict with intuition is a lot of what’s driving my reaction to this argument.
The other half of the issue is that I’m used to thinking of intentional-stance features as having cognitive functions. For example, if I “believe” something, this means that I have some actual physical pattern inside me that performs the function of a world-model, and something like plans, actions, or observations that I check against that world-model. The physical system that constantly twitches can indeed be modeled as an agent with a utility function over world-histories, but that agent is in some sense an incorporeal soul—the physical system itself doesn’t have the cognitive functions associated with intentional-stance attributes (like “caring about coherence”).
Yeah, I agree that the concepts of “goals”, “values”, “wanting”, etc. are useful concepts to have, and point to something real. For those concepts, it is true that the constantly-twitching agent does not “want” to constantly twitch, nor does it have constant twitching as a “goal”. On the other hand, you can say that humans “want” to not suffer.
I’m not arguing that we should drop these concepts altogether. Separately from this post, I want to make the claim that we can try to build an AI system that does not have “goals”. A common counterargument is that due to coherence theorems any sufficiently advanced AI system will have “goals”. I’m rebutting that counterargument with this post.
(The next couple of posts in the sequence should address this a bit more; I think the sequence is going to resume very soon.)