“For if it didn’t, then two systems with the same properties (safety, competitiveness) would have different goal-directedness, breaking the pattern of prediction.”
This seems like a bad argument to me, because goal-directedness is not meant to be a complete determinant of safety and competitiveness; other things matter too. As an analogy, one property of my internal cognition is that sometimes I am angry. We like to know whether people are angry because (amongst other things) it helps us predict whether they are safe to be around—but there’s nothing inconsistent about two people with the same level of anger being differently safe (e.g. because one of them is also tired and decides to go to sleep instead of starting a fight).
If we tried to *define* anger in terms of behaviour, then I predict we’d have a very difficult time, and end up not being able to properly capture a bunch of important aspects of it (like: being angry often makes you fantasise about punching people; or: you can pretend to be angry without actually being angry), because it’s a concept that’s most naturally formulated in terms of internal state and cognition. The same is true for goal-directedness—in fact you agree that the main way we get evidence about goal-directedness in practice is by looking at, and making inferences about, internal cognition. If we think of a concept in cognitive terms, and learn about it in cognitive terms, then I suspect that trying to define it in behavioural terms will only lead to more confusion, and similar mistakes to those that the behaviourists made.
On the more general question of how tractable and necessary a formalism is—leaving aside AI, I’d be curious if you’re optimistic about the prospect of formalising goal-directedness in humans. I think it’s pretty hopeless, and don’t see much reason that this would be significantly easier for neural networks. Fortunately, though, humans already have very sophisticated cognitive machinery for reasoning in non-mathematical ways about other agents.
Let’s say that we want to predict whether or not you will punch me. We know about this famed internal state called anger, which makes it more probable that you will punch me. As you point out, it is not a one-to-one correspondence: you might be angry at me and not punch me (as most angry people do), or not be angry and punch me (because we’re training in the boxing gym). Still, it gives me some information about whether and when you might punch me.
I’m saying that if I want to find out whether you’re angry specifically in order to predict whether you’ll punch me, then I should define this anger using only your behavior. Because if you act exactly the same whether you’re “really” angry at me or simply pretending to be angry, this is not a distinction that matters for the prediction I’m trying to make.
Notice that when I’m saying you’re acting exactly the same, I assume I have full knowledge of your behavior in all situations, at all times. This is of course impossible in practice, since it would require a ridiculous (possibly infinite) amount of computation, storage and other resources. It’s in this context, and only in this context, that I argue that your behavior tells me all that I care about in order to predict whether you will punch me or not.
Now, I completely agree with you that using internal cognition and internal structure is the only practical way to extract information about goal-directedness and anger. What I argue for is that, insofar as the predictions you want to make are about behaviors, what the internal cognition and structure give you is a way to extrapolate the behavior, perhaps more efficiently. Not a useful definition of goal-directedness.
There are probably things you care about that are not completely about behavior: the well-being of someone, for example. But I don’t think goal-directedness is one of those. For me, its value lies in how it influences safety and competitiveness, which are properties of observable behavior.
“This seems like a bad argument to me, because goal-directedness is not meant to be a complete determinant of safety and competitiveness; other things matter too.”
After reading it again, and thinking about your anger example, I agree that this argument probably fails. What still stands for me is that given two systems with the same behavior, I want to give them the same goal-directedness. Whether or not I can compute exactly that they have the same goal-directedness is another question.
Lastly, I probably agree with you and Rohin that a complete formalization of goal-directedness (let alone of goal-directedness in human beings) seems impossible. That being said, I do think that there might be some meaningful decomposition into subcomponents (as proposed by Rohin), and that some if not most of these components will yield to formalization. The best I’m hoping for is probably a partial order, where some subcomponents have incomparable values, but others are either binary or on a scale (like focus).
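To gesture at what such a partial order could look like, here is a minimal sketch in Python. The two subcomponents (an explicit-goal flag and a “focus” score) are hypothetical placeholders rather than a proposed decomposition; the point is only that componentwise comparison naturally gives a partial order, with some pairs coming out incomparable.

```python
# Minimal sketch: goal-directedness as a profile of subcomponents, compared
# componentwise. The subcomponent names are hypothetical placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GoalDirectednessProfile:
    explicit_goal: bool  # hypothetical binary subcomponent
    focus: float         # hypothetical scalar subcomponent in [0, 1]

def compare(a: GoalDirectednessProfile, b: GoalDirectednessProfile) -> Optional[int]:
    """Return -1, 0, or 1 if the profiles are ordered, or None if incomparable."""
    pairs = [(a.explicit_goal, b.explicit_goal), (a.focus, b.focus)]
    if all(x == y for x, y in pairs):
        return 0
    if all(x <= y for x, y in pairs):
        return -1
    if all(x >= y for x, y in pairs):
        return 1
    return None  # one subcomponent higher, the other lower: incomparable

# A highly focused system with no explicit goal and an unfocused system with
# an explicit goal come out incomparable under this ordering.
print(compare(GoalDirectednessProfile(False, 0.9), GoalDirectednessProfile(True, 0.2)))  # None
```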
“What I argue for is that, insofar as the predictions you want to make are about behaviors, what the internal cognition and structure give you is a way to extrapolate the behavior, perhaps more efficiently. Not a useful definition of goal-directedness.”
Perhaps it’d be useful to taboo the word “definition” here. We have this phenomenon, goal-directedness. Partly we think about it in cognitive terms; partly we think about it in behavioural terms. It sounds like you’re arguing that the former is less legitimate. But clearly we’re still going to think about it in both ways—they’re just different levels of abstraction for some pattern in the world. Or maybe you’re saying that it’ll be easier to decompose it when we think about it on a behavioural level? But the opposite seems true to me—we’re much better at reasoning about intentional systems than we are at abstractly categorising behaviour.
“What still stands for me is that given two systems with the same behavior, I want to give them the same goal-directedness.”
I don’t see how you can actually construct two generally intelligent systems which have this property, without them doing basically the same cognition. In theory, of course, you could do so using an infinite lookup table. But I claim that thinking about finite systems based on arguments about the infinite limit is often very misleading, for reasons I outline in this post. Here’s a (somewhat strained) analogy: suppose that I’m trying to build a rocket, and I have this concept “length”, which I’m using to make sure that the components are the right length. Now you approach me, and say “You’re assuming that this rocket engine is longer than this door handle. But if they’re both going at the speed of light, then they both become the same length! So in order to build a rocket, you need a concept of length which is robust to measuring things at light speed.”
To be more precise, my argument is: knowing that two AGIs have exactly the same behaviour but cognition which we evaluate as differently goal-directed is an epistemic situation that is so far removed from what we might ever experience that it shouldn’t inform our everyday concepts.
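As a toy illustration of the lookup-table point (my own made-up example, nothing to do with AGI): two “systems” can have identical behaviour on a shared domain while having very different internals, but the table-based route already needs ten thousand stored entries for a trivial task, and blows up from there.

```python
# Two behaviourally identical "systems" for addition on a small domain:
# one computes the answer, one looks it up. The domain size is an arbitrary choice.
DOMAIN = range(100)

def add_by_computation(a: int, b: int) -> int:
    return a + b

# A literal lookup table: 10,000 precomputed entries encoding the same input-output map.
ADD_TABLE = {(a, b): a + b for a in DOMAIN for b in DOMAIN}

def add_by_lookup(a: int, b: int) -> int:
    return ADD_TABLE[(a, b)]

# Identical behaviour across the whole domain...
assert all(add_by_computation(a, b) == add_by_lookup(a, b) for a in DOMAIN for b in DOMAIN)
# ...but the table grows combinatorially with the domain, which is why
# "same behaviour, different cognition" stops being a live option at scale.
```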
“Perhaps it’d be useful to taboo the word “definition” here. We have this phenomenon, goal-directedness. Partly we think about it in cognitive terms; partly we think about it in behavioural terms. It sounds like you’re arguing that the former is less legitimate. But clearly we’re still going to think about it in both ways—they’re just different levels of abstraction for some pattern in the world. Or maybe you’re saying that it’ll be easier to decompose it when we think about it on a behavioural level? But the opposite seems true to me—we’re much better at reasoning about intentional systems than we are at abstractly categorising behaviour.”
Rereading this comment and the ones before, I think we mean different things by “internal structure” or “cognitive terms”. What I mean is what’s inside the system (source code, physical brain states). What I think you mean is ascribing internal cognitive states to the system (in classic intentional stance fashion). Do you agree, or am I misunderstanding again?
So I agree completely that we will need to ascribe intentional beliefs to the system. What I was pointing at is that searching for a definition (sorry, used the taboo word) of goal-directedness in terms of the internal structure (that is, the source code, for example) is misguided.
By “internal structure” or “cognitive terms” I also mean what’s inside the system, but usually at a higher level of abstraction than physical implementation. For instance, we can describe AlphaGo’s cognition as follows: it searches through a range of possible games, and selects moves that do well in a lot of those games. If we just take the value network by itself (which is still very good at Go) without MCTS, then it’s inaccurate to describe that network as searching over many possible games; it’s playing Go well using only a subset of the type of cognition the full system does.
This differs from the intentional stance by paying more attention to what’s going on inside the system, as opposed to just making inferences from behaviour. It’d be difficult to tell that the full AlphaGo system and the value network alone are doing different types of cognition, just from observing their behaviour—yet knowing that they do different types of cognition is very useful for making predictions about their behaviour on unobserved board positions.
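To make the contrast concrete, here is a toy sketch (a take-1-to-3-stones game rather than Go, with a crude heuristic standing in for the value network, so every detail here is made up for illustration): one agent acts on the heuristic directly, the other wraps the same heuristic in a shallow search. They pick the same move on many positions, but the search-based agent generalises differently on some of them.

```python
# Toy contrast between "value function alone" and "value function + search".
# Game: start with N stones, players alternately take 1-3; taking the last stone wins.
# The heuristic below is a crude stand-in for a learned value network.
from typing import List

def legal_moves(stones: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= stones]

def heuristic_value(stones: int) -> float:
    """Value of the position for the player to move (crude and often clueless)."""
    if stones == 0:
        return -1.0  # previous player took the last stone: player to move has lost
    if stones <= 3:
        return 1.0   # can take everything and win immediately
    return 0.0       # no opinion otherwise

def value_only_move(stones: int) -> int:
    """Pick the move leaving the opponent the worst-looking position, per the heuristic."""
    return min(legal_moves(stones), key=lambda m: heuristic_value(stones - m))

def search_move(stones: int, depth: int = 4) -> int:
    """Same heuristic, but wrapped in a shallow negamax search."""
    def negamax(s: int, d: int) -> float:
        if s == 0 or d == 0:
            return heuristic_value(s)
        return max(-negamax(s - m, d - 1) for m in legal_moves(s))
    return max(legal_moves(stones), key=lambda m: -negamax(stones - m, depth - 1))

# The two agents agree on many positions, but not on all of them:
diverging = [s for s in range(1, 25) if value_only_move(s) != search_move(s)]
print("Positions where the two agents pick different moves:", diverging)
```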
“What I was pointing at is that searching for a definition (sorry, used the taboo word) of goal-directedness in terms of the internal structure (that is, the source code, for example) is misguided.”
You can probably guess what I’m going to say here: I still don’t know what you mean by “definition”, or why we want to search for it.
After talking with Evan, I think I understand your point better. What I didn’t understand was that you seemed to argue that there was something other than behavior that mattered for goal-directedness. But as I understand it now, what you’re saying is that, yes, the behavior is what matters, but extracting the relevant information from the behavior is really hard. And thus you believe that computing goal-directedness in any meaningful way will require normative assumptions about the cognition of the system, at an abstract level.
If that’s right, then I would still disagree with you, but I think the case for my position is far less settled than I assumed. I believe there are lots of interesting parts of goal-directedness that can be extracted from behavior alone, while acknowledging that, historically, it has proven hard to compute most complex properties of a system from behavior alone.
If that’s not right, then I propose that we schedule a call sometime, to clarify the disagreement with more bandwidth. Actually, even if it’s right, I can call to update you on the research.