But once you let it do more computation, then it doesn’t have to know anything at all, right? Like, maybe the best go bot is, “Train an AlphaZero-like algorithm for a million years, and then use it to play.”
I know more about go than that bot starts out knowing, but less than it will know after it does computation.
I wonder if, when you use the word “know”, you mean some kind of distilled, compressed, easily explained knowledge?
Perhaps the bot knows different things at different times and your job is to figure out (a) what it always knows and (b) a way to quickly find out everything it knows at a certain point in time.
I think at this point you’ve pushed the word “know” to a point where it’s not very well-defined; I’d encourage you to try to restate the original post while tabooing that word.
This seems particularly valuable because there are some versions of “know” for which the goal of knowing everything a complex model knows seems wildly unmanageable (for example, trying to convert a human athlete’s ingrained instincts into a set of propositions). So before people start trying to do what you suggested, it’d be good to explain why it’s actually a realistic target.
Hmmm. It does seem like I should probably rewrite this post. But to clarify things in the meantime:
- it’s not obvious to me that this is a realistic target, and I’d be surprised if it took fewer than 10 person-years to achieve.
- I do think the knowledge should ‘cover’ all the athlete’s ingrained instincts in your example, but I think the propositions are allowed to look like “it’s a good idea to do x in case y”.
Perhaps I should instead have said: it’d be good to explain to people why this might be a useful/realistic target. Because if you need propositions that cover all the instincts, then it seems like you’re basically asking for people to revive GOFAI.
(I’m being unusually critical of your post because it seems that a number of safety research agendas lately have become very reliant on highly optimistic expectations about progress on interpretability, so I want to make sure that people are forced to defend that assumption rather than starting an information cascade.)
OK, the parenthetical helped me understand where you’re coming from. I think a rewrite of this post should (in part) make clear that I think a massive heroic effort would be necessary to make this happen, but sometimes massive heroic efforts work, and I have no special private info that makes it seem more plausible than it looks a priori.
Actually, hmm. My thoughts are not really in equilibrium here.
(Also: such a rewrite would be a combination of ‘what I really meant’ and ‘what the comments made me realize I should have really meant’.)
I would say that bot (the one trained for a million years and then used to play) knows what the trained AlphaZero-like model knows.
Also, it certainly knows the rules of go and the win condition.
As an additional reason for the importance of tabooing “know”, note that I disagree with all three of your claims about what the model “knows” in this comment and its parent.
(The definition of “know” I’m using is something like “knowing X means possessing a mental model which corresponds fairly well to reality, from which X can be fairly easily extracted”.)
In the parent, is your objection that the trained AlphaZero-like model plausibly knows nothing at all?
The trained AlphaZero model knows lots of things about Go, in a comparable way to how a dog knows lots of things about running.
But the algorithm that gives rise to that model can know arbitrarily few things. (After all, the laws of physics gave rise to us, but they know nothing at all.)
Ah, understood. I think this is basically covered by talking about what the go bot knows at various points in time, a la this comment—it seems pretty sensible to me to talk about knowledge as a property of the actual computation rather than the algorithm as a whole. But from your response there it seems that you think that this sense isn’t really well-defined.
I’m not sure what you mean by “actual computation rather than the algorithm as a whole”. I thought that I was talking about the knowledge of the trained model which actually does the “computation” of which move to play, and you were talking about the knowledge of the algorithm as a whole (i.e. the trained model plus the optimising bot).
On that definition, how does one train an AlphaZero-like algorithm without knowing the rules of the game and win condition?
The human knows the rules and the win condition. The optimisation algorithm doesn’t, for the same reason that evolution doesn’t “know” what dying is: neither are the types of entities to which you should ascribe knowledge.
Suppose you have a computer program that takes two neural networks as input, simulates a game of go between them, determines the winner, and uses the outcome to modify the neural networks. It seems to me that this program has a model of the ‘go world’, i.e. a simulator, and from that model you can fairly easily extract the rules and winning condition. Do you think that this is a model but not a mental model, or that it’s too exact to count as a model, or something else?
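For concreteness, here is a minimal sketch of that sort of program. It is a toy stand-in rather than real go (players just alternate claiming points on a small board, and the winner is decided by total point value plus a komi), the update rule is a crude REINFORCE-style nudge, and all the names and numbers are invented for illustration. The thing to notice is how few lines the ‘go world’ part takes compared to the parameters it trains.

```python
import numpy as np

SIZE = 5                                 # board is SIZE x SIZE
N = SIZE * SIZE
rng = np.random.default_rng(0)
POINT_VALUES = rng.uniform(0.5, 1.5, N)  # fixed "terrain" values, part of the rules
KOMI = 0.75                              # compensation for moving second

def init_net():
    # One-layer "policy network": board features (plus bias) -> preference per point.
    return rng.normal(0, 0.1, size=(N + 1, N))

def features(board, player):
    return np.append(board.flatten() * player, 1.0)  # +1 bias feature

def choose_move(net, board, player):
    # Score every point, mask out occupied ones, sample a legal move.
    scores = features(board, player) @ net
    scores = np.where(board.flatten() == 0, scores, -np.inf)
    probs = np.exp(scores - scores.max())
    return int(rng.choice(N, p=probs / probs.sum()))

def play_game(net_black, net_white):
    # The program's model of the "game world": legal moves, turn order, end, winner.
    board = np.zeros((SIZE, SIZE))
    history, player = [], 1              # +1 = black, -1 = white
    while (board == 0).any():
        net = net_black if player == 1 else net_white
        move = choose_move(net, board, player)
        history.append((player, board.copy(), move))
        board[divmod(move, SIZE)] = player
        player = -player
    black_score = POINT_VALUES[board.flatten() == 1].sum()
    white_score = POINT_VALUES[board.flatten() == -1].sum() + KOMI
    return (1 if black_score > white_score else -1), history

def update(net, history, player, winner, lr=0.01):
    # Nudge this player's chosen moves up if it won, down if it lost.
    sign = 1.0 if winner == player else -1.0
    for p, board, move in history:
        if p == player:
            net[:, move] += lr * sign * features(board, player)
    return net

black, white = init_net(), init_net()
for episode in range(200):
    winner, history = play_game(black, white)
    black = update(black, history, 1, winner)
    white = update(white, history, -1, winner)
```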
I’d say that this is too simple and programmatic to be usefully described as a mental model. The amount of structure encoded in the computer program you describe is very small, compared with the amount of structure encoded in the neural networks themselves. (I agree that you can have arbitrarily simple models of very simple phenomena, but those aren’t the types of models I’m interested in here. I care about models which have some level of flexibility and generality, otherwise you can come up with dumb counterexamples like rocks “knowing” the laws of physics.)
As another analogy: would you say that the quicksort algorithm “knows” how to sort lists? I wouldn’t, because you can instead just say that the quicksort algorithm sorts lists, which conveys more information (because it avoids anthropomorphic implications). Similarly, the program you describe builds networks that are good at Go, and does so by making use of the rules of Go, but can’t do the sort of additional processing with respect to those rules which would make me want to talk about its knowledge of Go.
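For reference, a standard quicksort fits in a few lines, which is part of why “it sorts lists” already feels like a complete description of it:

```python
def quicksort(xs):
    # A fully explicit procedure: there is no flexible internal model here
    # for "knowledge" to live in, so "it sorts lists" says everything.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))
```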