I really appreciate you taking the time both to write this report and to solicit/respond to all these reviews! I think this is a hugely valuable resource that has helped me to better understand AI risk arguments and the range of views/cruxes that different people have.
A couple quick notes related to the review I contributed:
First, 0.4% is the credence implied by my credences in the individual hypotheses, but I was a little surprised by how small this number turned out to be. (I would have predicted closer to a couple percent at the time.) I’m sympathetic to the possibility that the high level of conjunctiveness here created some amount of downward bias, even if the argument does actually have a highly conjunctive structure.
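To illustrate the conjunctiveness point with made-up numbers (these are purely illustrative, not the actual credences from my review): a six-premise conjunction with a moderate credence in each premise already lands well under 1%, and small per-premise shifts compound quickly.

```python
# Purely illustrative numbers, not the actual credences from my review.
base = 0.4
print(f"six premises at {base:.0%} each -> {base ** 6:.2%}")   # ~0.41%

# Nudging every premise up or down by five percentage points roughly
# doubles or halves the headline number, which is why even a small
# per-step downward bias could matter a lot.
for p in (base - 0.05, base, base + 0.05):
    print(f"six premises at {p:.0%} each -> {p ** 6:.2%}")
```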
Second (only of interest to anyone who looked at my review): My sense is that we still haven’t succeeded in understanding each other’s views about the nature and risk-relevance of planning capabilities. For example, I wouldn’t necessarily agree with this claim in your response to the section on planning:
> Presumably, after all, a fixed-weight feedforward network could do whatever humans do when we plan trips to far away places, think about the best way to cut down different trees, design different parts of a particle collider, etc—and this is the type of cognition I want to focus on.
Let’s compare a deployed version of AlphaGo with and without Monte Carlo tree search. It seems like the version with Monte Carlo tree search could be said to engage in planning: roughly speaking, it simulates the implications of different plays, and these simulations are used to arrive at better decisions. It doesn’t seem to me like there’s any sense in which the version of AlphaGo without MCTS is doing this.[1] Insofar as Go-playing humans simulate the implications of different plays, and use the simulations to arrive at better decisions, I don’t think a plain fixed-weight feedforward Go-playing network could be said to be doing the same sort of cognition as people. It could still play as well as humans, if it had been trained well enough, but it seems to me that the underlying cognition would nonetheless be different.
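To make the contrast concrete, here is a minimal toy sketch: a simple take-away game rather than Go, and a plain rollout search standing in for real MCTS (all names and numbers here are just for illustration, not a claim about how AlphaGo is actually implemented). The “fixed policy” maps a state straight to a move in one bounded computation, while the “search” player simulates the downstream consequences of candidate moves before deciding.

```python
import random

# Toy stand-in for the contrast: a take-away game rather than Go, and a plain
# rollout search rather than real MCTS. Players alternate removing 1-3 stones
# from a pile; whoever takes the last stone wins.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def fixed_policy(stones):
    """Stand-in for a deployed feedforward policy: one bounded computation
    mapping the state straight to a move, with nothing like deliberation."""
    return max(legal_moves(stones))  # a crude, baked-in heuristic

def rollout(stones):
    """Finish the game with random play; return True if the player whose
    candidate move produced this position goes on to win."""
    we_just_moved = True
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        we_just_moved = not we_just_moved
    return we_just_moved  # whoever took the last stone wins

def search_policy(stones, n_simulations=100):
    """Stand-in for the MCTS-style player: simulate the implications of each
    candidate move many times and pick whichever wins most often."""
    def win_count(move):
        return sum(rollout(stones - move) for _ in range(n_simulations))
    return max(legal_moves(stones), key=win_count)

print("fixed policy from 10 stones:", fixed_policy(10))
print("search policy from 10 stones:", search_policy(10))
```

The structural point is just that only the second procedure makes any use of simulated futures; neither function is meant to play the toy game well.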
I feel like I have a rough sense of the distinction between these two versions of AlphaGo and a rough sense of how this distinction might matter for safety. But if both versions engage in “planning,” by some thinner conception of “planning,” then I don’t think I have a good understanding of what this version of the “planning”/“non-planning” distinction is pointing at — or why it matters.
It might be interesting to try to more fully unpack our views at some point, since I do think that differences in how people think about planning might be an underappreciated source of disagreement about AI risk (esp. around ‘inner alignment’).
[1] One way of pressing this point: There’s not really a sense in which you could give it more ‘time to think,’ in a given turn, and have its ultimate decision keep getting better and better.
Hm, I’d probably disagree.
A couple thoughts here:
First: To me, it seems one important characteristic of “planners” is that they can improve their decisions/behavior even without doing additional learning. For example, if I’m playing chess, there might be some move that (based on my previous learning) initially presents itself as the obvious one to make. But I can sit there and keep running mental simulations of different games I haven’t yet played (“What would happen if I moved that piece there…?”) and arrive at better and better decisions.
It doesn’t seem like that’d be true of a deployed version of AlphaGo without MCTS. If you present it with some board state, it seems like it will just take whatever action (or distribution of actions) is already baked into its policy. There’s not a sense, I think, in which it will keep improving its decision. Unlike in the MCTS case, you can’t tweak some simple parameter and give it more ‘time to think’ and allow it to make a better decision. So that’s one sense in which AlphaGo without MCTS doesn’t seem, to me, like it could exhibit planning.
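Continuing the toy sketch from my earlier comment (so, assuming the search_policy and fixed_policy definitions above): the only knob you turn to give the search player more ‘time to think’ is its simulation budget, and nothing it has “learned” about the game changes in between. There is no analogous knob for the fixed policy.

```python
# Builds on the toy sketch above: more 'time to think' is just a larger
# simulation budget; no weights or learned parameters are updated.
for budget in (1, 10, 100, 1000):
    move = search_policy(10, n_simulations=budget)
    print(f"{budget:>4} simulations -> take {move}")
```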
However, second: A version of AlphaGo without explicit MCTS might still qualify as a “planner” on a thinner conception of “planning.” In this case, I suppose the hypothesis would be that a single forward pass through the network carries out computations roughly equivalent to the computations involved in (e.g.) MCTS. I suppose that can’t be ruled out, although I’m also not entirely sure how to think about it. One thing we could still say, though, is that insofar as planning processes tend to involve a lot of sequential steps, the number of layers in MCTS-less AlphaGo would seriously limit the amount of ‘planning’ it can do. The dozen or so layers in AlphaGo’s policy network don’t seem like nearly enough for a forward pass to correspond to any meaningful amount of planning.
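One toy way to see the depth worry (this is just my framing, not a claim about AlphaGo’s actual architecture): if we count each layer as able to contribute one step of strictly sequential work, a d-layer forward pass can emulate at most roughly d steps of an iterative lookahead, whereas an explicit search loop can run for however many steps it needs.

```python
# Toy framing of the depth point: one loop iteration ~ one layer of
# strictly sequential work. (An assumption for illustration only.)

def unrolled_lookahead(state, refine, depth):
    """Feedforward-style: exactly `depth` refinement steps, fixed in advance."""
    for _ in range(depth):
        state = refine(state)
    return state

def iterative_lookahead(state, refine, done):
    """Search-style: keep refining until a stopping condition is met."""
    while not done(state):
        state = refine(state)
    return state

halve = lambda n: n // 2
print(unrolled_lookahead(1000, halve, depth=3))                  # 125: ran out of 'layers'
print(iterative_lookahead(1000, halve, done=lambda n: n == 0))   # 0: ran until finished
```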
So my overall view is: For a somewhat strict conception of “planning,” it doesn’t seem like feedforward networks can plan. For a somewhat loose conception of “planning,” it actually is conceivable that a feedforward network could plan, but (I think) only if it had a really huge number of layers. I’m also not sure that there would be a tendency for the system to start engaging in this kind of “planning” as layer count increases; I haven’t thought enough to have a strong take.[1]
Also, to clarify: I think the question of whether feedforward networks can plan probably isn’t very practically relevant in and of itself, since they’re going to be less important than other kinds of networks. I’m interested in this question mainly as a way of pulling apart different conceptions of “planning,” noticing ambiguities and disagreements, etc.