I think I don’t understand what you mean here. I’ll say some things that may or may not be relevant:
I don’t think the ability to plan implies goal-directedness. Tabooing goal-directedness, I don’t think an AI that can “intrinsically” plan will necessarily pursue convergent instrumental subgoals. For example, the AI could have “intrinsic” planning capabilities that find plans which, when executed by a human, lead to outcomes the human wants. Depending on how it finds such plans, such an AI may not pursue any of the convergent instrumental subgoals. (Google Maps would be an example of such an AI system, and by my understanding Google Maps has “intrinsic” planning capabilities.)
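To make the Google Maps example concrete: route planners are typically built on shortest-path search such as Dijkstra’s algorithm, which produces a plan but leaves execution entirely to the human. A minimal sketch (the road network below is made up for illustration):

```python
import heapq

def plan_route(graph, start, goal):
    """Dijkstra's shortest-path search. The planner outputs a plan (a route);
    executing it is left entirely to the human user."""
    # graph: dict mapping node -> list of (neighbor, cost) pairs
    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(queue, (nd, neighbor))
    if goal != start and goal not in prev:
        return None  # no route exists
    # Reconstruct the route by walking predecessors back from the goal.
    route = [goal]
    while route[-1] != start:
        route.append(prev[route[-1]])
    return list(reversed(route))

roads = {
    "home": [("A", 2), ("B", 5)],
    "A": [("B", 1), ("office", 6)],
    "B": [("office", 2)],
}
print(plan_route(roads, "home", "office"))  # ['home', 'A', 'B', 'office']
```

Nothing in this search pursues instrumental subgoals like self-preservation or resource acquisition; it simply optimizes within a fixed model and hands the result over.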
I also don’t think that we will find the one true algorithm for planning (I agree with most of Richard’s positions in Realism about rationality).
I don’t think that my intuitions depend on an AI’s ability to emulate humans (e.g. Google Maps does not emulate humans).
Google Maps is not a relevant example. I am talking about “generally intelligent” agents. Meaning, these agents construct sophisticated models of the world starting from a relatively uninformed prior (comparably to humans or more so)(fn1)(fn2). This is in sharp contrast to Google Maps, which operates strictly within the model it was given a priori. General intelligence is important, since without it I doubt it will be feasible to create a reliable defense system. Given general intelligence, convergent instrumental goals follow: any sufficiently sophisticated model of the world implies that achieving convergent instrumental goals is instrumentally valuable.
I don’t think it makes that much difference whether a human executes the plan or the AI itself. If the AI produces a plan that is not human comprehensible and the human follows it blindly, the human effectively becomes just an extension of the AI. On the other hand, if the AI produces a plan which is human comprehensible, then after reviewing the plan the human can just as well delegate its execution to the AI.
I am not sure what the significance of “one true algorithm for planning” is in this context. My guess is, there is a relatively simple qualitatively optimal AGI algorithm(fn3), and then there are various increasingly complex quantitative improvements of it, which take into account specifics of computing hardware and maybe our priors about humans and/or the environment. Which is the way algorithms for most natural problems behave, I think. But also, improvements probably stop mattering beyond the point where the AGI can come up with them on its own within a reasonable time frame. And, I dispute Richard’s position. But then again, I don’t understand the relevance.
(fn1) When I say “construct models” I am mostly talking about the properties of the agent rather than the structure of the algorithm. That is, the agent can effectively adapt to a large class of different environments or exploit a large class of different properties the environment can have. In this sense, model-free RL is also constructing models. Although I’m also leaning towards the position that explicitly model-based approaches are more likely to scale to AGI.
(fn2) Even if you wanted to make a superhuman AI that only solves mathematical problems, I suspect that the only way it could work is by having the AI generate models of “mathematical behaviors”.
(fn3) As an analogy, a “qualitatively optimal” algorithm for a problem in P is just any polynomial time algorithm. In the case of AGI, I imagine a similar computational complexity bound plus some (also qualitative) guarantee(s) about sample complexity and/or query complexity. By “relatively simple” I mean something like, can be described within 20 pages given that we can use algorithms for other natural problems.