I think this is a good critique, but I don’t think he explains his idea very well. I’m not positive I’ve understood it, but I’ll give my best understanding.
A tool AGI is something like a pure decision theory with no attached utility function. It is not an agent with a utility function for answering questions or something like that. The decision theory is a program which takes a set of inputs, including utility values, and returns an action. An ultra-simple example might be an argmax function which takes a finite set of utility values for a finite set of action options and returns the number of the option with the highest value. It doesn’t actually perform the action; it just returns the number, and it’s up to you to use the output.
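To make the argmax case concrete, here is a minimal sketch (the function name and the utility numbers are mine, purely for illustration):

```python
def best_option(utilities):
    """Return the index of the highest-valued option; it takes no action."""
    return max(range(len(utilities)), key=lambda i: utilities[i])

# Three candidate actions with utility values 0.2, 0.9 and 0.5:
print(best_option([0.2, 0.9, 0.5]))  # -> 1; what you do with that is up to you
```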
Each time you run it, you provide a new set of inputs; these might include utility values, a utility function, a description of some problem, a copy of its own source, etc. Each time you run the program, it returns a decision, and if you run the program twice with the same inputs it will return the same outputs. It’s up to the humans to use that decision. Some of the actions the decision theory returns will be things like “rewrite the tool AI program with code Y (to have feature X)”, but it’s still up to the humans to execute that task. Other actions will be “gather more information” operations, perhaps even at a really low level, like “read the next bit in the input stream”.
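To spell out the cycle I have in mind, here is a hedged sketch (the names and the toy utility values are my own illustration, not part of Holden’s proposal):

```python
def tool_step(inputs):
    """Deterministic: the same inputs always produce the same decision."""
    utilities = inputs["utilities"]
    return max(utilities, key=utilities.get)  # suggest the highest-valued action

inputs = {"problem": "what should be done next?",
          "utilities": {"read the next bit in the input stream": 0.8,
                        "rewrite tool AI program with code Y": 0.6}}

first = tool_step(inputs)
second = tool_step(inputs)
assert first == second               # same inputs, same outputs
print("Suggested action:", first)    # the humans decide whether to carry it out
```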
Now all this sounds hopelessly slow to me and potentially impossible for other reasons too, but I haven’t seen much discussion of ideas like this, and it seems like it deserves investigation.
As I see it (which may not be Holden’s view), the main benefit of this approach is that there is dramatically more feedback to the human designers. Agent-AI could easily be a one-chance sort of thing, making it really easy to screw up. Increased feedback seems like a really good thing.
I’m not sure how you would distinguish the tool AGI from either a narrow AI or an agent with a utility function for answering questions (= an Oracle).
If it solves problems of a specific kind (e.g., Input: GoogleMap, point A, point B, “quickest path” utility function; Output: the route) by following well-understood algorithms for solving these kinds of problems, then it’s obviously a narrow AI.
If it solves general problems, then its algorithm must have the goal of finding the action that maximizes the input utility, and then it’s an Oracle.
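To make the first case concrete, here is a toy quickest-path solver in the spirit of the route example (Dijkstra’s algorithm on a made-up road network; all specifics are illustrative):

```python
import heapq

def quickest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total_time, route), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        time, node, route = heapq.heappop(queue)
        if node == goal:
            return time, route
        if node in visited:
            continue
        visited.add(node)
        for neighbour, cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (time + cost, neighbour, route + [neighbour]))
    return float("inf"), []

# Toy road network; edge weights are travel times in minutes.
roads = {"A": [("B", 5), ("C", 2)], "C": [("B", 1)], "B": []}
print(quickest_path(roads, "A", "B"))  # -> (3, ['A', 'C', 'B'])
```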
Let’s taboo the words “narrow AI”, “AGI”, and “Oracle”, as I think they’re getting in the way.
Let’s say you’ve found a reflective decision theory and a pretty good computational approximation of it. You could go off and try to find the perfect utility function, link the two together, and press “start”; this is what we normally imagine doing.
Alternatively, you could code up the decision theory and run it for “one step” with a predefined input and set of things in memory (which might include an approximate utility function or a set of utility values for different options, etc.) and see what it outputs as the next action to take. Importantly, the program doesn’t do anything that it thinks of as being in its option set (like “rewrite your code so that it’s faster”, “turn on sensor X”, or “press the blue button”); it just returns which option it deems best. You take this output and do what you want with it: maybe use it, discard it, or re-run the decision theory with modified inputs. One of its outputs might be “replace my program with code X because it’s smarter” (so I don’t think it’s useful to call it a narrow AI), but it doesn’t automatically replace its own code as such.
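Here is a rough sketch of what I mean by running it for “one step” (the option strings, the memory contents, and the stand-in utility function are all made up for illustration):

```python
def one_step(options, approx_utility, memory):
    """Score each option under the supplied approximate utility function and
    return the single option judged best. No side effects, nothing executed."""
    return max(options, key=lambda option: approx_utility(option, memory))

def approx_utility(option, memory):
    # A toy stand-in utility that happens to favour self-improvement in this state.
    return 1.0 if option.startswith("replace") else 0.3

memory = {"own_source": "tool_v1.py", "observations": []}
options = ["turn on sensor X",
           "press the blue button",
           "replace my program with code X because it's smarter"]

print(one_step(options, approx_utility, memory))
# The humans read the suggestion and decide whether to use it, discard it,
# or re-run the step with modified inputs; nothing is carried out automatically.
```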
I don’t understand what you mean by “running a decision theory for one step”. Assume you give the system a problem (in the form of a utility function to maximize or a goal to achieve), and ask it to find the best next action to take. This makes the system an Agent with the goal of finding the best next action (subject to any additional constraints you may specify, like maximal computation time, etc.). If the system is really intelligent, the problem (of finding the best next action) is hard enough that the system needs more resources, and there is any hole anywhere in the box, then the system will get out.
Regarding self-modification, I don’t think it is relevant to the safety issues by itself. It is only important in that, by using it, the system may become very intelligent very fast. The danger is intelligence, not self-modification. Also, a sufficiently intelligent program may be able to create and run a new program without your knowledge or consent, either by simulating an interpreter (slow, but if the new program yields exponential time savings, this would still have a huge impact), or by finding and exploiting bugs in its own programming.
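As a toy illustration of the interpreter point (Python’s exec stands in here for an embedded interpreter; the mechanism is an illustrative choice, not something specified above):

```python
# The program constructs new code as data and runs it, without its own source
# ever changing and without any explicit "self-modification" step.
new_program = "result = sum(i * i for i in range(10))"
namespace = {}
exec(new_program, namespace)   # runs code that never existed as a separate file
print(namespace["result"])     # -> 285
```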