Jaan mentions that prediction_function seems like too convenient a rug to sweep details under. They discuss some wrappers around a tool and what those wrappers might do. In particular, here are a couple of Jaan's questions with Holden's responses (paraphrased):
Jaan: Would you agree that it is rather trivial to extend [Tool AI] to be [General AI] by adding a main loop (the “master function”) that simply uses the predictor to maximize an explicit utility function?
Holden: Probably though not definitely. If the “master function” is just sending packets of data, as you proposed, it won’t necessarily have the ability to accomplish as much as a well-funded, able-bodied human would. I’m aware of the arguments along the lines of “humans figured out how to kill elephants … this thing would figure out how to overpower us” and I think they’re probably, though not definitely, correct.
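For concreteness, here is a minimal sketch of the kind of "master function" Jaan is describing, in Python: prediction_function and utility_function stand in for the tool's components, while master_function, candidate_packets, and send_packet are made-up placeholders for the wrapper and its output channel, not anything from the actual exchange.
    # Hypothetical sketch of Jaan's "master function": an outer loop that turns a
    # predictor (tool) into an agent by maximizing an explicit utility function.
    # candidate_packets and send_packet are illustrative placeholders.
    def master_function(prediction_function, utility_function, candidate_packets, send_packet, world_state):
        while True:
            best_packet, best_utility = None, float("-inf")
            for packet in candidate_packets(world_state):
                # Ask the predictor what the world would look like after each candidate output.
                predicted_outcome = prediction_function(packet, world_state)
                utility = utility_function(predicted_outcome)
                if utility > best_utility:
                    best_packet, best_utility = packet, utility
            # Instead of reporting the best action to a human, act on it directly.
            world_state = send_packet(best_packet)
The point is just how thin the wrapper is: the only real change from tool to agent is that the highest-utility action gets sent rather than reported.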
Jaan: Would you agree that there's an obvious way to make [Tool AI] more powerful, namely asking it for ways to improve its own source code?
Holden: Maybe. [Tool AI] could easily be intelligent enough to create all-powerful weapons, cure every disease, etc., while still not being intelligent enough to make improvements to its own predictive algorithm that it knows are improvements for general intelligence, i.e. predictive intelligence in every domain. I think it's likely that we will ultimately arrive at prediction_function() by imitating the human one and implementing it on superior hardware. The human one was developed by trial-and-error over millions of years in the real world, a method that won't be available to the [Tool AI]. So there's no guarantee that a greater intelligence could find a way to improve this algorithm without such extended trial-and-error; it depends on how much greater its intelligence is.
Jaan: Furthermore, wouldn't such improvement seem by far the lowest-hanging fruit to the developer who implemented the [Tool AI] in the first place (assuming he's not concerned with safety)?
Holden: Probably not. If I possessed a [Tool AI], I'd want to develop superweapons, medical advances, etc. ASAP, so first I'd see whether it could do those without modifying itself. And I think a [Tool AI] capable of writing a superior general-intelligence algorithm is probably capable of those other things as well.
As we've seen from decision theory posts on LW, prediction_function raises some very tricky questions about whether it's evaluating the counterfactual "if do(humans implement $action) then $outcome", or "if do(this computation outputs $action) then $outcome", or something else, plus other odd questions, like whether it draws inferences about the state of the world from the premise that humans eventually take $action. I feel like moving to a tool doesn't get rid of these problems for prediction_function.
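To make that ambiguity a bit more concrete, here is a toy Python sketch (every name in it is a made-up placeholder, not anyone's actual proposal) in which the two counterfactuals differ only in which variable gets intervened on:
    # Hypothetical sketch of two counterfactuals prediction_function might compute.
    class ToyWorldModel:
        # Toy stand-in for whatever model the predictor uses; do() just forces variable values.
        def __init__(self, variables):
            self.variables = dict(variables)
        def do(self, **interventions):
            forced = ToyWorldModel(self.variables)
            forced.variables.update(interventions)
            return forced
        def predict_outcome(self):
            return self.variables  # a real predictor would run inference here

    def predict_given_humans_implement(model, action):
        # "if do(humans implement $action) then $outcome"
        return model.do(humans_action=action).predict_outcome()

    def predict_given_this_computation_outputs(model, action):
        # "if do(this computation outputs $action) then $outcome"
        return model.do(my_output=action).predict_outcome()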
At least in the original post, I don't think Holden's point is that tool-AI is much easier than agent-AI (though he seems to have the intuition that it is), but that it's potentially much safer (largely because of increased feedback), and thus that it deserves more investigation (and that it's a bad sign about SIAI that it has neglected this approach).
Yes, good point. The objection is about SI not addressing tool-AI; much of their discussion is about addressing tool-AI itself, not the meta question "why isn't this explicitly called out by SI?" In particular, Holden's intuitions in response to those questions (that we may well be able to create extremely useful general AI without creating general AI that can improve itself) do seem to have received too little in-depth discussion here. We've often mentioned the possibility, and often decided to skip the question because it's very hard to think about, but I don't recall many really lucid conversations trying to ferret out what it would look like if more-than-narrow, less-than-able-to-self-improve AI were a large enough target to reliably hit.
(As an aside: I think that, as Holden thinks of it, tool-AI could self-improve, but because it's tool-like and not agent-like, it would not automatically self-improve. Its outputs could be of the form "I would decide to rewrite my program with code X", but humans would need to actually implement these changes.)
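A minimal sketch of that workflow, where every callable is a hypothetical placeholder passed in as a parameter: the tool only proposes a rewrite, and nothing changes unless a human applies it.
    # Hypothetical sketch of tool-style self-improvement: the tool reports a proposed
    # rewrite ("I would rewrite my program with code X"); a human must approve and apply it.
    def tool_style_self_improvement(current_source, propose_rewrite, human_approves, apply_patch):
        proposed_source, predicted_effect = propose_rewrite(current_source)
        print("Tool's predicted effect of the rewrite:", predicted_effect)
        if human_approves(current_source, proposed_source):
            return apply_patch(current_source, proposed_source)  # humans implement the change
        return current_source  # the tool never executes the change itself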
In my opinion Karnofsky/Tallinn 2011 is required reading for this objection. Here is Holden’s pseudocode for a tool:
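Reconstructing it from memory (the exact identifiers may differ slightly from the original), the gist is roughly:
    utility_function = construct_utility_function(process_user_input());
    foreach $action in $all_possible_actions {
        $action_outcome = prediction_function($action, $data);
        $utility = utility_function($action_outcome);
        if ($utility > $leading_utility) { $leading_utility = $utility; $leading_action = $action; }
    }
    report($leading_action);
As I recall, the "agent" version is identical except that the final line is execute($leading_action) rather than report($leading_action).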
The LW-esque brush-off version (but there is some truth to it):
utility_function = construct_utility_function("peace on earth");
...
report($leading_action);
$ 'press Enter for achieving peace'
Two years later… Crickets and sunshine. Only crickets.