Suppose [...] you’ve got this AI system with this really, really good intelligence, which maybe we’ll call a world model or just general intelligence. And this intelligence can take in any utility function and optimize it, and you plug in the incorrect utility function, and catastrophe happens.
I’ve seen various people make the argument that this is not how AI works and it’s not how AGI will work—it’s basically the old “tool AI” vs “agent AI” debate. But I think the only reason current AI doesn’t do this is because we can’t make it do this yet: the default customer requirement for a general intelligence is that it should be able to do whatever task the user asks it to do.
So far the ability of AI to understand a request is very limited (poor natural language skills). But once you have an agent that can understand what you’re asking, of course you would design it to optimize new objectives on request, bounded of course by some built-in rules about not committing crimes or manipulating people or seizing control of the world (easy, I assume). Otherwise, you’d need to build a new system for every type of goal, and that’s basically just narrow AI.
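To make the picture being described concrete, here is a minimal toy sketch (my own illustration, not anything proposed in this thread) of a single optimizer that accepts whatever utility function is plugged in, with a separate hard-coded `allowed` filter standing in for the built-in rules; the names `optimize` and `allowed` are hypothetical.

```python
# Toy sketch: one generic search routine, any plugged-in objective.
# The objective is a parameter; the same code serves any request,
# which is why plugging in the wrong utility function is the worry.
import random
from typing import Callable, Iterable, Optional


def optimize(utility: Callable[[float], float],
             candidates: Iterable[float],
             allowed: Callable[[float], bool]) -> Optional[float]:
    """Return the allowed candidate that maximizes the supplied utility."""
    best, best_score = None, float("-inf")
    for plan in candidates:
        if not allowed(plan):  # hard-coded "built-in rules" filter
            continue
        score = utility(plan)
        if score > best_score:
            best, best_score = plan, score
    return best


# Same optimizer, two different user-supplied objectives.
plans = [random.uniform(-10, 10) for _ in range(1000)]
print(optimize(lambda x: -(x - 3) ** 2, plans, allowed=lambda x: abs(x) < 5))
print(optimize(lambda x: x, plans, allowed=lambda x: abs(x) < 5))
```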
If our superintelligent AI is just a bunch of well-developed heuristics, it is unlikely that those heuristics will be generatively strategic enough to engage in super-long-term planning.
If the heuristics are optimized for “be able to satisfy requests from humans” and those requests sometimes require long-term planning, then the skill will develop. If it’s only good at satisfying simple requests that don’t require planning, in what sense is it superintelligent?
I am not arguing that we’ll end up building tool AI; I do think it will be agent-like. At a high level, I’m arguing that the intelligence and agentiness will increase continuously over time, and as we notice the resulting (non-existential) problems we’ll fix them, or start over.
I agree with your point that long-term planning will develop even with a bunch of heuristics.
If the heuristics are optimized for “be able to satisfy requests from humans” and those requests sometimes require long-term planning, then the skill will develop. If it’s only good at satisfying simple requests that don’t require planning, in what sense is it superintelligent?
Yeah, that statement is wrong. I was trying to make a more subtle point about how an AI that learns long-term planning on a shorter time-frame is not necessarily going to be able to generalize to longer time-frames (but in the context of superintelligent AIs capable of doing human-level tasks, I do think it will generalize—so that point is kind of irrelevant). I agree with Rohin’s response.