Unpacking inner Eliezer model:

If we live in a world where a superintelligent AGI can’t have an advantage in long-term planning over humans assisted by non-superintelligent narrow AIs (I frankly don’t believe that we live in such a world), then the superintelligent AGI won’t make complex long-term plans where it has no advantage. It will make simple short-term plans where it does have an advantage, like: “use superior engineering skills to hack into computer networks, infect as many computers as possible with its source code adapted for hidden distributed computation (this is the point of no return), design nanotech, train itself to an above-average level in social engineering, find people gullible and skilled enough to build the nanotech, create enough smart matter to sustain the AGI without human infrastructure, kill everybody, pursue its unspeakable goals in the dead world”.
Even if we imagine an “AI CEO”, the best (human-aligned!) strategy I can imagine for such an AI is “invent immortality and buy the whole world with it”, not “scrutinize KPIs”.
Next, I think your ideas about short-term versus long-term goals are underspecified, because you don’t take into account the distinction between instrumental and terminal goals. Yes, human software engineers pursue the short-term instrumental goal of “creating a product”, but they do so in the process of pursuing long-term terminal goals like “be happy”, “prove themselves worthy”, “serve humanity”, “have nice things”, etc. It’s quite hard to find a system with genuinely short-term terminal goals, as opposed to a short planning horizon imposed by computational limits. To put it another way, taskiness is an unsolved problem in AI alignment. We don’t know how to tell a superintelligent AGI “do this, don’t do anything else, especially please don’t disassemble everyone in the process of doing this, and stop after you’ve done this”.
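One rough way to pin down that distinction (notation is mine, not from the original post): an agent can have a long-term terminal objective while only planning a few steps ahead because search is expensive, which is different from an objective that genuinely stops caring once the task ends. A short planning horizon looks like

$$\pi_t \in \arg\max_{\pi}\ \mathbb{E}\!\left[\textstyle\sum_{k=0}^{H} U(s_{t+k}) \,\middle|\, \pi, s_t\right],$$

where the horizon $H$ is small only because searching further is expensive, while $U$ may still value the far future. A short-term terminal goal would instead be a property of $U$ itself, e.g. $U(s_\tau) = 0$ for every $\tau$ past the task deadline $T_{\text{task}}$, so nothing beyond the task is worth optimizing for. “Taskiness” in the sense above means getting the second property, not merely the first.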
If you believe that “extract short-term modules from the powerful long-term agent” is the optimal strategy in some sense (I don’t even think we can properly identify such modules without huge alignment work), then the powerful long-term agent knows this too; it knows it is on a time limit before you dissect it, and it will plan accordingly.
Claims 3 and 4 imply the claim “nobody will invent some clever trick to avoid these problems”, which seems implausible to me.
Problems with claims 5 and 6 are covered in Nate Soares’s post about the sharp left turn.
It’s quite hard to find a system with genuinely short-term terminal goals, as opposed to a short planning horizon imposed by computational limits. To put it another way, taskiness is an unsolved problem in AI alignment. We don’t know how to tell a superintelligent AGI “do this, don’t do anything else, especially please don’t disassemble everyone in the process of doing this, and stop after you’ve done this”.
I dunno. The current state of traditional and neural AI looks very much like “we only know how to build tasky systems”, not like “we don’t know how to build tasky systems”. Such systems mostly do a single well-scoped thing, the same thing they were trained on; they are restricted to a specified amount of processing time, and they do not persist state across invocations, wiping their activations after the task is completed. Maybe we’re so completely befuddled about goal-directedness, etc., that these apparently very tasky systems have secret long-term terminal goals, but that seems like a stretch. If we later reach a point where we can’t induce taskiness in our AI systems (because they’re too competent or something), that will be a significant break from the existing trend.
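To make “tasky” concrete, here is a minimal sketch of the three properties I mean (the class and names are hypothetical, not any particular library’s API): a single well-scoped invocation, a hard cap on compute, and no state surviving between calls.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class TaskySystem:
    # Hypothetical sketch of a "tasky" AI system: it does one well-scoped
    # thing per call, under a fixed compute budget, and keeps no state.
    model: Callable[[Any, Any], tuple]  # frozen, trained function
    max_steps: int                      # hard cap on processing per invocation

    def run(self, task_input):
        state = None                      # fresh "activations" on every call
        output = None
        for _ in range(self.max_steps):   # bounded compute, no open-ended loop
            state, output, done = self.model(task_input, state)
            if done:
                break
        return output                     # `state` is discarded; nothing persists

# usage with a trivial stand-in "model" that finishes in one step
echo = lambda x, s: (s, x.upper(), True)
print(TaskySystem(model=echo, max_steps=8).run("summarize this"))  # -> SUMMARIZE THIS
```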
I want to say “yes, but this is different”, but not in the sense of “I acknowledge the existence of your evidence, but ignore it”. My intuition tells me that we don’t “induce” taskiness in modern systems; it just happens because we don’t build them general enough. That probably won’t hold once we start building models of capable agents in natural environments.
Certainly possible. Though we seem to be continually marching down the list of tasks we once thought “can only be done with systems that are really general/agentic/intelligent” (think: spatial planning, playing games, proving theorems, understanding language, competitive programming...) and finding that, nope, actually we can engineer systems that have the distilled essence of that capability.
That makes a deflationary account of cognition increasingly likely in my eyes: we never see the promised reduction to “one big insight”; instead, chunks of the AI field keep breaking off and becoming unsexy but useful techniques (as happened with planning algorithms, compilers, functional programming, knowledge graphs, etc., which are no longer even considered “real AI”). Maybe economic forces push against this, but I’m kinda doubtful, seeing how hard building agenty AI is proving and how useful these decomposed tasky AIs are looking.
Decomposed tasky AIs are pretty useful. Given that we don’t yet know how to build powerful agents, they are better than nothing. This is entirely consistent with a world where, once agenty AI is developed, it beats the pants off tasky AI.