“Agentic” means “give the model a goal and access to tools, and it emits outputs intended to accomplish the goal.”
Example: https://chat.openai.com/share/0f396757-0b81-4ace-af8c-a2cb37e0985d
This barely worked with GPT-3.5, works sometimes with GPT-4, and is supposedly better with Gemini 1.5 and its 1M-token context. It also requires tools, such as code interpreters, web search APIs, and image generators.
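For concreteness, here is a minimal sketch of such an agent loop in Python. Everything in it is a placeholder made up for illustration: call_model, web_search, run_python, and run_agent are not any vendor’s real API.

```python
# Minimal agent-loop sketch. call_model() is a stand-in for whatever
# chat-completion API you use, and the two tools are toys; none of these
# names come from a real SDK.

def web_search(query: str) -> str:
    """Toy tool: pretend to search the web and return a text summary."""
    return f"(search results for {query!r} would go here)"

def run_python(code: str) -> str:
    """Toy tool: pretend to run code in a sandboxed interpreter."""
    return "(interpreter output would go here)"

TOOLS = {"web_search": web_search, "run_python": run_python}

def call_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM API call. Assume it returns either a tool request
    like {"tool": "web_search", "input": "..."} or {"final": "answer"}."""
    raise NotImplementedError("wire this up to a real model API")

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):            # bounded loop, not an eternal process
        reply = call_model(messages)
        if "final" in reply:              # the model believes the goal is done
            return reply["final"]
        result = TOOLS[reply["tool"]](reply["input"])   # run the requested tool
        messages.append({"role": "tool", "content": result})
    return "stopped: step budget exhausted"
```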
Why can this fail in a dangerous way?
Very large-scope goals, and a model that can run for a long time working on them. The classic example is “make as many paperclips as possible”.
Notice this hypothetical AI isn’t decomposed into small, scoped sessions, where one session drafts a short list of paperclip-factory equipment, another session places the orders, and on arrival another agent plans the assembly… Instead, it is an eternally running process, fully in charge of all steps, and allowed to do absolutely anything, including issuing arbitrary commands (up to killing humans) or ordering its own self-improvement.
State buildup. Almost all computer-system failures come from accumulation of state. Because fine-tuning is turned off for publicly available models, the most state that can build up is bounded by the size of the context buffer. This makes Gemini 1.5 with its 1M-token context more dangerous than prior models, but it’s likely still far too weak to do damage.
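To make “the most state that can build up is the context buffer” concrete, here is a toy sketch of the rolling truncation an API-only deployment effectively imposes; MAX_TOKENS and count_tokens are placeholders, not a real tokenizer or API.

```python
# Toy illustration of why accumulated state is bounded: the only state an
# API-only agent carries is its message history, truncated to the context
# window. MAX_TOKENS and count_tokens() are placeholders, not a real API.

MAX_TOKENS = 1_000_000          # a Gemini-1.5-sized window vs ~8k-128k before

def count_tokens(message: dict) -> int:
    return len(message["content"]) // 4    # crude approximation

def truncate_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest messages until the history fits in the window.
    Whatever is dropped is gone for good: no fine-tuning, no other memory."""
    kept = list(messages)
    while len(kept) > 1 and sum(count_tokens(m) for m in kept) > MAX_TOKENS:
        kept.pop(0)             # earlier plans and attempts fall out of memory
    return kept
```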
High competence. Current models fail often, and especially fail at the skilled tasks that would be a problem, such as hacking. With short context buffers they soon forget everything they have already tried and stay stuck forever.
Missing modalities. Current models don’t have a memory component, online learning, robotic proprioception tokens as input, realtime video as input, or robotic tokens as output. (This list used to be much longer!)
A series of failures, combined with extremely capable models, that leads to a model developing its own self-interested goals (bad), successfully hiding those goals (worse), or secretly collaborating with many other models (catastrophic).
A place to exist outside of human control (or hidden below the threshold of human detection) has to exist in the world. Right now you need over 100 GPUs, or a lot of TPUs, to host a model. Some day a single card in a computer might be able to host a full AGI. Once compute is very dense, cheap, and sitting idle, you have a “compute overhang”. It might take 20+ years before that happens.
Summary: current models are already agents, but they aren’t currently broken in the above ways.
You shouldn’t worry yet; the models need to be far more capable.
Even then, key mistakes have to be made before bad outcomes become possible, such as assigning too much responsibility to a single context, having no short-duration termination condition, and allowing unnecessary communication between running models.
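As a sketch of what avoiding those mistakes could look like, reusing the same made-up call_model stand-in as the earlier sketch (a design sketch, not a safety guarantee):

```python
import time

# Sketch of the opposite design (hypothetical names throughout): narrow
# per-context scope, short hard termination conditions, and no channel
# between running models.

def call_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM API call, as in the earlier sketch."""
    raise NotImplementedError("wire this up to a real model API")

def run_scoped_task(task: str, max_steps: int = 5, max_seconds: float = 60.0) -> str:
    """Run one narrowly scoped task in a fresh context, then terminate."""
    deadline = time.monotonic() + max_seconds
    messages = [{"role": "user", "content": task}]     # fresh, narrow context
    for _ in range(max_steps):                         # short step budget
        if time.monotonic() > deadline:                # short wall-clock budget
            return "terminated: time budget exhausted"
        reply = call_model(messages)
        if "final" in reply:
            return reply["final"]
        messages.append({"role": "assistant", "content": str(reply)})
    return "terminated: step budget exhausted"

# Orchestration across tasks stays with humans (or dumb glue code), so no single
# long-lived context holds the whole plan, and running models never talk to each
# other directly.
```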
A common argument is that making those mistakes is very convenient and efficient, so people will make them anyway. Hopefully we get bad outcomes early, while models are still too weak to do any real damage.
The right time to start worrying is too early; otherwise it will be too late.
(I agree in the sense that current models very likely can’t be made existentially dangerous, and in that sense “worrying” now is misplaced; but the proper use of worrying is planning for an uncertain future, which is a different sense of “worrying”.)