A language model is in some sense trying to generate the “optimal” prediction for how a text is going to continue. Yet, it is not really trying: it is just a fixed algorithm. If it wanted to find optimal predictions, it would try to take over computational resources and improve its algorithm.
Is there an existing word/language for describing the difference between these two types of optimisation? In general, why can’t we just build AGIs that do the first type of optimisation and not the second?
Agent AI vs. Tool AI.
There’s been discussion of why Tool AIs are expected to become agents; one of the biggest arguments is that agents are likely to be more effective than tools. If you have a tool, you can ask it what you should do to get what you want; if you have an agent, you can just ask it to get you the things you want. Compare Google Maps with a self-driving car: Google Maps is great, but if the car itself is an agent, you get all kinds of additional benefits.
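To make that distinction concrete, here is a minimal sketch in Python. The names (`plan_route`, `steer`, `Route`) are made up for illustration and don’t correspond to any real API; the only point is where the human sits in the loop.

```python
from dataclasses import dataclass

@dataclass
class Route:
    steps: list

def plan_route(start, goal):
    # Hypothetical planner shared by both kinds of system.
    return Route(steps=[f"drive from {start} toward {goal}"])

# Tool AI: answers "what should I do?" and stops; a human acts on the answer.
def tool_ai(start, goal):
    return plan_route(start, goal)

# Agent AI: answers "get me there" by acting on its own plan.
def agent_ai(start, goal, steer):
    route = plan_route(start, goal)
    for step in route.steps:
        steer(step)  # the system itself executes the actions

print(tool_ai("home", "office").steps)
agent_ai("home", "office", steer=lambda step: print("executing:", step))
```

The underlying planner is identical in both cases; what makes the second version an agent is only that its output is wired directly to actions instead of to a human.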
It would be great if everyone did stick to just building tool AIs. But if everyone knows that they could get an advantage over their competitors by building an agent, it’s unlikely that everyone would just voluntarily restrain themselves due to caution.
Also, it’s not clear that there’s any sharp dividing line between AGI and non-AGI AI. If you’ve been building agentic AIs all along (as people are doing right now) and they slowly get smarter and smarter, how do you know at what point you should stop building agents and switch to only building tools? Especially when you know that your competitors might not be as cautious as you are: if you stop, they might keep going, and their smarter agent AIs will outcompete yours, meaning the world is no safer and you’ve lost to them. (And at the same time, they are applying the same logic for why they should not stop, since they don’t know that you can be trusted to stop.)
Would you say a self-driving car is a tool AI or an agentic AI? I can see how the self-driving car is a bit more agentic, but as long as it only drives when you tell it to, I would consider it a tool. But I can also see that the boundary is a bit blurry.
If self-driving cars are not considered agentic, do you have examples of people attempting to make agent AIs?
As you say, it’s more of a continuum than a binary. A self-driving car is more agenty than Google Maps, and a self-driving car that made independent choices about where to drive would be more agentic still.
People are generally trying to make all kinds of more agentic AIs, because more agentic AIs are so much more useful.
Stock-trading bots that automatically buy and sell stock are more agenty than software that just tells human traders what to buy, and they are preferred because a bot without a human in the loop can outcompete a slower system that keeps a human in the decision loop.
An AI that autonomously optimizes data center cooling is more agenty than one that just tells human operators where to make adjustments, and it is preferred… that article doesn’t actually make it explicit why they switched to an autonomously operating system, but the implication seems to be that it can make lots of small tweaks humans wouldn’t bother with and is therefore more effective.
The military has expressed an interest in making its drones more autonomous (agenty) rather than remotely operated. This is for several reasons, including that remotely operated drones can be jammed, and that having a human in the loop slows down response time when fighting against an enemy drone.
All kinds of personal assistant software that anticipates your needs and actively tries to help you is more agenty than software that just passively waits for you to use it. For example, once when I was visiting a friend, my phone popped up a notification that the last bus home was departing soon. Some people want their phones to be more agentic like this, because it’s convenient to have something actively anticipating your needs and making sure they get taken care of for you.
The first type of AI is a regular narrow AI, the kind we’ve been building for a while. The second type is an agentic AI, a strong AI, which we have yet to build. The problem is that AIs are trained using gradient descent, which in effect searches the space of possible designs: it starts from some configuration and keeps adjusting it toward whatever earns higher reward, so training ends up with the design that maximizes the reward best. As a result, agentic AIs become more likely, because they are better at complex tasks. We can modify the reward scheme, but as tasks get more and more complex, agentic behaviour is pretty much the best way to earn reward, so we can’t really avoid building an agentic AI, and we have no real way of knowing whether we’ve created one until it displays behaviour that reveals it.
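As a toy illustration of what that training process looks like (a minimal sketch, with a made-up quadratic loss standing in for the reward signal, not any real training setup): gradient descent doesn’t enumerate every possible design, it starts from one parameter setting and repeatedly nudges it in whatever direction improves the objective, and whatever design that process converges on is the one you get.

```python
import numpy as np

TARGET = np.array([3.0, -2.0])  # hypothetical "best" parameters for this toy loss

def loss(params):
    # Made-up objective: squared distance from TARGET.
    return float(np.sum((params - TARGET) ** 2))

def grad(params):
    # Gradient of the toy loss above.
    return 2 * (params - TARGET)

params = np.zeros(2)   # an arbitrary starting design
learning_rate = 0.1

for step in range(100):
    params = params - learning_rate * grad(params)  # small local improvement each step

print(params, loss(params))  # ends up near TARGET, i.e. near a loss minimum
```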
+1 for the word agentic AI. I think that is what I was looking for.
However, I don’t believe that gradient descent alone can turn an AI agentic. No matter how long you train a language model, it is not going to suddenly want to acquire resources to get better at predicting human language (unless you specifically ask it how to do that and then implement its suggestions yourself. Even then, you are likely to get only what humans would have suggested anyway, although maybe you can make it do research similar to what humans would have done, just faster).
Here’s a non-obvious way it could fail. I don’t expect researchers to make this kind of mistake, but if this reasoning is correct, public access to such an AI is definitely not a good idea.
Also, consider a text predictor that is trying to roleplay as an unaligned superintelligence. This situation could be triggered even without the user’s knowledge, for example by a conversation accidentally drifting into something the AI relates to a story about a rogue superintelligence. In that case it may start to output manipulative replies, suggest blueprints for agentic AIs, and maybe even get the user to run an obfuscated version of the program from the linked post. The AI doesn’t need to be an agent for any of this to happen (though it would clearly be much more likely if it were one).
I don’t think that any of those failure modes (including the model developing some sort of internal agent to better predict text) are very likely to happen in a controlled environment. However, as others have mentioned, agent AIs are simply more powerful, so we’re going to build them too.
In short, the difference between the two is generality. A system that understood the concepts of computational resources and algorithms might do exactly that (take over resources and improve its algorithm) in order to improve its text prediction. Taking the G out of AGI could work, until the tasks get complex enough to require it.
A language model (LM) is a great example, because it is missing several features that an AI would have to have in order to be dangerous. (1) It is trained to perform a narrow task (predict the next word in a sequence), for which it has zero “agency”, or decision-making authority. A human would have to connect a language model to some other piece of software (e.g. a web-hosted chatbot) to make it dangerous. (2) It cannot control its own inputs (e.g. browsing the web for more data) or outputs (e.g. writing e-mails with generated text). (3) It has no long-term memory, and thus cannot plan or strategize in any way. (4) It runs a fixed-function data pipeline, and has no way to alter its own programming, or even expand its computational use, in any way.
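Point (4) can be made concrete with a minimal sketch of what generation from such a model looks like. The `model` function here is a made-up stand-in (random numbers, not a real LM); the thing to notice is the shape of the loop: a pure function from a token window to a next-token distribution, with no network access, no persistent state beyond the text itself, and no way to modify its own code.

```python
import numpy as np

VOCAB_SIZE = 50_000
CONTEXT_WINDOW = 2048

def model(tokens):
    """Hypothetical stand-in for an LM: a pure function from a token window
    to a probability distribution over the next token."""
    seed = hash(tuple(tokens)) % (2**32)
    logits = np.random.default_rng(seed).normal(size=VOCAB_SIZE)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        probs = model(tokens[-CONTEXT_WINDOW:])  # only the recent window is visible
        next_token = int(np.argmax(probs))       # greedy decoding, for simplicity
        tokens.append(next_token)                # the only "state" is the text itself
    return tokens

print(generate([1, 2, 3], n_steps=5))
```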
I feel fairly confident that, no matter how powerful, current LMs cannot “go rogue” because of these limitations. However, there is also no technical obstacle preventing an AI research lab from removing these limitations, and there are many incentives to do so. Chatbots are an obvious money-making application of LMs. Allowing an LM to look up data on its own to improve itself (or even just to answer user questions in a chatbot) is an obvious way to make a better LM. Researchers are currently equipping LMs with long-term memory (I am a co-author on this work). And AutoML is a whole sub-field of AI research that equips models with the ability to change and grow over time.
The word you’re looking for is “intelligent agent”, and the answer to your question “why don’t we just not build these things?” is essentially the same as “why don’t we stop research into AI?” How do you propose to stop the research?