So, for the AutoGPT-style AI contemplated here, it seems to me that this agent-like behaviour would not emerge from the AI’s increased capabilities and its achievement of the general intelligence needed to reason, devise accurate models of the world and of humans, and plan; nor would it emerge from a specified set of values. It would instead come from the planning capabilities that are explicitly specified.
Take a behaviorist or functionalist approach to AI. Let’s say we only understood AutoGPT’s epistemic beliefs about the world-state. How well could we predict its behavior? Now, let’s say we treated it as having goals—perhaps just knowing how it was initially prompted with a goal by the user. Would that help us predict its behavior better? I think it would.
Whatever one thinks it means to be “really” agentic or “really” intelligent, AutoGPT is acting as if it were agentic and intelligent. And in some cases, it is already outperforming humans. I think AutoGPT’s demo bot (it plays a chef coming up with a themed recipe for a holiday) outperforms almost all humans in the speed and quality with which it comes up with a solution, and of course it can repeat that performance as many times as you care to run it.
What this puzzle reveals, to some extent, is that there may not be a fundamental difference between “agency” and “capabilities.” If an agent fails to protect what we infer to be its terminal goals, allowing them to be altered, it is hard to be sure whether that is because we misunderstood what its terminal goal was, or because it was simply incompetent, or because it failed by chance. Until last week, humanity had never had the chance to run repeatable experiments on an identical intelligent agent. This is the birth of a nascent field of “agent engineering,” devoted to building more capable agents, diagnosing the reasons for their failures, and improving our ability to predict outputs from inputs. As an example of a small experiment we can run right now with AutoGPT: can we come up with a list of 10 goal-specs of which AutoGPT achieves 80% within an hour?
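To make that concrete, here is a minimal sketch of what such a harness could look like. The `run_autogpt` command, the goal wording, and the `check_file_nonempty` success check are all hypothetical placeholders rather than AutoGPT’s real interface; the point is just the shape of the experiment: N goal-specs, a per-goal time limit, and a pass/fail check.

```python
import subprocess
import time

TIME_LIMIT_SECONDS = 60 * 60  # one hour per goal-spec

def check_file_nonempty(path):
    """Hand-written success check for one goal: did the agent produce output?"""
    try:
        with open(path) as f:
            return bool(f.read().strip())
    except FileNotFoundError:
        return False

# Hypothetical goal-specs paired with hand-written success checks.
GOAL_SPECS = [
    ("Write a themed holiday recipe and save it to recipe.txt",
     lambda: check_file_nonempty("recipe.txt")),
    # ...nine more (goal, check) pairs would go here...
]

def run_trial(goal, check):
    """Run the agent on one goal-spec with a time limit; report success and duration."""
    start = time.time()
    try:
        # Placeholder invocation: whatever command actually launches the
        # agent with a natural-language goal in your setup.
        subprocess.run(["run_autogpt", "--goal", goal], timeout=TIME_LIMIT_SECONDS)
    except subprocess.TimeoutExpired:
        pass  # out of time; the check below still decides pass/fail
    return check(), time.time() - start

if __name__ == "__main__":
    passed = sum(1 for goal, check in GOAL_SPECS if run_trial(goal, check)[0])
    print(f"{passed}/{len(GOAL_SPECS)} goal-specs achieved")
```

Even something this crude would let us repeat the run and compare success rates across prompt phrasings or model versions.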
Treating AutoGPT and its descendants as agents is going to be fruitful, although the fruit may be poisoned bananas.
(I reply both to you and @Ericf here.) I do struggle a bit to make up my mind on whether drawing a line of agency is really important. We could say that a calculator has the ‘goal’ of returning the right result to the user; we don’t treat a calculator as an agent, but is that because of its very nature and the way in which it was programmed, or is it a matter of capabilities, the calculator being incapable of making plans and considering a number of different paths to achieve its goals?
My guess is that there is something that makes up an agent, and it has to do with the ability to strategise in order to complete a task; i.e. it has to explore different alternatives and choose the ones that would best satisfy its goals, or at least have some way to modify its strategy. Am I right here? And to what extent is a sort of counterfactual thinking needed before we can ascribe this agency property to it, or is following some pre-programmed algorithm to update its strategy enough? I am not sure about the answer, or about how much it matters.
There are some other questions I am unclear about:
Would having a pre-programmed algorithm/map for how to generate, prioritise and execute tasks (as AutoGPT has; see the sketch after these questions) limit its capacity for finding ways to achieve its goals? Would it make it impossible for it to find some solutions that a similarly powerful AI could have reached?
Is there a point at which it is unnecessary for this planning algorithm to be specified, since the AI would have acquired the capacity to plan and execute tasks on its own?
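For what it’s worth, the kind of pre-programmed generate/prioritise/execute loop being asked about looks roughly like the sketch below. It is a paraphrase of the publicly described AutoGPT pattern, not its actual code; `llm` and `tools` are stand-ins for whatever model call and tool set the agent really uses.

```python
from collections import deque

def agent_loop(goal, llm, tools, max_steps=50):
    """Rough sketch of an AutoGPT-style plan/prioritise/execute loop.

    `llm` is assumed to be a callable taking a prompt string and returning
    text; `tools` maps tool names to zero-argument callables. Neither matches
    AutoGPT's real interfaces; they only illustrate the loop's structure.
    """
    tasks = deque([f"Figure out how to achieve: {goal}"])
    memory = []  # results of completed tasks, fed back into later prompts

    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()

        # 1. Execute the current task: ask the model what to do, then run
        #    the named tool if one matches, otherwise keep the text as-is.
        action = llm(f"Goal: {goal}\nTask: {task}\nContext: {memory}\n"
                     f"Available tools: {list(tools)}")
        result = tools[action]() if action in tools else action
        memory.append((task, result))

        # 2. Generate new tasks in light of the result.
        new_tasks = llm(f"Goal: {goal}\nJust did: {task} -> {result}\n"
                        f"List any new tasks, one per line.").splitlines()

        # 3. Re-prioritise: fold the new tasks into the queue.
        tasks = deque(t for t in [*new_tasks, *tasks] if t.strip())

    return memory
```

The outer loop is fixed, but the content of each step is generated on the fly, which is exactly why it is unclear how much the fixed scaffolding constrains what the agent can reach.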
If you didn’t know what a calculator was, and were told that it had the goal of always returning the right answer to whatever equation was input, and failing that, never returning a wrong answer, that would help you predict its behavior.
The calculator can even return error messages for badly formatted inputs, and comes with an interface that helps humans avoid slip-ups.
So I would say that the calculator is behaving with very constrained but nonzero functional agency. Its capabilities are limited by the programmers to exactly those required to achieve its goal under normal operating conditions (it can’t anticipate and avoid getting tossed in the ocean or being reprogrammed).
Likewise, a bacterial genome exhibits a form of functional agency, with the goal of staying alive and reproducing itself. Knowing this helps us predict what specific behaviors bacteria might exhibit.
Describing something as possessing some amount of functional agency is not the same as saying it is conscious, highly capable of achieving this goal, or that the mechanism causing goal-oriented behavior has any resemblance to a biological brain. We can predict water’s behavior well knowing only that it “wants to flow downhill,” even if we know nothing of gravity.
The reason for doubting agency in non-brain-endowed entities is that we want to make space for two things. First, we want to emphasize that behaving with functional agency is not the same as having moral weight attached to those goals. Water has no right to flow downhill, and we don’t have any duty to allow it to do so.
Second, we want to emphasize that the mechanism producing functionally agentic behavior is critical to understand, as it informs both our conception of the goal and the agent’s ability to achieve it, both of which are critical for predicting how it will behave. There is a limit to how much you can predict with a rough understanding like “water’s goal is to flow downhill,” just as there’s a limit to how well you can predict ChaosGPT’s behavior by saying “ChaosGPT wants to take over the world.” Functional agency arises from non-agentic underlying mechanisms, such as gravity and intermolecular forces in the case of water, linear algebra in the case of GPT, or action potentials in the case of human beings.
So for AutoGPT, we need to be able to make confident predictions about what it will or won’t do, and to validate those predictions. If modeling the software as an agent with goals helps us do that, then great, think of it as an agent. If modeling it as a glorified calculator is more useful, do that too.
I think a full understanding of the connection between the appearance of agency and lower-level mechanisms would ultimately be a purely mechanical description of the universe in which humans too were nothing more than calculators. So the mental trick may be to stop associating agency with the feeling of consciousness (what it’s like to feel and experience), and to start thinking of agency as a mechanical phenomenon. It’s just that in the case of AI, particularly LLMs, we are seeing a new mechanism for producing goal-oriented behavior, one that permits new goals as well, and one which we have not yet rendered down to a mechanical description. It shares that in common with the human mind.
Thank you for your explanations. My confusion was not so much from associating agency with consciousness, morality, or other human attributes, but over whether agency is judged from an inside, mechanistic point of view or from an outside, predictive point of view of the system. From the outside, it can be useful to say that “water has the goal to flow downhill”, or that “electrons have the goal to repel electrons and attract protons”, inasmuch as “goal” is taken to mean “tendency”. From an inside view, as you said, it’s nothing like the agency we know; these are fully deterministic laws or rules. Our own agency is in part an illusion, because we too act deterministically, following the laws of physics and, more specifically, the patterns or laws of our own human behaviour. These seem much more complex and harder for us to understand than the laws of gravity or electromagnetism, but reasons do exist for every single one of our actions and decisions, of course.
I find LW’s definition of agency useful: a key property of agents is that the more agentic a being is, the more you can predict its actions from its goals, since its actions will be whatever maximizes the chances of achieving those goals. Agency has sometimes been contrasted with sphexishness, the blind execution of cached algorithms without regard for effectiveness.
Although, at the same time, agency and sphexishness might not be truly opposed; one refers to an outside perspective, the other to an inside perspective. We are all sphexish in a sense, but we attribute this agency property to others, and even to the ‘I’, because we are ignorant of many of our own rules.
We can also reframe AI agents as sophisticated feedback controllers.
A normal controller, like a thermostat, is set up to sense a variable (temperature) and take a control action in response (activate a boiler). If the boiler is broken, or the thermostat is inaccurate, the controller fails.
An AI agent is able to flexibly examine the causes upstream of its particular goal and exert control over them in an adaptive fashion. It can figure out likely causes of failing to accurately regulate temperature, such as a broken boiler, and figure out how to monitor and control those as well.
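As a toy illustration of that contrast, here is a fixed thermostat-style controller next to an agent-style controller that, when its usual action fails, asks why and acts on a proposed upstream cause. Everything here (the sensor and actuator callables, the `llm` diagnoser) is a schematic placeholder rather than any real API.

```python
def thermostat(read_temp, boiler_on, boiler_off, setpoint=20.0):
    """Fixed feedback controller: sense one variable, take one hard-wired action.
    If the boiler is broken or the sensor is wrong, it just keeps issuing the
    same ineffective command."""
    if read_temp() < setpoint:
        boiler_on()
    else:
        boiler_off()

def agent_controller(read_temp, actions, llm, setpoint=20.0):
    """Agent-style controller: when the usual action fails to move the sensed
    variable, ask the model for a likely upstream cause (broken boiler, open
    window, miscalibrated sensor) and try one of the other available actions."""
    if read_temp() >= setpoint:
        return
    actions["boiler_on"]()          # the usual control action
    if read_temp() < setpoint:      # it didn't work; reason about why
        suggestion = llm(
            f"Temperature reads {read_temp()} with the boiler commanded on. "
            f"Which upstream cause is most likely, and which of these actions "
            f"should I take next: {list(actions)}? Reply with one action name.")
        actions.get(suggestion, lambda: None)()
```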
I think one way we could try to define agency in a way that excludes calculators would be the ability to behave as if pursuing an arbitrary goal. Calculators are stuck with the goal you built them with: providing a correct answer to an input equation. AutoGPT lets you define its goal in natural language.
So if we put these together (a static piece of software that can monitor and manipulate the causes leading to some outcome in an open-ended way, and that lets an input flexibly specify which outcome ought to be controlled), I think we have a pretty intuitive definition of agency.
AutoGPT allows the user to specify truly arbitrary goals. AutoGPT doesn’t even need to be perfect at pursuing a destructive goal to be dangerous; it can also be dangerous by pursuing a good goal in a dangerous way. To me, the only thing keeping AutoGPT from being a nightmare tool is that, for now, it is ineffective at pursuing most goals. But I look at that thing operating and I don’t see anything intrinsically sphexish, nor an intelligence that is likely to naturally become more moral as it becomes more capable. I see an amoral agent that simply hasn’t yet been constructed to accomplish the aims it was given efficiently.
I think the key faculty of an agent vs. a calculator is the capability to create new short-term goals and actions.
A calculator (or water, or bacteria) can only execute the “programming” that was present when it was created. An agent can generate possible actions based on its environment, including options that might not even have existed when it was created.