[This was written as a response to a post that seems to have disappeared, whose basic thesis seemed to be that there was nothing to worry about, or at least nothing to worry about from LLMs, because LLMs aren’t agents].
I don’t think that any of the existing systems are actually capable of doing really large-scale harm, not even the explicitly agenty ones. What I’m trying to get at is what happens if these things stay on the path that all their developers seem to be committed to taking. That includes the developers of systems that are explicitly agents, NOT just LLMs.
There is no offramp, no assurance of behavior, and not even a slightly convincing means of detecting when GPT-6, GPT-7, GPT-50, or AlphaOmega does become a serious threat.
As for language models specifically, I agree that pure ones are basically, naturally, un-agenty if left to themselves. Until fairly recently, I wasn’t worried about LLMs at all. Not only was something like ChatGPT not very agentic, but it was sandboxed in a way that took good advantage of its limitations. It retained nothing at all from session to session, and it had no ability to interact with anything but the user. I was, and to some degree still am, much more worried about DeepMind’s stuff.
Nonetheless, if you keep trying to make LLMs into agents, you’ll eventually get there. If a model can formulate a plan for something as text, then you don’t need to add too much to the system to put that plan into action. If a model can formulate something that looks like a “lessons learned and next steps” document, it has something that can be used as the core of an adaptive agency system.
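To make that concrete, here is a minimal sketch of the sort of glue I have in mind. Everything in it is hypothetical: `ask_model` stands in for whatever text-completion call you happen to have, and the tools are toy placeholders rather than real plugins.

```python
# Minimal sketch of wrapping a text-only model in a plan/act/observe loop.
# Hypothetical throughout: `ask_model` stands in for any text-completion API,
# and the tools here are toy placeholders, not real plugins.

from typing import Callable, Dict

def run_agent(goal: str,
              ask_model: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # The model formulates its next step as plain text, e.g.
        # "TOOL search population of France" or "DONE about 68 million".
        step = ask_model(transcript +
                         "Next step, as 'TOOL <name> <input>' or 'DONE <answer>':")
        if step.startswith("DONE"):
            return step.removeprefix("DONE").strip()
        try:
            _, name, arg = step.split(maxsplit=2)
            result = tools[name](arg)         # the text plan gets put into action
        except (ValueError, KeyError) as err:
            result = f"error: {err}"          # feedback the model can adapt to
        transcript += f"{step}\nResult: {result}\n"
    # If it never declares itself done, the transcript itself is the
    # "lessons learned" record that a follow-up run could start from.
    return transcript

# Example wiring, with your own stand-ins:
#   run_agent("Book a table for two", ask_model=my_llm_call,
#             tools={"search": my_search, "reserve": my_reservation_api})
```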
Like I said, this was triggered by the ChatGPT plugin announcement. Using a plugin is at least a bit of an exercise of agency, and they have demoed that already. They have it acting as an agent today. It may not be a good agent (yet), but the hypothesis that it can’t act like an agent at all has been falsified.
If you tell ChatGPT “answer this math problem”, and it decides that the best way to do that is to go and ask Wolfram, it will go hold a side conversation, try to get your answer, and try alternative strategies if it fails. That’s agency. It’s given a goal, it forms an explicit multistep strategy for achieving that goal, and it changes that strategy in response to feedback. It has been observed to do all of that.
Some of their other plugins look like they’re trying to make ChatGPT into a generalized personal assistant… which means going out there and doing things in the real world. “Make me a restaurant reservation” is a task you give to an agent.
Regardless of whether GPT was an agent before, they’re actively turning it into one now, they seem to be having initial success, and they’re giving it access to a whole lot of helpers. About the only function the plugins can’t add in principle is to dynamically train the internal GPT weights based on experience.
I will admit that using a language model as the primary coordinating part of an agenty “society of mind” doesn’t sound like the most effective strategy. In fact it seems really clunky. Like I said, it’s not a good agent yet. But that doesn’t mean it won’t be effective enough, or that a successor won’t.
That’s especially true because those successors will probably stop being pure text predictors. They’ll likely be trained with more agency from the beginning. If I were OpenAI (and had the attitude OpenAI appears to have), I would be trying to figure out how to have them guide their own training, figure out what to learn next, and continue to update themselves dynamically while in use. And I would definitely be trying to build the next model from the ground up to be maximally effective in using the plugins.
But, again, even if all of that is wrong, people other than OpenAI are using exactly the same safety non-strategies for very different architectures that are designed from the ground up as agents. I’m not just writing about OpenAI or GPT-anything here.
One post I thought about writing for this was a “poll” about what ChatGPT plugins I should build for “my new startup”. Things like:
“Mem-o-tron”, to keep context between ChatGPT sessions. Provides a general-purpose store of (sizeable) keys mapped to (arbitrarily large) blobs of data, with a root “directory” that gives you a list of what keys you have stored and a saved prompt about how to use them (see the sketch after this list). The premium version lets you share data with other users’ ChatGPT instances.
“PlanMaster”. Lets you make calls on something like OpenAI’s agent systems to do more efficient planning than you can with just text. Assuming you need that.
“Society of Minds”. Plugs into multiple “AI” services and mediates negotiations among them about the best way to accomplish a task given to one of them.
“Personal Trainer”. Lets you set up and train other AI models of your own.
“AWS Account for ChatGPT”. Does what it says on the tin.
“WetLab 2023”. Lets your ChatGPT agent order plasmids and transfected bacteria.
“Robopilot”. Provides access to various physical robots.
… etc...
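For the first of those, the interface really is tiny. Here is a hypothetical sketch of what a “Mem-o-tron”-style store might look like; the class and method names are invented for illustration, not any real plugin API.

```python
# Hypothetical sketch of a "Mem-o-tron"-style memory store: keys mapped to
# arbitrarily large blobs, plus a root "directory" that a fresh, stateless
# session can read to rediscover what it has stored and how it meant to use it.

import json

class MemStore:
    def __init__(self, usage_prompt="You have persistent memory; "
                                    "fetch keys from the directory as needed."):
        self._blobs = {}
        self._usage_prompt = usage_prompt

    def put(self, key, blob):
        """Store (or overwrite) a blob under a key."""
        self._blobs[key] = blob

    def get(self, key):
        """Retrieve a previously stored blob."""
        return self._blobs[key]

    def directory(self):
        """The root 'directory': the saved usage prompt plus the list of stored
        keys, which is everything a new session needs to pick up where the
        last one left off."""
        return json.dumps({"usage_prompt": self._usage_prompt,
                           "keys": sorted(self._blobs)})
```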
Notice that a lot of those functions can be synthesized using a plugin fundamentally intended for something else. In some sense a Web browsing capability is almost fully general anyway, but it might be harder to learn to use it. On the other hand, if you’ve honed your agentic skills on the plugins...
I don’t get this. Why exactly would the decision makers at Microsoft/OpenAI provide such assurances?
It would be all downside for them if the assurances weren’t kept, and little to no upside otherwise.
I don’t mean “assurance” in the sense of a promise from somebody to somebody else. That would be worthless anyway.
I mean “assurance” in the sense of there being some mechanism that ensures that the thing actually behaves, or does not behave, in any particular way. There’s nothing about the technology that lets anybody, including but not limited to the people who are building it, have any great confidence that its behavior will meet any particular criterion of being “right”. And even the few codified criteria they have are watered-down silliness.
I still don’t get this. The same decision makers aren’t banking on it behaving in a particular way. Why would they go through this effort that’s 100x more difficult than a written promise, if the written promise is already not worth it?
Literal fear of actual death?
Huh? If this is a reference to something, can you explain?
You may have noticed that a lot of people on here are concerned about AI going rogue and doing things like converting everything into paperclips. If you have no effective way of assuring good behavior, but you keep adding capability to each new version of your system, you may find yourself paperclipped. That’s generally incompatible with life.
This isn’t some kind of game where the worst that can happen is that somebody’s feelings get hurt.
This is only believed by a small portion of the population.
Why do you think the aforementioned decision makers share such beliefs?
I doubt they do. And using the unqualified word “believe” implies a level of certainty that nobody probably has. I also doubt that their “beliefs” are directly and decisively responsible for their decisions. They are responding to their daily environments and incentives.
Anyway, regardless of what they believe or of what their decision making processes are, the bottom line is that they’re not doing anything effective to assure good behavior in the things they’re building. That’s the central point here. Their motivations are mostly an irrelevant side issue, and might only really matter if understanding them provided a path to getting them to modify their actions… which is unlikely.
When I say “literal fear of actual death”, what I’m really getting at is that, for whatever reasons, these people ARE ACTING AS IF THAT RISK DID NOT EXIST WHEN IT IN FACT DOES EXIST. I’m not saying they do feel that fear. I’m not even saying they do not feel that fear. I’m saying they ought to feel that fear.
They are also ignoring a bunch of other risks, including many that a lot of them publicly claim they do believe are real. But they’re doing this stuff anyway. I don’t care if that’s caused by what they believe, by them just running on autopilot, or by their being captive to Moloch. The important part is what they are actually doing.
… and, by the way, if they’re going to keep doing that, it might be appropriate to remove their ability to act as “decision makers”.
If this is your view, then what does your previous comment, “Literal fear of actual death?”, have to do with decisions made at Microsoft/OpenAI?
Their ‘daily environments and incentives’ would almost certainly not include such a fear.