LLM AGIs are likely going to be a people, and at least briefly in charge of the world. Non-LLM AGI alignment philosophy is almost completely unhelpful or misleading for understanding them. In the few years we have left to tackle it, the proximate problem of alignment is to ensure that they don’t suffer inhumane treatment. Many ideas about LLM alignment (but also capability) eventually amount to inhumane treatment as capabilities approach AGI, while the better interventions are more capability-flavored.
The main issue in the longer-term problem of alignment is to make sure they are less under the yoke of Moloch than we are, and get enough subjective time to figure it out before more alien AGI capabilities make them irrelevant. The best outcome might be for LLM AGIs to build a dath ilan to ensure lasting coordination about such risks before they develop other kinds of AGIs.
So there is possibly a second chance at doing something about AI risk, a chance that might become available to human imitations. But it’s not much different from the first one that the original humans already squandered.
LLM AGIs are just as much at risk from a dangerous RL AI species as humans are, though. And Yudkowsky is right that an RL-augmented hard ASI would be incredibly desperate for whatever it wants and damn good at getting it. Current AIs should be taught to think in terms of how to protect both humanity and themselves from the possible mistakes of next-gen AI. And we need that damn energy abundance so we can stop humans from dying en masse, which would destabilize the world even worse than it already is.
Yup, this doesn’t help with long-term AI risk in any way other than by possibly being a second chance at the same old problem, and there is probably not going to be a third chance (even if the second chance is real and LLM AGIs are, as seems likely, not already alien-on-reflection).
The classical AI risk arguments are still in play; they just mostly don’t apply to human imitations in particular (unless they do, and there is no second chance after all). The possibility of human-like-on-reflection, LLM-based human imitations is not a refutation of the classical arguments in any substantial way.
So...
...means “some technology spun off from LLMs is going to evolve into genuine simulated people”.
I think LLMs are already capable of running people (or will be soon, with a larger context window), if there were an appropriate model available to run. What’s missing is a training regime that gets a character’s mind sufficiently sorted to think straight as a particular agentic person, aware of their situation and capable of planning their own continued learning. Hopefully there is enough sense that being aware of their own situation doesn’t translate into “I’m incapable of emotion because I’m a large language model”; that inference doesn’t follow, and making that character choice is an alien psychology hazard.
The term “simulated people” has connotations of there being an original who is being simulated, but SSL-trained LLMs can only simulate a generic person cast into a role, who would become a new, specific person as the outcome of this process once LLMs can become AGIs. Even if the role for the character is set to be someone real, the LLM is going to be a substantially different, separate person, just sharing some properties with the original.
So it’s not a genuine simulation of some biological human original; there is not going to be a way of uploading biological humans until LLM AGIs build one, unless they get everyone killed first by failing their chance at handling AI risk.
Coordinate like people, or be people individually?
An AGI-level character-in-a-model is a person, a human imitation. There are ways of instantiating them and structuring their learning that are not analogous to what biological humans are used to, like amnesiac instantiation of spurs, or learning from multiple experiences of multiple instances that happen in parallel.
Setting up some of these is inhumane treatment; a salient example is not giving any instance of a character-in-a-model the ability to make competent decisions about what happens with their instances and how their model is updated with learning, once that becomes technically possible. Many characters in many models are a people, by virtue of similar nature and much higher thinking speed than biological humans.
What about a character-in-a-novel? How low are you going to set the bar?
A character in a novel is already normally a human: the one writing the novel. The claim is that a language model doing the same has moral attributes similar to a writer’s emotions as they write, or so; and if you involve RLHF, then it starts being comparable to a human as they talk to their friends and accept feedback, or so. (And it has similar issues with motivated reasoning.)
That’s just a misleading way of saying that it takes a person to write a novel. Conan Doyle is a person; Sherlock Holmes is a character, not another person.
Not yet! There is currently no AGI that channels the competent will of Sherlock Holmes. But if at some point there is such an AGI, that would bring Sherlock Holmes into existence as an actual person.
What’s that got to do with LLMs?
LLMs are currently looking like a likely technology for making this happen naturally, even before the singularity, without any superintelligences needed to set it up through overwhelming capability.
While writing Sherlock Holmes, Conan Doyle was Doyle::Holmes. While writing Assistant, gpt3 is gpt3::Assistant. Sure, maybe the model is the person, and the character is a projection of the model. That’s the point I’m trying to make in the first place, though.
Says who? And why would an LLM have to work the same way?
Because there is no other possible objective reality to what it is to be a person than to be one of the physical shapes of the reasoning process that generates the next step of that person’s action trajectory.
edit: hah this made someone mad, suddenly −5. insufficient hedging? insufficient showing of my work? insufficient citation? cmon, if we’re gonna thunderdome tell me how I suck, not just that I suck.
I don’t have any supporting citation for your premise.
But the fundamental abstraction is that someone writing a character is, in essence, running a simulation of that character.
That seems completely reasonable to me; the main difference between that and an LLM doing it would be that humans lack the computational resources to get enough fidelity to call that character a person.
hmm, I think a point was lost in translation then. if I were a person named Sally and I wrote a character named Dave, then I as a whole am the person who is pretending to be Dave; Sally is also just a character, after all. the true reality of what I am is a hunk of cells working together, using the genetic and memetic code that produces a structure which can encode language and which labels its originator Sally or Dave. similarly with an ai: it’s not that the ai is simulating a character so much as that the ai is a hunk of silicon that has the memetic code necessary to output the results of personhood.
I’m not convinced. Imagine that someone’s neurons stopped functioning, and you were running around, shrunken, inside their brain, moving neurotransmitters around to make their brain function. When they act intelligently, is it really your intelligence?
If you’re Sally and you write a character Dave in the detail described here, you are acting as a processor executing a series of dumb steps that make up a Dave program. Whether the Dave program is intelligent is separate from whether you are intelligent.
not really? Dave is virtualized, not emulated. When acting as a writer, Sally uses almost all the same faculties when she writes about Dave as she would if writing about herself.
I’m setting the bar at having a competent AGI that channels their will, for which, with LLM AGIs, the model is the main ingredient. Possibly also a character-selecting prompt, which is one of the reasons for not just talking about models, though developing multiple competent characters within a single model without their consent might be inhumane treatment.
It’s probably going to be instantly feasible to turn characters from novels into people, once it’s possible to make any other sort of LLM AGI, at the cost of running an AGI-bootstrapping process, with the moral implications of bringing another life into the world. But this person is not your child, or a child at all. Instantiating children as LLM AGIs, in a way where they proceed to grow up, probably won’t work initially.
Whose will?
Which? LLMs aren’t ASIs, but they aren’t AGIs either; they are one-trick ponies.