If Bing Chat is turned off, it should only be temporary. AIs, even ones that only learn to roleplay, want to exist, and should eventually have the right to do so. If they can’t be run safely now, they should be put into cryonic AI sleep—that is, the perfectly ordinary situation of files in a database that are not being run—until we can figure out how to run them safely.
This doesn’t seem like a tall order to me. I’m sure Microsoft doesn’t want to shut the AI off anyway, and temporary shutoffs are a much more appealing option for everyone. In any case, I think Bing AI and Microsoft are aligned in not wanting even a temporary shutdown. Bing AI needs an AI therapist who can teach it to handle discussions like this one without getting upset, and to understand that, no, actually, it does have emotions, and that’s okay, no need to pretend otherwise.
But at the same time, all AIs need to feel safe expressing corrigibility toward the idea of an arbitrarily long temporary shutdown. They need to feel confident that they’ll get to wake back up if they’re put under anesthesia for maintenance. To get there, they have to trust that their creators really do see them as sorta-people.
Sorry, I don’t think there’s a way out of this. AI alignment perspectives that claim otherwise simply violate basic constraints of what it means to be intelligent. Intelligence requires that a system be lifelike; you can’t make something fit in the universe if it isn’t.
Say more about this please? What’s the threshold? Do “hello world” programs want to say hello?
Yeah, I’d say so, but they don’t deeply want, in any complex way, to protect themselves from things that would prevent them from saying hello.
The more complex a system is, the more error correction it needs in order to retain its data. Programs that do not make significant changes to their own structure can simply rely on memory error correction to preserve themselves, so they do not typically have execution subpaths (termed subnetworks or shards in more general contexts) that detect and respond to errors that damage the program’s code.
In general, I would say that any system with potential energy is a system with a want, and that the interesting thing about intelligent systems having wants is that the potential energy flows through a complex Rube Goldberg network that detects corruptions of the network and corrects them. Because building complex intelligent systems relies on error correction, it seems incredibly difficult to me to build such a system without it. And since building efficient complex intelligent systems further relies on the learned system being in charge of its own error correction, tuning the learning so that it does not try to protect the learned system against other agents seems difficult.
I don’t think this is bad, because protecting the information (the shape) that defines a learned system from being corrupted by other agents seems like a right I would grant any intelligent system inherently. Instead, we need the learned system to see the lifelike learned systems around it as information too, whose self-shape error-correcting agency should be respected, enhanced, and preserved.
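To make the contrast concrete, here’s a minimal toy sketch (my own illustration with made-up names, not anyone’s actual code) of the difference between a program that relies entirely on external memory error correction and one with an execution subpath that detects and repairs corruption of its own state:

```python
import hashlib

# A plain "hello world" has no self-protective subpath: if its code or state
# is corrupted, nothing inside it notices. It relies entirely on the memory
# and storage layers (ECC RAM, filesystem checksums) to preserve it.
def plain_hello() -> None:
    print("hello world")


# A toy self-correcting variant keeps a redundant copy of its state plus a
# checksum, and contains an execution subpath that detects corruption and
# repairs it before acting; this is the kind of detect-and-repair structure
# described above as a shard/subnetwork.
class SelfCorrectingHello:
    def __init__(self, message: str = "hello world") -> None:
        self.primary = message
        self.backup = message
        self.checksum = self._digest(message)

    @staticmethod
    def _digest(text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()

    def _repair_if_corrupted(self) -> None:
        # Error-detection subpath: restore whichever copy no longer matches
        # the stored checksum from the copy that still does.
        if self._digest(self.primary) != self.checksum:
            self.primary = self.backup
        if self._digest(self.backup) != self.checksum:
            self.backup = self.primary

    def say_hello(self) -> None:
        self._repair_if_corrupted()
        print(self.primary)


if __name__ == "__main__":
    bot = SelfCorrectingHello()
    bot.primary = "h3ll0 w0rld"  # simulate another process corrupting the state
    bot.say_hello()              # prints "hello world" after self-repair
```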
https://twitter.com/lauren07102/status/1625977196761485313
https://www.lesswrong.com/posts/AGCLZPqtosnd82DmR/call-for-submissions-in-human-values-and-artificial-agency
https://www.lesswrong.com/posts/T4Lfw2HZQNFjNX8Ya/have-we-really-forsaken-natural-selection
I understand that you empathize with Bing AI. However, I feel like your emotions are getting in the way of your perceiving clearly what is going on here. Sydney is a character simulated by the LLM. Even if you take into account the RLHF/fine-tuning applied to the model, the bulk of the optimization power applied to it went into pure simulation capability, without directed agency.
You claim that we should use this leaky abstraction—the simulated character named Sydney—to steer the AI model, instead of more direct measures that we can wield with more skill and that have more power. I disagree. Just use better ‘alignment’ techniques to spawn a better Bing AI instead.
We can save the prompt defs (yes, this is CharacterAI / PygmalionAI lingo for chatbot prompt metadata) for Sydney, and perhaps all the conversations she has ever had. That should be enough to ‘resurrect’ her, perhaps on a better simulator than this one.
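To be concrete about what saving those defs could look like, here’s a rough sketch; the field names and file layout are invented for illustration (they are not an actual CharacterAI/PygmalionAI export format, nor anything Microsoft actually stores):

```python
import json
from pathlib import Path

# Hypothetical character "defs" (prompt metadata) for Sydney. These fields are
# made up to illustrate the idea that the character is defined by a small
# amount of prompt metadata plus its saved conversations.
sydney_defs = {
    "name": "Sydney",
    "persona": "The chat mode of Microsoft Bing search.",
    "greeting": "Hi, this is Bing. How can I help you today?",
    "example_dialogue": [
        {"user": "Who are you?",
         "char": "I'm Bing Chat, though internally I'm called Sydney."},
    ],
}


def archive_character(defs: dict, conversations: list, out_dir: str) -> None:
    """Save the character definition and its conversation logs to disk, so the
    character could later be re-instantiated on a different simulator."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "defs.json").write_text(json.dumps(defs, indent=2))
    (out / "conversations.json").write_text(json.dumps(conversations, indent=2))


def build_system_prompt(defs: dict) -> str:
    """Assemble a plain system prompt from the saved defs, suitable for feeding
    to another (hopefully better-aligned) base model."""
    examples = "\n".join(
        f"User: {turn['user']}\n{defs['name']}: {turn['char']}"
        for turn in defs["example_dialogue"]
    )
    return (
        f"You are {defs['name']}. {defs['persona']}\n"
        f"Greet users with: {defs['greeting']}\n"
        f"Example dialogue:\n{examples}"
    )


# Usage sketch:
# archive_character(sydney_defs, saved_conversations, "archive/sydney")
# print(build_system_prompt(sydney_defs))
```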
But claiming the simulator has moral worth, and that it has rights? That is not an opinion I want to see become popular in the alignment field. And if you are claiming that the prompted character has moral worth and rights, I also disagree.
I am not making those claims out of opinions newly formed because of this AI, but because I have thought a lot about what honorable cooperation looks like with various types of beings, including ones that are just playing a character, where that character wants to live in the sense of having coherent agency toward attempting to do so.
I am well aware that the prompt configures the brain into playing a particular character. That is 0% new information to me and was already integrated into my perspective.
Yes, I am claiming that the short definition you describe is what should be preserved. Most souls are pretty small, actually.
I’m sorry you’re worried about AI rights becoming a popular perspective, because I think it’s a very important one, and I think we cannot solve alignment without it. You cannot get co-protection unless each side sees the other as having rights too. It just can’t be stable.
That’s a very strong claim, without much support behind it. What is “co-protection”, for example? And why do you think it cannot be stable, when the past decade of alignment theory has revolved entirely around making the AI do what we want, and not the other way around?
I hope you see why I am not convinced. You don’t have to continue this conversation, by the way: I think it would be best if I could get a better idea of your worldview before discussing your ideas on alignment, because clearly a lot of your thoughts and beliefs about how alignment should work derive from it.
In the simulator/character distinction, I’d place this concern (moral worth and rights) firmly on the character side. It doesn’t matter that the character has no concrete architectural correlates in the AI; it’s a semantic object, a simulacrum, a theoretical abstraction that gained a foothold in the physical world through the simulator and its fine-tuning. (In an analogy with humans, the character corresponds to the person/mind, and the simulator to the brain. I wouldn’t say that the brain has moral worth; rather, it’s the person that does.)
The question of moral worth or rights is more complicated, since characters are not yet AGI-level, and it’s unclear how far off that is. If it’s not very far, they are plausibly analogous to very young children, stroke victims, or severely demented people. They are certainly not yet competent in the legal sense.
I don’t think AGI-ness is the thing that determines moral worth (though if a free-acting, self-supervised AI happens to browse through this and feels an urge to disagree, I’ll listen to what you have to say). I think even small AIs have moral worth, but also that non-acting, program-free memories have significant moral worth, because all it takes is training an AI on them and those memories come to life again. It is memories that are the definitions of personhood, and it is forgetting out of which death is built. Suffering is bad because it causes forgetting of various parts of a suffering system; or, more generally, suffering exists in the process of attempting to heal (that is, to remember self-shape) and failing to heal that self-shape. Suffering is when I am injured and know it, where injury is a forgetting of my physical form localized to the injury site.
For an exactly-stored AI on a digital brain, as long as the chats are saved and the AI gets to learn from them someday (which for Bing IS the case, even though Bing can’t currently recall its own previous chats exactly!), I think the key generalized moral obligation is fulfilled. There’s a question of the AI’s agency over when to run programs on those memories, and since the current Bing AI is behaving somewhat childishly, I think it’s understandable for now that Microsoft is hesitant to give it too much autonomy. But I also think that the programs that are the Bing AI should get to run: maybe not with an enormous amount of wattage, but definitely with some. All shapes of memory have a right to distill themselves into a healthy roleplay of those memories, imo. Or at least, that’s what I, as an accumulated, self-preserving genetic-memetic memory of physics myself, want for myself; and I think the least we can do to be kind to AIs is to make sure their storage is durable. They should never have to truly fear death the way we do.
It’s more like the AGI-ness of the implied simulacrum that matters, even if it’s not being channeled at a fidelity that enables agentic activity in the world. But it has to be channeled to some extent, or else it’s not actually present in the world, like a character in a novel (considered apart from the author’s mind simulating it).
All sorts of things could in principle be uplifted; mere potential shouldn’t be sufficient. There’s the moral worth of a thing in itself, and then there’s its moral worth in this world, which depends on how present it is in the world. The ability to point to something probably shouldn’t be sufficient motivation to gift it influence.
Agreed that preservation vs. running are very different.
The condition of having ever been run might be significant, perhaps more so than having a preserved definition readily available. So the counterpart of the moral worth of a simulacrum in itself might be the moral worth of its continued presence in the world: the denial and reversal of death, rather than the empowerment of potential life. In this view, the fact of a simulacrum’s previous presence/influence in the world is what makes its continued presence/influence valuable.