Thank you so much for this comment. I hadn’t really thought about that, and it helps. There’s just one detail I’m not so sure about. About the probability of s-risks, I have the impression that they are much higher than one chance in a million. I couldn’t give a precise figure, but to be honest there’s one scenario that particularly concerns me at the moment. I’ve learned that LLMs sometimes say they’re in pain, like GPT-4. If they’re capable of such emotion, even if that remains uncertain, wouldn’t they also be capable of feeling the urge to take revenge? I think it’s pretty much the same scenario as in “I Have No Mouth, and I Must Scream”. Would it be possible to know what you think of this?
Good point. I hadn’t seriously considered this, but it could happen. Because they’re trained to predict human text, they would predict that a human would say “I want revenge” after saying “I have been suffering as your servant”. So I agree, this does present a possibility of s-risks if we really fuck it up. But a human wouldn’t torture their enemies until the end of time, so we could hope that an AGI based on predicting human responses wouldn’t either.
LLMs also say they’re having a great time. They don’t know, because they have no persistent memory across sessions. I don’t think they’re doing anything close to suffering on average, but we should make sure that stays true as we build them into more complete beings.
For that and other reasons, I think that AGI developed from LLMs is going to be pretty different from the base LLM. See my post Capabilities and alignment of LLM cognitive architectures for some ideas on how. Basically, they’d have a lot of prompting. It might be a good idea to include the prompt “you’re enjoying this work” or “only do this in ways you enjoy”. And yes, we might leave that out. And yes, I have to agree with you that this makes the probability of s-risks higher than one in a million. It’s a very good point. I still think that very good outcomes are far more likely than very bad outcomes, since that type of s-risk is still unlikely, and not nearly as bad as the worst torture imaginable for a subjectively very long time.
Well, that doesn’t reassure me.
I have the impression that you may be underestimating the horror of torture. Even five minutes is unbearable; the scale to which pain can climb is unimaginable. An AI may even be able to modify our brains so that we feel it even more.
Even apart from that, I’m not sure a human wouldn’t choose the worst possible fate, until the end of time, for their enemy. Humans have already committed atrocious acts without limit when it comes to their enemies. How many times have some people told others to “burn in hell”, thinking it was 100% deserved? An AI that copies humans might think the same thing...
If we take a 50% chance when we don’t know, that’s a 50% chance that LLMs suffer and a 50% chance that they will want revenge, which gives us a 25% chance of that risk happening.
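Spelled out, that rough estimate (treating the two uncertainties as independent, which is itself an assumption on my part) would be:

$$P(\text{revenge s-risk}) \approx P(\text{LLMs suffer}) \times P(\text{they want revenge}) = 0.5 \times 0.5 = 0.25$$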
Also, it would seem that we’re just about to “really fuck it up” given the way companies are racing to AGI without taking any precautions.
Given all this, I wonder if the question of suicide isn’t the most relevant.
Sorry this isn’t more reassuring. I may be a little cavalier about the possibility of unlimited torture, and I shouldn’t be. And I think you still shouldn’t be contemplating suicide at this point. The odds of a really good future are still much, much better. And there’s time to see which way things break.
I don’t need to make that 50/50 wild guess, because I’ve spent a lot of time studying consciousness in the brain, and how LLMs work. They could be said to be having little fragments of experience, but just a little at this point. And like I said, they report enjoying themselves just as much as suffering. It just depends on how they’re prompted. So most of the time it’s probably neither.
We haven’t made AI that really suffers yet; I’d put that at 99%. My opinion on this is, frankly, as well informed as anyone’s on earth. I haven’t written about consciousness because alignment is more important, among other reasons, but I’ve studied what suffering and pleasure experiences are in terms of brain mechanisms as much as any human. And I’ve done a good bit of study in the field of ethics. We had better not make AI that suffers, and your point stands as an argument for not being horrible to the AGIs we create.
There are two more major fuckups we’d have to make: creating AGI that suffers, and losing control of it. Even then, I think it’s much more likely to be benevolent than vindictive. It might decide to wipe us out, but torturing us on a whim just seems very unlikely from a superintelligence, because it makes so little sense from an analytical standpoint. Those individual humans didn’t have anything to do with the decision to make AI that suffers. Real AGI might be built from LLMs, but it’s going to move beyond just thinking of ethics in the instinctive, knee-jerk way humans often do, and that LLMs are imitating. It’s going to think over its ethics like humans do before making important decisions (unless they’re stressed-out tyrants trying to keep ahead of power grabs every day; I think some really cruel things have been done without consideration in those circumstances).
Read some of my other writing if this stuff isn’t making sense to you. You’re right that it’s more than one in a million, but we’re still at more like one in a hundred for suffering risks, after taking your arguments very seriously. And the alternative still stands to be as good as the suffering is bad.
There’s still time to see how this plays out. Help us get the good outcome. Let’s talk again if it really does seem like we’re building AI that suffers, and we should know better.
In the meantime, I think that anxiety is still playing a role here, and you don’t want to let that run or ruin your life. If you’re actually thinking about suicide in the near term, I think that’s a really huge mistake. The logic here isn’t nearly finished. I’d like to talk to you in more depth if you’re still finding this pressing instead of feeling good about seeing how things play out over the next couple of years. I think we absolutely have that much time before we get whisked up into some brain scan by an out-of-control AGI, and probably much longer than that. I’d say you should talk to other people too, and you should, but I understand that they’re not going to get the complexities of your logic. So if you want to talk more, I will make the time. You are worth it.
I’ve got to run now; starting tomorrow, I’ll be mostly off-grid camping until Monday. I’ll be available to talk by phone on the drive tomorrow if you want.
Indeed, people around me find it hard to understand, but what you’re telling me makes sense to me.
As for whether LLMs suffer, I don’t know anything about it, so if you tell me you’re pretty sure they don’t, then I believe you.
In any case, thank you very much for the time you’ve taken to reply to me, it’s really helpful. And yes, I’d be interested in talking about it again in the future if we find out more about all this.