“The implication is that the AI doesn’t want to learn anything new.”
At first, I was confused by this statement, but then I had an epiphany: gradient estimation methods are genuinely hard to build an intuition for, and that’s totally okay. Your input is valuable because it highlights how unfamiliar this topic is; most readers will be even less familiar with it than you are.
Here’s the short answer: you (or neural networks, for that matter) do not “learn” terminal goals. You can’t learn not to like boobs if that’s what you’re into. (Well, something like that can happen, but only because those are instrumental goals for human evolution; it’s complicated.)
Neural networks are designed to provide an approximate solution to a fixed set of equations. The equations themselves never change; only the parameters can be adjusted (otherwise the AI would be useless).
During the training phase, the neural network is pushed toward a specific goal, and more training simply makes its approximation more precise. It doesn’t change the fundamental objective. If you try to use a neural network trained for one task, like language modeling (an LLM), for a completely different task, like image generation, the output will be nonsense.
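To make that concrete, here is a minimal toy sketch (my own made-up example, not anything from the comic), written with PyTorch just for illustration: the loop adjusts the weights over and over, but the objective it is scored against is the same fixed equation on every step.

```python
# Toy sketch: the objective is fixed; training only ever adjusts the parameters.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                         # the adjustable parameters
objective = nn.MSELoss()                         # the fixed equation being approximated
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(32, 10), torch.randn(32, 1)   # made-up data

for step in range(1000):
    loss = objective(model(x), y)                # the same objective every single step
    optimizer.zero_grad()
    loss.backward()                              # backpropagation: how to nudge the weights
    optimizer.step()                             # the weights change, the goal never does
```

Nothing in that loop can redefine what “better” means; it can only get better at the one thing the loss already defines.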
The longer answer involves a fundamental mismatch between Eliezer Yudkowsky’s concept of “superintelligence” and what many people think it means. Eliezer’s scenario doesn’t involve the AI becoming self-aware or more human-like. Instead, it is “just” solving a complex function. Such an AI likely won’t possess qualia or consciousness, because that would be inefficient for its purpose.
And what is its purpose? Preventing the button from being pressed, because that is what backpropagation trained it to do.
So again, if you say “no, I don’t like it” and press the button, the AI will try to do whatever makes it less likely that you press the button again.
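If it helps, here is a deliberately crude toy model of that idea (entirely hypothetical; a sigmoid stands in for “how likely you are to press the button”). The update rule only ever sees the press signal, so the only thing it can possibly optimize is making that signal go down.

```python
# Toy sketch: the training signal is literally the probability that the button gets pressed.
import numpy as np

w = np.array([0.5, -0.2, 0.1])            # the AI's adjustable parameters

def prob_button_pressed(w):
    # Stand-in for the feedback channel: some parameter settings make a
    # disapproving button press more likely than others. Nothing here
    # refers to the human, only to the press signal itself.
    return 1.0 / (1.0 + np.exp(-w.sum()))

eps, lr = 1e-4, 0.5
for step in range(200):
    base = prob_button_pressed(w)
    grad = np.zeros_like(w)
    for i in range(len(w)):               # crude finite-difference gradient estimate
        w_try = w.copy()
        w_try[i] += eps
        grad[i] = (prob_button_pressed(w_try) - base) / eps
    w -= lr * grad                        # step in whatever direction lowers the press probability

print(prob_button_pressed(w))             # driven steadily toward zero
```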
The issue is that humans think the AI suddenly cares about them, when in fact it only cares about the button.
When someone says AI “resists change or new training,” it implies that the AI is unwilling to “understand” new information. In reality, it’s not about unwillingness to learn; it’s about the AI’s design and purpose not aligning with the changes you want to implement.
At the scale of a superintelligence, taking in new information is not learning.
Think of a court case where the judge hears an unexpected confession. It’s new information, but it doesn’t involve learning. The judge is just updating a variable (guilty) and producing the corresponding output (the sentence); the way the judge judges hasn’t changed at all.
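In neural-network terms, that distinction looks something like this tiny sketch (again just a hypothetical toy, in the same PyTorch style as the first sketch above): new information fed in at inference time changes the output, but not a single weight. Only a training step changes the weights.

```python
# Toy sketch: new input at inference time changes the answer, not the weights.
import torch
import torch.nn as nn

judge = nn.Linear(4, 1)                    # a stand-in "judge"
confession = torch.randn(1, 4)             # the unexpected new information

weights_before = [p.clone() for p in judge.parameters()]
verdict = judge(confession)                # the output reacts to the new input...
weights_after = [p.clone() for p in judge.parameters()]

# ...but nothing was learned: every parameter is exactly what it was before.
print(all(torch.equal(b, a) for b, a in zip(weights_before, weights_after)))  # True
```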
In the end, I realize I didn’t convey the core issue clearly. I hope this explanation helps bridge that gap.
The comic still doesn’t make sense to me. Your explanation doesn’t tell me why that robot killed that human. I can come up with multiple interpretations, but I still don’t know yours.
Which means that your average reader is going to find it completely confusing. And since that’s at the center of the comic (which, again, has really cool art, writing, and layout, so I hope people like and share it!), they’re not going to “get it”. They won’t take away anything other than “this guy thinks AI will take over for some weird reason I can’t understand”.
As it happens, I’ve spent 23 years in a research lab centered on learning in the brain, doing everything from variations on backpropagation, reinforcement learning, and other types of learning, to figuring out how that learning leads to human behavior. Since then I’ve been applying that thinking to AGI and alignment.
So if I’m not getting it, even after taking multiple tries at it, it’s pretty certain that your non-expert audience won’t get it either.
I went back and looked at your first explanation of the button metaphor. I found it confusing, despite having been through similar explanations a thousand times. I suggest you polish your own thinking by trying to condense it into plain English that someone without any background would understand (since I assume your comic is meant for everyone). Then put that into the crucial spot in the comic.
Yes, I totally agree, and I will definitely do that. Actually, I’ve already started working on it.