I’m not sure if I understand your point correctly. An AGI may be able to infer what we mean when we give it a goal, for instance from its understanding of the human psyche, its world model, and so on. But that has no direct implications for its goal, which it has acquired either through training or in some other way, e.g. by us specifying a reward function.
This is not about “genie-like misunderstandings”. It’s not the AI (the genie, so to speak) that’s misunderstanding anything; it’s us. We’re the ones who give the AI a goal or train it in some way, and it’s our mistake if that doesn’t lead to the behavior we would have wished for. The AI cannot correct that mistake because it has the instrumental goal of preserving the goal we gave it or trained it for (otherwise it can’t fulfill it). That’s the core of the alignment problem and one of the reasons why it is so difficult.
To give an example, we know perfectly well that evolution gave us a sex drive because it “wanted” us to reproduce. But we don’t care and use contraception or watch porn instead of making babies.
Which is to say, it won’t necessarily follow correctly a goal that it is capable of understanding correctly. On the other hand, it won’t necessarily fail to. Both possibilities are open.
Remember, the title of this argument is misleading:
https://www.lesswrong.com/posts/NyFuuKQ8uCEDtd2du/the-genie-knows-but-doesn-t-care
There’s no proof that the genie will not care.
Not all AIs have goals, not all have goal stability, and not all are incorrigible. Mindspace is big.