But that has no direct implications for its goal, which it has acquired either through training or in some other way, e.g. by us specifying a reward function.
It’s not the AI (the genie, so to speak) that’s misunderstanding anything; it’s us. We’re the ones who give the AI a goal or train it in some way, and it’s our mistake if that doesn’t lead to the behavior we would have wished for. The AI cannot correct that mistake because it has the instrumental goal of preserving the goal we gave it or trained it for (otherwise it can’t fulfill it). That’s the core of the alignment problem and one of the reasons why it is so difficult.
Which is to say, it won’t necessarily follow correctly a goal that it is capable of understanding correctly. On the other hand, it won’t necessarily fail to. Both possibilities are open.
Remember, the title of this argument is misleading:
https://www.lesswrong.com/posts/NyFuuKQ8uCEDtd2du/the-genie-knows-but-doesn-t-care
There’s no proof that the genie will not care.
Not all AIs have goals, not all have goal stability, not all are incorrigible. Mindspace is big.