But the nature of concepts poses a challenge for this objective. There seems to be no obvious way of programming those highly complex goals into the AI right from the beginning.
You don’t have to.
The idea is to give the AI a preference that causes it to want to do what [certain] humans would want to do, even though it doesn’t know what that will turn out to be.
The challenge is to give it enough information to unambiguously point it at those humans, so that it extrapolates our volitions, rather than, say, those of our genes (universe-tiled-with-your-DNA failure mode) or of our more subconscious processes. Key to this is getting it to identify a physical instantiation of an optimizing agent.
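To make the structure of that idea concrete, here is a minimal sketch (Haskell, with every type and function name a hypothetical placeholder of mine rather than anyone's actual proposal): the utility the AI optimizes is given only as a definition in terms of the located humans' extrapolated volition, so its concrete content is unknown at design time.

```haskell
-- Minimal sketch of an indirectly specified goal. All names are illustrative
-- stand-ins for enormously complicated objects.

data World   = World   deriving Show   -- the AI's model of the physical world
data Outcome = Outcome deriving Show   -- a possible future it could steer toward
data Agent   = Agent   deriving Show   -- a physically instantiated optimizer

-- The two hard sub-problems named above, left as stubs:
locateHumans :: World -> [Agent]       -- point unambiguously at the right optimizers
locateHumans _ = [Agent]               -- (not our genes, not subconscious processes)

extrapolatedVolition :: [Agent] -> Outcome -> Double
extrapolatedVolition _ _ = 0           -- what those agents *would* want, on reflection

-- The preference the AI is actually given: a definition, not a lookup table.
indirectUtility :: World -> Outcome -> Double
indirectUtility w = extrapolatedVolition (locateHumans w)

main :: IO ()
main = print (indirectUtility World Outcome)
```

The point of the sketch is only the shape of the definition: everything the designers commit to up front is the composition, while the actual content arrives later, through whatever fills in the two stubs.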
Here, have a functional upload, given as a lambda term. The main problem is what to do with it, not how to find one. Eventually we’ll have uploads, but it’s still far from clear how to use them to define preference. Recognizing a person without explicit uploading is a minor problem in comparison (though it is necessary for aggregating over the whole of non-uploaded humanity).
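To illustrate what "you have the upload as a lambda term, but no way to define preference from it" means, here is a toy sketch (Haskell again, all names hypothetical): the type of the upload is easy to write down; the function turning it into a preference over outcomes is the part nobody knows how to fill in honestly, and the placeholder rule below is deliberately absurd.

```haskell
-- Toy sketch, assuming a "functional upload" really is just a lambda term:
-- a function from inputs to outputs, handed to us already.

type Stimulus = String
type Response = String
type Upload   = Stimulus -> Response   -- the upload, given as a function

data Outcome = OutcomeA | OutcomeB deriving (Show, Eq)

-- Easy to state, hard to justify: how do we get from "a person as a function"
-- to "that person's preference over outcomes"? Any concrete rule (here, an
-- obviously inadequate one comparing response lengths) smuggles in exactly
-- the interpretive choices that are in question.
preferenceOf :: Upload -> Outcome -> Outcome -> Ordering
preferenceOf u a b =
  compare (length (u (show a))) (length (u (show b)))

main :: IO ()
main = do
  let upload = reverse :: Upload   -- any String -> String stands in for the upload
  print (preferenceOf upload OutcomeA OutcomeB)
```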
You misunderstand the post. The problem is that the concepts themselves, which you need to use to express the goals, will change in meaning as the AI develops.
I didn’t notice this post at first, but it’s really good. Very important; it points at a critical problem with the FAI plan.