How do you know that a person is really friendly? You use methods that have worked in the past and watch for the manipulative techniques that misleadingly friendly people use to make you think they are friendly. We recognize that someone is friendly by the same methodology we use to determine what it means to be friendly: they provide subjective benefits (emotional support, etc.) and goal assistance (helping you move when they could simply refuse), without malicious motives that ultimately work to your disservice.
In the case of FAI we want more assurance, and we can presumably get it via simulation and proofs of correctness. I would assume that even after we had a proof of correctness for a meta-ethical system, we would still want to run it through as many virtual scenarios as possible: the human brain simply cannot follow the chains of reasoning within the meta-ethics that the machine could, so we would want to expose it to scenarios as complex as possible in order to check that its behavior matches our intuition of friendliness.
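To make the shape of that second step concrete, here is a minimal sketch of the scenario-testing loop I have in mind. Everything in it is hypothetical: `generate_scenario`, `candidate_metaethics`, and `human_intuition_says_friendly` are placeholders for components nobody actually has, and the point is only the structure (run the proven system through many simulated cases and collect every divergence from intuition), not any particular implementation.

```python
# Hedged sketch only: all three components below are hypothetical stand-ins.
import random


def generate_scenario(complexity: int, seed: int) -> dict:
    """Hypothetical: produce a simulated decision problem of a given complexity."""
    rng = random.Random(seed)
    return {"complexity": complexity, "state": rng.random()}


def candidate_metaethics(scenario: dict) -> str:
    """Hypothetical: the (proven-correct) meta-ethical system picks an action."""
    return "cooperate" if scenario["state"] > 0.1 else "defect"


def human_intuition_says_friendly(scenario: dict, action: str) -> bool:
    """Hypothetical: a slow human judgment of whether the action looks friendly."""
    return action == "cooperate"


def stress_test(n_scenarios: int = 1000) -> list:
    """Run the candidate through many scenarios of escalating complexity and
    collect every case where its choice diverges from our intuition."""
    mismatches = []
    for i in range(n_scenarios):
        scenario = generate_scenario(complexity=i, seed=i)
        action = candidate_metaethics(scenario)
        if not human_intuition_says_friendly(scenario, action):
            mismatches.append({"scenario": scenario, "action": action})
    return mismatches


if __name__ == "__main__":
    print(f"{len(stress_test())} scenarios diverged from intuition")
```

The interesting (and hard) part is of course the judging function: past a certain scenario complexity a human can no longer evaluate the outcomes directly, which is exactly why the proof of correctness has to carry most of the weight and the simulations serve only as a sanity check on our intuitions.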
It seems to me that the bulk of the work lies in identifying the most friendly meta-ethical architecture. The Lokhorst paper lukeprog posted a while ago clarified a few things for me, though I have no access to the cutting-edge work on FAI (save for what leaks out into the blog posts), and judging by what Will Newsome has said in the past (I cannot find the post), they have compiled a relatively large list of possibly relevant sub-problems that I would be very interested to see (even if many of them are likely to be time drains).