On the other hand, a machine could fail a morality test simply by saying something controversial, or just by failing to signal properly. For example, atheism could be considered immoral by religious people; they could conclude that the machine is missing a part of the human utility function. Or if some correct belief has bad consequences which humans compartmentalize away, and the machine points them out explicitly, that could be perceived as a moral failure.
If the machine is allowed to lie, passing this test could just mean the machine is a skilled psychopath. If the machine is not allowed to lie, failing this test could just mean humans confuse signalling with the real thing.