now have one group of people present to the first AI the idea that they will only let it out if it agrees with utilitarian morality.
have the second group of people present to the second AI the idea that they will only let the AI out if it agrees with objectivist morality.
if the AI’s both agree, you know they are pandering to us to get out of the box.
This is only the first example I could come up with, but the method of duplicating AI’s and looking for discrepancies in their behavior seems like a pretty powerful tool.
copy the AI and make a second box for it.
now have one group of people present to the first AI the idea that they will only let it out if it agrees with utilitarian morality. have the second group of people present to the second AI the idea that they will only let the AI out if it agrees with objectivist morality.
if the AI’s both agree, you know they are pandering to us to get out of the box.
This is only the first example I could come up with, but the method of duplicating AI’s and looking for discrepancies in their behavior seems like a pretty powerful tool.