There are several modes by which that could fail. For example, the beings may simply have mastered a classifier that is indistinguishable, in polynomial time, from a typical population member under an adaptive interactive proof protocol (similar to the so-called “Turing Test”), while actually implementing a (source-code-uninspectable) program hostile to that value system.
No way THAT could go wrong...