It’s not at all obvious to me that Shapiro’s heuristics are bad, but I feel comfortable asserting that they’re thoroughly insufficient. They’re a reasonable starting point for present-day AI, I think, and seem like good candidates for inclusion in a constitutional AI. But adversarial examples (holes in the behavior manifold) make it unclear whether even an entirely correct English description of human values would currently produce acceptable AI behavior in all edge cases.
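To make the "holes in the behavior manifold" point concrete, here is a minimal sketch of the adversarial-example phenomenon. It assumes a toy linear classifier with random weights standing in for a real trained model; the only point is that a tiny, targeted perturbation of the input, invisible as a change in the input's overall character, flips the model's output:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)      # toy stand-in for a trained weight vector
x = rng.normal(size=20)      # an input the model classifies confidently

def predict(v):
    return int(w @ v > 0)    # linear classifier: sign of w . v

# FGSM-style step: for a linear model, the input-gradient of the score is
# just w, so stepping epsilon * sign(w) in the loss-increasing direction
# is the exact worst-case L-infinity perturbation of size epsilon.
label = predict(x)
direction = -np.sign(w) if label == 1 else np.sign(w)
epsilon = abs(w @ x) / np.abs(w).sum() + 1e-6  # just past the boundary
x_adv = x + epsilon * direction

print("original prediction:   ", predict(x))
print("adversarial prediction:", predict(x_adv))
print("max per-coordinate change:", epsilon)
```

Each coordinate of `x_adv` differs from `x` by at most `epsilon`, yet the classification flips. Deep networks exhibit the same failure mode with perturbations far too small for a human to notice, which is why a behaviorally correct-looking policy can still harbor edge cases no plain-language value specification anticipates.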
Some specific tags on these topics:
https://www.lesswrong.com/tag/instrumental-convergence
https://www.lesswrong.com/tag/utility-functions
https://www.lesswrong.com/tag/orthogonality-thesis
https://www.lesswrong.com/tag/adversarial-examples