no matter how super-intelligent and unfriendly, an AI would be unable to produce some kind of mind-destroying grimoire.
Consider that humans can and have made such grimoires; they call them bibles. All it takes is a nonrational but sufficiently appealing idea, and an imperfect rationalist falls to it. If there is a true hole in the textbook’s information, such that it produces unfriendly AI instead of friendly, and the AI who wrote the textbook hand-waved that hole away, how confident are you that you would spot the best hand-waving ever written?
Not confident at all. In fact I have seen no evidence for the possibility, even in principle, of provably friendly AI. And if there were such evidence, I wouldn’t be able to understand it well enough to evaluate it.
In fact I wouldn’t trust such a textbook even written by human experts whose motives I trusted. The problem isn’t proving the theorems, it’s choosing the axioms.