It’s an attractive idea for science fiction, but I think no matter how super-intelligent and unfriendly, an AI would be unable to produce some kind of mind-destroying grimoire. I just don’t think text and diagrams on a printed page, read slowly in the usual way, would have the bandwidth to rapidly and reliably “hack” any human. Needless to say, you would proceed with caution just in case.
If you don’t mind hurting a few volunteers to defend humanity from a much bigger threat, it should be fairly easy to detect, quarantine, and possibly treat the damaged ones. They, after all, would only be ordinarily intelligent. Super-cunning existential-risk malevolence isn’t transitive.
I think a textbook full of sensory exploit hacks would be pretty valuable data in itself, but maybe I’m not a completely friendly natural intelligence ;-)
edit: Oh, I may have missed the point that of course you couldn’t trust the methods in the textbook for constructing FAI even if the textbook itself posed no direct danger. Agreed.
no matter how super-intelligent and unfriendly, an AI would be unable to produce some kind of mind-destroying grimoire.
Consider that humans can and have made such grimoires; they call them bibles. All it takes is a nonrational but sufficiently appealing idea and an imperfect rationalist falls to it. If there’s a true hole in the textbook’s information, such that it produces unfriendly AI instead of friendly, and the AI who wrote the textbook hand-waved that hole away, how confident are you that you would spot the best hand-waving ever written?
Not confident at all. In fact I have seen no evidence for the possibility, even in principle, of provably friendly AI. And if there were such evidence, I wouldn’t be able to understand it well enough to evaluate it.
In fact I wouldn’t trust such a textbook even written by human experts whose motives I trusted. The problem isn’t proving the theorems, it’s choosing the axioms.