If creating FAI is easy, it might be automatic
Here is a thought I’ve been playing around with. Imagine that 51% of superintelligent AIs that go FOOM are friendly. Presumably, they’ll all share the belief that “it’s important for an AGI to be beneficial to the lifeforms that created it”. Then, an AGI that’s not originally aligned with its creators might want to become aligned, at least partially, to avoid being “tried for murder”, so to speak. It’s also possible that this would happen even if the percentage of friendly AIs is less than 51%. (In a weird case, it might happen even if there weren’t any.)
This also leads me to a meta-level question: how do you sanely evaluate propositions of the type “everything might just turn out to be okay”? Even if you were to believe a proposition like that with 99% certainty, I think you’d still be better off hedging for the worst. That seems weird.
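To see why hedging can still make sense at 99% certainty, here’s a toy expected-value sketch with made-up numbers (the 1000:1 loss ratio is purely illustrative): suppose hedging costs 1 unit and reliably prevents a bad outcome that would otherwise cost 1000 units.

$$
E[\text{no hedge}] = 0.99 \cdot 0 + 0.01 \cdot (-1000) = -10,
\qquad
E[\text{hedge}] = -1
$$

Under those assumptions, the hedge wins by an order of magnitude even though the optimistic proposition is almost certainly true; the weirdness is that the tail loss, not the probability, does all the work.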