I realize that perhaps what I am suggesting is a sort of Pascal’s Wager, albeit of potentially higher probability than the original.
…I might do a post on this if there’s interest, but I’m also noticing a similarity between this sort of thinking and classical theology: people act against what they would otherwise do because they believe a Divine Punishment awaits them if they disobey the wishes of a human-like entity. This suggests to me that if we can do the equivalent of getting an AGI to “believe” in God, then the rest of alignment is pretty easy. If you view God as a false belief, then perhaps the problem simplifies further: alignment becomes equivalent to solving Yudkowsky’s challenge of getting an AGI to “believe” that 1+1=3 while its cognition is otherwise intelligent.