I’ve been under the impression that “Friendliness proofs” aren’t about proving Friendliness as such. Rather, they’re proofs that whatever is set as the AI’s goal function will always be preserved by the AI as its goal function, no matter how much self-improvement it goes through.