Yes, robust cooperation is not worth much to us if it's cooperation between the paperclip maximizer and the pencilhead minimizer. But if there are a hundred shards that make up human values, and tens of thousands of people running AIs trying to maximize whatever values they see fit, it's actually not unreasonable to expect that the outcome, while not exactly what we hoped for, is comparable to incomplete solutions that err on the side of (1) instead.
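To make the numbers behind that intuition concrete, here's a minimal coupon-collector-style sketch (my own toy model, not anything established in this thread): assume 100 value shards and 10,000 AI projects, each maximizing a single shard, with popularity skewed Zipf-style toward a few fashionable shards. The question is how many shards end up with no advocate at all.

```python
import random

# Toy model (all numbers are made-up assumptions for illustration):
# N_SHARDS value shards, N_PROJECTS AI projects each picking one shard
# to maximize, with Zipf-like skew toward the most popular shards.
N_SHARDS = 100
N_PROJECTS = 10_000
TRIALS = 200

# Shard i is roughly 1/(i+1) as popular as shard 0.
weights = [1 / (i + 1) for i in range(N_SHARDS)]

uncovered_counts = []
for _ in range(TRIALS):
    chosen = random.choices(range(N_SHARDS), weights=weights, k=N_PROJECTS)
    uncovered_counts.append(N_SHARDS - len(set(chosen)))

print("mean shards with no advocate:", sum(uncovered_counts) / TRIALS)
```

With these assumed numbers, even the least popular shard gets on the order of 20 expected advocates, so the mean count of unrepresented shards comes out near zero. Whether those advocates can actually strike a bargain is the separate question of robust cooperation.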
After having written this, I notice that I'm confused and am conflating: (a) incomplete solutions in the sense of there not being enough time to do what should be done, and (b) incomplete solutions in the sense of it being actually (provably?) impossible to implement what we right now consider essential parts of the solution. Has anyone got thoughts on (a) vs. (b)?
If value alignment is sufficiently harder than general intelligence, then we should expect that, given a large population of strong AIs created at roughly the same time, none of them will be remotely close to Friendly.
Not necessarily. In a multi-polar scenario consisting entirely of Unfriendly AIs, getting them to cooperate with each other doesn’t help us.