and persuading anyone to accept something they wouldn’t otherwise be pleased by (even through a method as benign as giving them unlimited, factual knowledge of the consequences and allowing them to decide for themselves) is unFriendly,
No. Only if you allow acceptance to define friendliness. Leaving "change the definition of friendliness" open as an avenue for fulfilling the goal defined as friendliness will almost certainly result in unfriendliness. Persuasion is not inherently unfriendly, provided it's not used to short-circuit friendliness.
If you think friendly AI is possible, but I’m going about it all wrong, what evidence would convince you that a given proposal was not equivalently flawed?
As an absolute minimum it would need to be possible and not obviously exploitable. It should also not look like a hack. Ideally it should be understandable, give me an idea of what an implementation might look like, be simple and elegant in design, and seem rigorous enough to make me confident that the lack of visible holes is a fact about the proposal and not about the creativity of whoever is looking for them.
Well, I’ll certainly concede that my suggestion fails the feasibility criterion, since a literal implementation might involve compiling a multiple-choice opinion poll with a countably infinite number of questions, translating it into every language and numbering system in history, and then presenting it to a number of subjects equal to the number of people who’ve ever lived multiplied by the average pre-singularity human lifespan in Planck-seconds multiplied by the number of possible orders in which those questions could be presented multiplied by the number of AI proposals under consideration.
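(As a rough sketch, with symbols introduced purely for illustration: the number of separate presentations would be something like
$$N \approx P \times T \times O \times A,$$
where $P$ is the number of people who've ever lived, $T$ is the average pre-singularity human lifespan in Planck-seconds, $O$ is the number of possible question orderings, and $A$ is the number of AI proposals under consideration. With a countably infinite question set, $O$ isn't even finite, so $N$ has no finite value at all.)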
I don’t mind. I was thinking about how some more traditional flawed proposals, like the smile-maximizer, cast the net broadly enough to catch deeply Unfriendly outcomes, and I decided to deliberately err in the other direction: design a test that would be too strict, one that even a genuinely Friendly AI might not be able to pass, but that would definitely exclude any Unfriendly outcome.
Please taboo the word ‘hack.’