I agree that it would be extremely difficult to find a world that, when completely and accurately described, would meet with effectively unconditional approval from both Rev. Dr. Martin Luther King, Jr. and a typical high-ranking member of the Ku Klux Klan.
Straight-up impossible if their (apparent) values are still the same as before and they haven’t been misled. If one agent prefers the absence of A to its presence, and another agent prefers the presence of A to its absence, you cannot possibly satisfy both agents completely (without deliberately misleading at least one about A). The solution can always be trivially improved for at least one agent by adding or removing A.
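To make the structure of that argument concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: a "world" is reduced to a single boolean (whether A is present), and the two agents, `wants_a` and `rejects_a`, are hypothetical stand-ins with exactly opposite preferences. It only shows that no world is top-ranked by both; it says nothing about real human values.

```python
# Toy model (illustrative assumption): a world is just "is A present?".
WORLDS = [True, False]

def wants_a(world_has_a: bool) -> int:
    """Hypothetical agent who prefers the presence of A (1 = fully satisfied)."""
    return 1 if world_has_a else 0

def rejects_a(world_has_a: bool) -> int:
    """Hypothetical agent who prefers the absence of A (1 = fully satisfied)."""
    return 0 if world_has_a else 1

def fully_satisfying_worlds(agents, worlds):
    """Return the worlds in which every agent gets its top-ranked outcome."""
    return [w for w in worlds if all(agent(w) == 1 for agent in agents)]

# With strictly opposed preferences over A, no world satisfies both agents.
print(fully_satisfying_worlds([wants_a, rejects_a], WORLDS))  # -> []
```

Whichever world is chosen, flipping A makes exactly one of the two agents better off, which is the "trivially improved" step in the paragraph above.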
Actually, now that you invoke the unknowability of the far-reaching capabilities of a superintelligence, I thought of a very slight possibility of a world meeting your definition even though people have mutually contradictory values:
The world could be deliberately set up in such a way that even a neutral third-party description contained a fully general mind hack for human minds, so that the AI could adjust the values of the hypothetical people being tested through the test itself. That’s almost certainly still impossible, but far more plausible than a world meeting the definition without any changing values, which would require all apparent value disagreements to be illusions and the world not to work in the way it appears to.
I think we can generalize that: dissolving an apparent impossibility through the creative power of a superintelligence should be far easier to do in an unfriendly way than in a friendly way, so a friendliness definition had better not contain any apparent impossibilities.
far more plausible than a world meeting the definition without any changing values,
I did not say or deliberately imply that nobody’s values would be changed by hearing an infallibly factual description of future events presented by a transcendent entity. In fact, that kind of experience is so powerful that unverified third-hand reports of it happening thousands of years ago retain enough impact to act as a recruiting tactic for several major religions.
which would require all apparent value disagreements to be illusions and the world not to work in the way it appears to.
Maybe not all, but certainly a lot of apparent value differences really are illusory. In third-world countries, genocide tends to flare up only after a drought leads to crop failures, suggesting that the real motivation is economic and racism is only used as an excuse, or a guide for who to kill without disrupting the social order more than absolutely necessary.
I think this is a lot less impossible than you’re trying to make it sound.
The stuff that people tend to get really passionate about, unwilling to compromise on, isn’t, in my experience, the global stuff. When someone says “I want less A” or “more A” they seem to mean “within range of my senses,” “in the environment where I’m likely to encounter it in the future” or “in my tribe’s territory or the territory of those we communicate with.” An arachnophobe wouldn’t panic upon hearing about a camel-spider three thousand miles away; if anything, the idea that none were on the same continent would be reassuring. An AI capable of terraforming galaxies might satisfy conflicting preferences by simply constructing an ideal environment for each, and somehow ensuring that everyone finds what they’re looking for.
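As a rough illustration of the locality point, here is a toy sketch in which each agent cares only about whether A is present in its own environment. The region names, the `local_agent` helper, and the reduction of a world to one boolean per region are all assumptions made for the example, not a claim about how real preferences work.

```python
from itertools import product

REGIONS = ["north", "south"]  # hypothetical, fully separate environments

def local_agent(home: str, wants_a: bool):
    """Build a hypothetical agent who only cares about A in its home region."""
    def satisfied(world: dict) -> bool:
        return world[home] == wants_a
    return satisfied

def fully_satisfying_worlds(agents, regions):
    """Enumerate every assignment of A to regions; keep those that satisfy all agents."""
    worlds = (
        dict(zip(regions, flags))
        for flags in product([True, False], repeat=len(regions))
    )
    return [w for w in worlds if all(agent(w) for agent in agents)]

# Opposite preferences, but about different places: one world satisfies both.
agents = [local_agent("north", True), local_agent("south", False)]
print(fully_satisfying_worlds(agents, REGIONS))
# -> [{'north': True, 'south': False}]
```

If both agents are instead given the same home region, the result is an empty list again, so in this toy model the conflict disappears only to the extent that the preferences really are local.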
The accurate description of such seemingly-impossible perfection would, in a sense, constitute a ‘fully general mind hack,’ in that it would convince anyone who can be convinced by the truth and satisfy anyone who can be satisfied within the laws of physics. If you know of a better standard, I’d like to hear it.
I’m not sure there is any point in continuing this. Once you allow the AI to optimize the human values it’s supposed to be tested against for test compatibility, it’s over.
If, as you assert, pleasing everyone is impossible, and persuading anyone to accept something they wouldn’t otherwise be pleased by (even through a method as benign as giving them unlimited, factual knowledge of the consequences and allowing them to decide for themselves) is unFriendly, do you categorically reject the possibility of friendly AI?
If you think friendly AI is possible, but I’m going about it all wrong, what evidence would convince you that a given proposal was not equivalently flawed?
I’m not sure there is any point in continuing this.
I’m having some doubts, too. If you decide not to reply, I won’t press the issue.
and persuading anyone to accept something they wouldn’t otherwise be pleased by (even through a method as benign as giving them unlimited, factual knowledge of the consequences and allowing them to decide for themselves) is unFriendly,
No. Only if you allow acceptance to define friendliness. Leaving changing the definition of friendliness as an avenue to fulfill the goal defined as friendliness will almost certainly result in unfriendliness. Persuasion is not inherently unfriendly, provided it’s not used to short-circuit friendliness.
If you think friendly AI is possible, but I’m going about it all wrong, what evidence would convince you that a given proposal was not equivalently flawed?
As an absolute minimum, it would need to be possible and not obviously exploitable. It should also not look like a hack. Ideally, it should be understandable, give me an idea of what an implementation might look like, be simple and elegant in design, and seem rigorous enough to make me confident that the lack of visible holes is not just a fact about the creativity of the looker.
Well, I’ll certainly concede that my suggestion fails the feasibility criterion, since a literal implementation might involve compiling a multiple-choice opinion poll with a countably infinite number of questions, translating it into every language and numbering system in history, and then presenting it to a number of subjects equal to the number of people who’ve ever lived multiplied by the average pre-singularity human lifespan in Planck-seconds multiplied by the number of possible orders in which those questions could be presented multiplied by the number of AI proposals under consideration.
I don’t mind. I was thinking about some more traditional flawed proposals, like the smile-maximizer, how they cast the net broadly enough to catch deeply Unfriendly outcomes, and decided to deliberately err in the other direction: design a test that would be too strict, that even a genuinely Friendly AI might not be able to pass, but that would definitely exclude any Unfriendly outcome.
Please taboo the word ‘hack.’