Follow-up/variation on Q5.7: Is it possible for unenhanced human brains to figure out how to properly formulate a human utility function? If not, could a WBE be reliably improved enough that it could do such a thing without significantly changing its values?
Also: Most AI researchers don’t seem too concerned about friendliness. If I don’t know even close to as much about AI as they do, why should I be convinced by any argument that I know failed to convince them?
Person 2: Hmm. Have you considered issue? It seems like issue might be a problem with thing.
Person 1: It’s not a problem.
Person 2: Why do you think that?
Person 1: Because I have no idea how to deal with issue. It looks impossible to solve.
Person 2: Oh, OK, you don’t think issue is a problem because you have no idea how to solve issue, that makes sense...wait, what!?
You shouldn’t be too convinced until you’ve heard from them why they rejected it.
If their argument that it is unlikely is technical, you may not be able to understand or judge it.
If their argument that it is unlikely repeatedly emphasizes that there is no theory of Friendly AI as one of its main points, one should consider whether the AI expert is refusing to seriously consider the problem because he or she emotionally can’t bear the absence of an easy solution.
Problems are not logically guaranteed to have solutions, that is, resolutions that are pleasing to you. If you are stricken by multiple horrible diseases, there may be no solution to the problem afflicting you. You die. That doesn’t violate the laws of the universe, as unfair as it is. Comments like this are not rare:
I’m also quite unconvinced that “provably safe” AGI is even feasible
It’s amazing not only that someone finds the argument from lack of a ready, palatable solution a good reason to ignore the issue, but that the argument is actually used to justify ignoring it in communications with other intelligent people. This is exactly analogous to a politician publicly arguing to continue the Vietnam War because of the costs sunk into it, rather than privately deciding to continue a failed policy for political reasons and then lying about his reasons in public.
That argument is palpably unreasonable: it’s a verbalization of the emotional impetus informing, and potentially undermining, thinking; a selective blindness that ends not with fooling oneself, but with failing to fool others, owing to an inability to see that such an argument is not logically compelling and appeals only to those emotionally invested in conducting GAI research. The difficulty of solving the problem does not make it cease to be a problem.
There is some relevance in mentioning the plausibility of SIAI’s general approach to solving it, but the emphasis I have seen on this point is out of all proportion with its legitimate role in the conversation. It appears to me as if it is being used as an excuse not to think about the problem, motivated by the problem’s difficulty.
...
“Friendly AI theory” as construed by the SIAI community, IMO, is pretty likely an intellectual dead end.
There are many fundamental problems in alchemy that also remain unsolved. They weren’t solved; the world moved on.
I’m pretty sure “FAI Theory” as discussed in the SIAI community is formulating the problem in the wrong way, using the wrong conceptual framework.
...
“When you’re designing something where human lives are at stake, you need to determine the worst possible conditions, and then to design it in such a fashion that it won’t catastrophically fail during them. In the case of AI, that’s Friendly AI. In the case of a bridge, that’s giving it enough reinforcement that it won’t fall down when packed full of cars, and then some.”
We have a nice theory of bridge-building, due to having theories about the strength of materials, Newtonian physics, earth science, etc. etc.
OTOH, there is no theory of “Friendly AI” and no currently promising theoretical path toward finding one. If you believe that SIAI has a top-secret, almost-finished rigorous theory of “Friendly AI” [and note that they are certainly NOT publicly claiming this, even though I have heard some of their stronger enthusiasts claim it], then, well, I have a bridge to sell you in Brooklyn ;-) … A very well put together bridge!!!!
The importance of a problem is not proportional to the ease of solving it. You don’t need any technical understanding to see through things like this. Although it is subjective, my firm opinion is that the amount of attention critics pay to emphasizing the difficulty they see with Eliezer’s solution to the problem Eliezer has raised is out of proportion to what an unmotivated skeptic would spend.
The idea of provably safe AGI is typically presented as something that would exist within mathematical computation theory or some variant thereof. So that’s one obvious limitation of the idea: mathematical computers don’t exist in the real world, and real-world physical computers must be interpreted in terms of the laws of physics, and humans’ best understanding of the “laws” of physics seems to radically change from time to time. So even if there were a design for provably safe real-world AGI, based on current physics, the relevance of the proof might go out the window when physics next gets revised.
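The point about proofs living inside a formal model can be made concrete with a toy sketch (my own illustration, not from the quoted comment, with all names hypothetical). A computation-theoretic "safety proof" amounts to checking that no reachable state of a *model* violates a property; the guarantee says nothing about transitions the model omits, which is exactly the worry about a future revision of physics:

```python
# Toy model-checking sketch: "provable safety" relative to a model.
def reachable(start, transitions):
    """All states reachable from `start` under the modeled transitions."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Hypothetical agent: states and the environment transitions we modeled.
model = {"idle": ["working"], "working": ["idle", "done"], "done": []}
UNSAFE = {"self_modify"}

# "Proof" that the unsafe state is unreachable -- within the model.
assert reachable("idle", model).isdisjoint(UNSAFE)

# But if the real environment permits a transition the model lacks
# (say, physics allows something we didn't formalize), the proof is
# silent about it:
real_world = dict(model, working=["idle", "done", "self_modify"])
assert "self_modify" in reachable("idle", real_world)
```

The exhaustive check is a genuine proof over the modeled transition system; the gap the comment points at is between that system and the physical computer running it.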
...
Another issue is that the goal of “Friendliness to humans” or “safety” or whatever you want to call it, is rather nebulous and difficult to pin down. Science fiction has explored this theme extensively. So even if we could prove something about “smart AGI systems with a certain architecture that are guaranteed to achieve goal G,” it might be infeasible to apply this to make AGI systems that are safe in the real-world—simply because we don’t know how to boil down the everyday intuitive notions of “safety” or “Friendliness” into a mathematically precise goal G like the proof refers to.
Eliezer has suggested a speculative way of getting human values into AGI systems called Coherent Extrapolated Volition, but I think this is a very science-fictional and incredibly infeasible idea (though a great SF notion).
But setting those worries aside, is the computation-theoretic version of provably safe AI even possible? Could one design an AGI system and prove in advance that, given certain reasonable assumptions about physics and its environment, it would never veer too far from its initial goal (e.g. a formalized version of the goal of treating humans safely, or whatever)?
I very much doubt one can do so, except via designing a fictitious AGI that can’t really be implemented because it uses infeasibly much computational resources.
...
I suppose that the “build a provably Friendly AI” approach falls in line with the “AI Nanny” idea. However, given the extreme difficulty and likely impossibility of making “provably Friendly AI”, it’s hard for me to see working on this as a rational way of mitigating existential risk.
...
Further, it’s possible that any system achieving high intelligence with finite resources, in our physical universe, will tend to manifest certain sorts of goal systems rather than others. There could be a kind of “universal morality” implicit in physics, of which human morality is one manifestation. In this case, the AGIs we create are drawn from a special distribution (implied by their human origin), which itself is drawn from a special distribution (implied by physics).
Universal instrumental values do not militate towards believing Friendliness is not an important issue; quite the contrary. Systems whose behaviors imply utility functions want to put resources towards their implicit goals, whatever those goals are, unless they are specifically perverse goals such as not expending resources.
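This convergence can be shown with a deliberately tiny sketch (my own illustration; the agents, actions, and utilities are all made up). Two brute-force planners with opposite terminal goals both begin by acquiring resources, because resources are instrumentally useful for almost any goal:

```python
# Toy illustration of instrumentally convergent resource acquisition.
from itertools import product

ACTIONS = ["gather", "make_paperclips", "plant_flowers"]

def simulate(plan):
    """Run a plan from an empty start state; resources are consumed on use."""
    state = {"resources": 0, "paperclips": 0, "flowers": 0}
    for action in plan:
        if action == "gather":
            state["resources"] += 1
        elif action == "make_paperclips":
            state["paperclips"] += state["resources"]
            state["resources"] = 0
        elif action == "plant_flowers":
            state["flowers"] += state["resources"]
            state["resources"] = 0
    return state

def best_plan(utility, horizon=3):
    """Brute-force search for the plan maximizing the given utility."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda plan: utility(simulate(plan)))

paperclip_plan = best_plan(lambda s: s["paperclips"])
flower_plan = best_plan(lambda s: s["flowers"])

# Despite opposite terminal goals, both optimal plans start by gathering.
print(paperclip_plan)  # -> ('gather', 'gather', 'make_paperclips')
print(flower_plan)     # -> ('gather', 'gather', 'plant_flowers')
```

Neither utility function mentions resources, yet both optimal plans spend most of their steps acquiring them, which is the sense in which such systems "want to put resources towards their implicit goals, whatever they are."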
Most AI researchers don’t seem too concerned about friendliness. If I don’t know even close to as much about AI as they do, why should I be convinced by any argument that I know failed to convince them?
First, they might not actually be very unconvinced by the arguments:
Hugo de Garis was one of the speakers at the conference, and he polled the audience, asking: “If it were determined that the development of an artificial general intelligence would have a high likelihood of causing the extinction of the human race, how many of you feel that we should still proceed full speed ahead?” I looked around, expecting no one to raise their hand, and was shocked that half of the audience raised their hands. This says to me that we need a much greater awareness of morality among AI researchers.
Second, abstractly: it is much easier to see how things fail than how they succeed.
The argument that Friendliness is an important concern is an argument that GAIs systematically fail in certain ways.
For each GAI proposal, taboo “Friendly”. Think about what the Friendliness argument implies, and where it predicts the GAI would fail. Consider the designer’s response to the specific concern rather than to the whole Friendliness argument. If their response is that a patch would work, one can challenge that assertion as well if one understands a reason why the patch would fail. One doesn’t have to pit his or her (absent) technical understanding of Friendliness against a critic’s.
Ultimately, my fairly high confidence that no present or foreseeable GAI design that ignores Friendliness would be safe for humanity rests on a few things: my trust in Omohundro and Eliezer, my non-technical understanding, my knowledge of several GAI designs that supposedly avoid the problem and that I know do not, and having heard bad arguments accepted as refutations of Friendliness generally. It’s not based solely on trusting authority.