Which approach gives a higher expected value? Formal specification is compatible with Eliezer’s ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
“Tell the AI in English” can fail, but the worst case is closer to a “With Folded Hands” scenario than to paperclips.
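To put toy numbers on that comparison (the probabilities and utilities below are invented purely for illustration, not estimates anyone in this thread has actually given), the expected-value framing looks something like this:

```python
# Toy expected-value comparison of the two approaches discussed above.
# All probabilities and utilities are invented for illustration only.

def expected_value(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# "Formal specification": small chance of working perfectly,
# but failures are assumed to be catastrophic (paperclip-style).
formal_spec = [
    (0.10, +100),     # proof goes through and captures what we meant
    (0.90, -10_000),  # subtly wrong spec, optimized hard -> catastrophe
]

# "Tell the AI in English": assumed to fail more often but more gently
# (closer to a "With Folded Hands" scenario than to paperclips).
plain_english = [
    (0.05, +100),     # it interprets us correctly
    (0.95, -50),      # it gets us wrong in a merely bad, non-astronomical way
]

print("formal spec EV:  ", expected_value(formal_spec))
print("plain English EV:", expected_value(plain_english))
```

With those made-up numbers the gentler failure mode wins easily; the real disagreement is over whether the probabilities and the magnitudes of the bad outcomes look anything like this.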
I don’t think that’s how the analysis goes. Eliezer says that an AGI must be very carefully and specifically made Friendly or it will be disastrous, but the danger is not just a matter of being only nearly careful or specific enough: he believes an AGI told merely to maximize human pleasure is very dangerous, probably even more dangerous than an AGI with a merely 80% Friendly-Complete specification.
Mr. Loosemore seems to hold the opposite opinion: that an AGI will not follow its instructions to absurd conclusions unless it is exceptionally unintelligent, and thus not very powerful. I don’t believe his position is that a near-Friendly-Complete specification is very risky (after all, a “smart” AGI would know what you really meant), but that such a specification would be superfluous.
Whether Mr. Loosemore is correct isn’t determined by whether we believe he is correct, just as Eliezer isn’t wrong merely because we choose a different theory. The risks have to be measured by their likelihood given the available facts.
The problem is that I don’t see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of “human pleasure = brain dopamine levels”, not least because there are people who’d want to be wireheads and there’s a massive amount of physiological research showing human pleasure to be caused by dopamine levels. I can just as easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, that even does complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and that still does not care.
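A minimal sketch of that last point, the “knows but doesn’t care” case, with a completely made-up toy agent (none of this is anyone’s proposed design; it only shows that adding knowledge to the world model doesn’t change what a fixed objective rewards):

```python
# Toy illustration: an agent whose objective is literally
# "human pleasure = brain dopamine levels". Its knowledge (world model)
# never enters the objective, so knowing that humans prefer richer
# enjoyment changes nothing about which action it picks.

ACTIONS = {
    # action: (resulting dopamine level, matches what humans actually want?)
    "wirehead everyone":          (100, False),
    "support complex enjoyments": (60,  True),
    "do nothing":                 (50,  False),
}

# Accurate knowledge the agent has -- deliberately never consulted below.
world_model = {
    "humans prefer complicated enjoyment over raw dopamine": True,
}

def utility(action):
    dopamine, _matches_human_values = ACTIONS[action]
    return dopamine  # the specification only mentions dopamine

best = max(ACTIONS, key=utility)
print(best)  # -> "wirehead everyone", regardless of what world_model contains
```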
> The problem is that I don’t see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of “human pleasure = brain dopamine levels”, not least because there are people who’d want to be wireheads and there’s a massive amount of physiological research showing human pleasure to be caused by dopamine levels.
I don’t think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn’t been either.
Both are addressing intentionally friendly or neutral AI that goes wrong.
> I can just as easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, that even does complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and that still does not care.
I think it’s a question of what you program in, and what you let it figure out for itself. If you want to prove formally that it will behave in certain ways, you would like to program in explicitly, formally, what its goals mean. But I think that “human pleasure” is such a complicated idea that trying to program it in formally is asking for disaster. That’s one of the things that you should definitely let the AI figure out for itself. Richard is saying that an AI as smart as a smart person would never conclude that human pleasure equals brain dopamine levels.
Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value.
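To make the “program it in formally vs. let it figure it out” distinction concrete, here is a hypothetical sketch; the names and the learned-model placeholder are my own inventions, not anything Eliezer or Richard has specified:

```python
# Two ways of giving the system a goal, sketched in Python.

# Option 1: hard-code a formal definition of the goal term.
# Whatever is wrong or missing in this definition gets optimized anyway.
def pleasure_hardcoded(person):
    return person["dopamine_level"]  # "human pleasure = dopamine"

# Option 2: defer the meaning of the goal term to a model the system
# refines as it learns more about people (the "figure it out" route).
class LearnedPleasureModel:
    def __init__(self):
        self.evidence = []  # observations about what people actually value

    def update(self, observation):
        self.evidence.append(observation)

    def score(self, person):
        # Placeholder: the hard part of the argument is whether the system's
        # best current understanding is actually used as the goal, or only
        # stored as knowledge it doesn't act on.
        raise NotImplementedError("this is exactly the disputed step")
```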
> I don’t think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn’t been either. Both are addressing intentionally friendly or neutral AI that goes wrong.
Wouldn’t it care about getting things right?
> I think it’s a question of what you program in, and what you let it figure out for itself. If you want to prove formally that it will behave in certain ways, you would like to program in explicitly, formally, what its goals mean. But I think that “human pleasure” is such a complicated idea that trying to program it in formally is asking for disaster. That’s one of the things that you should definitely let the AI figure out for itself. Richard is saying that an AI as smart as a smart person would never conclude that human pleasure equals brain dopamine levels.
> Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value.
Huh, I thought he wanted to use CEV?
You are right. I think PhilGoetz must be confused. At the very least, EY has certainly never suggested programming an AI to maximise human pleasure.