This needs more work; my nitpick detector goes off on every other sentence. If you’re willing to revise heavily, I’ll compile a more detailed list (or revise some sections myself).
Examples (starting from the end, skipping some):
“but that stable eventual goal may be very difficult to predict in advance”—you don’t predict goals, you make them a certain way.
“We must also figure out how to build a general intelligence that satisfies a goal at all, and that stably retains that goal as it edits its own code to make itself smarter. This task is perhaps the primary difficulty in designing friendly AI.”—last sentence unwarranted.
“Eliezer Yudkowsky has proposed[57] Coherent Extrapolated Volition as a solution to two problems facing Friendly AI design:”—not just these two, bad wording
“We have already seen how simple rule-based and utilitarian designs for Friendly AI will fail. ”—should be a link/reference, a FAQ can be entered at any question.
“The second problem is that a superintelligence may generalize the wrong principles due to coincidental patterns in the training data”—a textbook error in machine learning methodology is a bad match for a fundamental problem, unless argued as being such in this particular case.
“But even if humans could be made to agree on all the training cases, two problems remain.”—just two? Bad wording.
“The first problem is that training on cases from our present reality may not result in a machine that will make correct ethical decisions in a world radically reshaped by superintelligence.”—the same can be said of humans (correctly, but as a result it doesn’t work as a simple distinguishing argument).
“Let’s consider the likely consequences of some utilitarian designs for Friendly AI.”—“utilitarian”: a potentially new term without any introduction, even with a link, is better to be avoided.
“An AI designed to minimize human suffering would simply kill all humans”—could/might would be better.
“caters to the complex and demanding wants of humanity”—this statement is repeated about 5 times in close forms, should change the wording somehow.
“by wiring humans into Nozick’s experience machines. ”—an even more opaque term without explanation.
“Either option would be easier for the AI to achieve than maintaining a utopian society catering to the complexity of human (and animal) desires.”—not actually clear (from my point of view, not simulated naive point of view). The notion of “default route” in foreign minds can be quite strange, and you don’t need much complexity in generating principle for a fractal to appear diverse. (There are clearly third alternatives that shelve both considered options, which also makes the comparison not terribly well-defined.)
“It’s not just a problem of specifying goals, either. It is hard to predict how goals will change in a self-modifying agent. No current mathematical decision theory can predict the decisions of a self-modifying agent.”—again, these things are there to be decided upon, not “predicted”
etc.
but that stable eventual goal may be very difficult to predict in advance
No, the point of that section is that there are many AI designs in which we can’t explicitly make goals.
This task is perhaps the primary difficulty in designing friendly AI.
Some at SIAI disagree. I’ve already qualified with ‘perhaps’.
not just these two, bad wording
Fixed.
should be a link/reference, a FAQ can be entered at any question
Alas, I think no such documents exist. But luckily, the sentence is unneeded.
a textbook error in machine learning methodology is a bad match for a fundamental problem, unless argued as being such in this particular case
I disagree. A textbook error in machine learning that has not yet been solved is a good match for a fundamental problem.
just two? Bad wording.
Fixed.
the same can be said of humans (correctly, but as a result it doesn’t work as a simple distinguishing argument)
Again, I’m not claiming that these aren’t also problems elsewhere.
“utilitarian”: a potentially new term without any introduction, even with a link, is better to be avoided
Maybe. If you can come up with a concise way to get around it, I’m all ears.
could/might would be better
Agreed.
this statement is repeated about 5 times in close forms, should change the wording somehow
Why? I’ve already varied the wording, and the point of a FAQ with link anchors is that not everybody will read the whole FAQ from start to finish. I repeat the phrase ‘machine superintelligence’ in variations a lot, too.
an even more opaque term without explanation
Hence, the link, for people who don’t know.
not actually clear (from my point of view, not simulated naive point of view)
Changed to ‘might’.
again, these things are there to be decided upon, not “predicted”
Fixed.
Thanks for your comments. As you can see I am revising, so please do continue!
No, the point of that section is that there are many AI designs in which we can’t explicitly make goals.
I know, but you use the word “predict”, which is what I was pointing out.
I disagree. A textbook error in machine learning that has not yet been solved is a good match for a fundamental problem.
What do you mean, “has not yet been solved”? This kind of error is routinely being solved in practice, which is why it’s a textbook example.
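To make concrete what “routinely being solved in practice” means here, a minimal sketch with synthetic data (everything below, names and numbers included, is made up for illustration and is not from the FAQ): a model latches onto a coincidental feature that happens to track the labels in its training slice, and an ordinary held-out split exposes the mistake.

    import numpy as np

    rng = np.random.default_rng(0)

    n = 200
    signal = rng.normal(size=n)        # the feature that genuinely determines the label
    coincidence = rng.normal(size=n)   # an irrelevant feature
    labels = (signal > 0).astype(float)

    # In the training slice, the irrelevant feature coincidentally tracks the label;
    # in the held-out slice it is just noise.
    train, test = np.arange(100), np.arange(100, 200)
    coincidence[train] = labels[train] + 0.1 * rng.normal(size=100)

    X = np.column_stack([signal, coincidence])

    def fit(X, y):
        # ordinary least squares as a stand-in for "a model"
        A = np.c_[X, np.ones(len(X))]
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return w

    def accuracy(w, X, y):
        preds = np.c_[X, np.ones(len(X))] @ w > 0.5
        return float((preds == y.astype(bool)).mean())

    w = fit(X[train], labels[train])
    print("train accuracy:   ", accuracy(w, X[train], labels[train]))   # near 1.0
    print("held-out accuracy:", accuracy(w, X[test], labels[test]))     # markedly lower
    # The gap is the "textbook error" being caught: standard validation practice
    # detects that the learned rule relied on a coincidental pattern.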
Again, I’m not claiming that these aren’t also problems elsewhere.
Yes, but that makes it a bad illustration.
Why? I’ve already varied the wording, and the point of a FAQ with link anchors is that not everybody will read the whole FAQ from start to finish.
Because it’s bad prose; it sounds unnatural (YMMV).
Hence, the link, for people who don’t know.
This doesn’t address my argument. I know there is a link and I know that people could click on it, so that’s not what I meant.
(More later, maybe.)