I have said it over and over. I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.
The good guys do not directly impress their personal values onto a Friendly AI.
Actually setting up a Friendly AI’s values is an extremely meta operation, less “make the AI want to make people happy” and more like “superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition”. This actually seems to be something of a Pons Asinorum in FAI—the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas. Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).
Eliezer wrote: