Now this is admittedly very different from the thesis that value is complex and fragile.
I disagree. The fact that a concept is very complicated doesn’t mean it won’t be represented in any advanced AGI’s ontology. Human psychology, the specific tools needed to build nanomachines, and the agent-foundations theory needed to design aligned successor agents are all also “complex and fragile” concepts (in the sense that getting a small detail wrong would result in a grand failure of prediction/planning), yet we can expect such concepts to be convergently learned.
Not that I necessarily expect “human values” specifically to be a natural abstraction; an indirect pointer at “moral philosophy”/DWIM/corrigibility seems much more plausible and much less complex.
Sorry for misrepresenting your views.