Now this is admittedly very different from the thesis that value is complex and fragile.
I disagree. The fact that a concept is very complicated doesn’t mean it won’t be represented in any advanced AGI’s ontology. Human psychology, the specific tools needed to build nanomachines, and the agent-foundations theory needed to design aligned successor agents are all also “complex and fragile” concepts (in the sense that getting a small detail wrong would result in a grand failure of prediction/planning), yet we can expect such concepts to be convergently learned.
Not that I necessarily expect “human values” specifically to be a natural abstraction; an indirect pointer at “moral philosophy”/DWIM/corrigibility seems much more plausible and much less complex.
Sorry for misrepresenting your views.