Did you ever end up reading Reducing Goodhart? I enjoyed reading these thought experiments, but I think rather than focusing on “the right direction” (of wisdom), or “the right person,” we should mostly be thinking about “good processes”—processes for evolving humans’ values that humans themselves think are good, in the ordinary way we think ordinary good things are good.
Not yet, but I hope to, and I’m grateful to you for writing it.
processes for evolving humans’ values that humans themselves think are good, in the ordinary way we think ordinary good things are good
Well, sure, but the question is whether this can really be done by modelling human values and then evolving those models. If you claim yes, then there are several thorny issues to contend with: what constitutes a viable starting point for such a process, what a reasonable dynamic for it looks like, and on what basis we decide the answers to these questions.