Value extrapolation is thus necessary for AI alignment. It is also almost sufficient, since it allows AIs to draw correct conclusions from imperfectly defined human data.
I am missing something… The idea of correctly extrapolating human values is basically the definition of Eliezer’s original proposal, CEV. In fact, it’s right there in the name. What is the progress over the last decade?
CEV is based on extrapolating the person; the values are what the person would have had, had they been smarter, known more, had more self-control, etc… Once you have defined the idealised person, the values emerge as a consequence. I’ve criticised this idea in the past, mainly because the process to generate the idealised person seems vulnerable to negative attractors (Eliezer’s most recent version of CEV has less of this problem).
Value extrapolation and model splintering are based on extrapolating features and concepts in models to other models. This can be done without knowing human psychology or (initially) knowing anything about humans at all, including their existence. See for example the value extrapolation partially resolves symbol grounding post; I would never write “CEV partially resolves symbol grounding”. On the contrary, CEV needs symbol grounding.
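To make the feature-based framing concrete, here is a toy sketch (my own illustration, not something from the posts or the paper): a value defined over a coarse feature is extended to a refined model after a splintering, with the genuinely ambiguous cases flagged rather than silently resolved. The feature names ("happy", "smiling", "reports_wellbeing") are purely hypothetical.

```python
# Toy sketch of extrapolating a value across a model splintering.
# My own illustration, under assumed feature names; not the authors' method.

from typing import Dict

# Coarse model: a single feature "happy" (True/False), with a value
# function defined directly over it.
coarse_value: Dict[bool, float] = {True: 1.0, False: 0.0}

# The model splinters: "happy" refines into two finer features.
# Each refined state either maps back to a unique coarse value of "happy",
# or (None) is a case the coarse feature no longer determines.
refinement = {
    ("smiling", "reports_wellbeing"): True,
    ("smiling", "no_wellbeing"): None,
    ("not_smiling", "reports_wellbeing"): None,
    ("not_smiling", "no_wellbeing"): False,
}

def extrapolate(refined_state):
    """Extend the coarse value to the refined model where the mapping is
    unambiguous; flag splintered cases for further input instead of guessing."""
    coarse = refinement[refined_state]
    if coarse is None:
        return None  # value genuinely underdefined after the splintering
    return coarse_value[coarse]

for state in refinement:
    print(state, "->", extrapolate(state))
```

The point of the sketch is only that the extrapolation operates on model features, not on an idealised person: nothing in it requires knowing whose values these are.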
I don’t really understand the symbol grounding issue, but I can see that “value extrapolation” just happened to sound very similar to CEV and hence my confusion.
I wanted to look up CEV after reading this comment. Here’s a link for anyone else looking: https://intelligence.org/files/CEV.pdf
That acronym stands for “Coherent Extrapolated Volition”, not “Coherent Extrapolated Values”. But from skimming the paper just now, I think I agree with shminux that it’s basically the same idea.
A more recent explanation of CEV by Eliezer: https://arbital.com/p/cev/