I don’t think MIRI ever considered this an important part of the alignment problem, and I don’t think we expect humanity to solve lots of the alignment problem as a result of having such a tool
If you don’t think MIRI ever considered coming up with an “explicit function that reflects the human value function with high fidelity” to be “an important part of the alignment problem”, can you explain this passage from the Arbital page on The problem of fully updated deference?
One way to look at the central problem of value identification in superintelligence is that we’d ideally want some function that takes a complete but purely physical description of the universe, and spits out our true intended notion of value V in all its glory. Since superintelligences would probably be pretty darned good at collecting data and guessing the empirical state of the universe, this probably solves the whole problem.
This is not the same problem as writing down our true V by hand. The minimum algorithmic complexity of a meta-utility function ΔU which outputs V after updating on all available evidence, seems plausibly much lower than the minimum algorithmic complexity for writing V down directly. But as of 2017, nobody has yet floated any formal proposal for a ΔU of this sort which has not been immediately shot down.
Eliezer (who I assume is the author) appears to say in the first paragraph that solving the problem of value identification for superintelligences would “probably [solve] the whole problem”. By “whole problem”, I assume he is referring to what he saw as an important part of the alignment problem, though I may be misreading him.
He referred to the problem of value identification as getting “some function that takes a complete but purely physical description of the universe, and spits out our true intended notion of value V in all its glory.” This seems very similar to my definition, with the caveat that my definition isn’t about revealing “V in all its glory” but about revealing V at the level that an ordinary human is capable of revealing it.
Unless the sole problem here is that we absolutely need our function that reveals V to be ~perfect, I think this quote from the Arbital page directly supports my interpretation, and overall supports the thesis in my post pretty strongly (even if I’m wrong about a few minor details).