I think my real objection is that MIRI largely agrees with the idea "don't attempt to make a pure utility maximizer with a static loss function on the first try," and has accordingly tried to design systems that aren't pure utility maximizers, such as ones that are corrigible or have "chill." These designs just kinda don't work so far, and anyone suggesting that MIRI hasn't looked is being a bit silly.
Instead, I wish someone suggesting this would concretely describe the properties they hope to gain by removing a value function, as I suspect the real answer is… corrigibility or chill. Saying "oh, this pure utility maximizer thing looks really hard, let's explore the space of all possible agent designs instead" isn't really helpful: what are you looking to find, and why would it be safer?