I agree with basically all of this; maybe I’m more pessimistic about tractability, but not enough to matter for any actual decision.
It sounds to me like, given these beliefs, what you would want to advocate is “let those who want to figure out a theory of human preferences do so, and don’t shun them from AI safety”. Perhaps also “let’s have some introductory articles for such a theory, so that new entrants to the field know it is a problem that could use more work and can make an informed decision about what to work on”. I would certainly agree with both of these.
In your original comment it sounded to me like you were advocating something stronger: that a theory of human preferences is necessary for AI safety, and (by implication) that at least some of us who don’t work on it should switch to working on it. In addition, that we should differentially encourage newer entrants to the field to work on a theory of human preferences, rather than on some other problem of AI safety, so as to build a community around (4). I would disagree with these stronger claims.
Do you perhaps only endorse the first paragraph and not the second?
I endorse what you propose in the first paragraph. I do think a theory of human preferences is necessary and that at least someone should work on it (if I didn’t think this, I probably wouldn’t be doing it myself). That said, I don’t think anyone necessarily needs to switch to it, all else equal, and I wouldn’t say we should encourage folks to work on it more than on other problems as a general policy: there’s a lot to be done, and I remain uncertain enough about prioritization that I can’t make a strong recommendation beyond “let’s make sure we work on everything that seems relevant”.
So it sounds like we only disagree on the necessity aspect, and that disagreement seems to come from an inferential gap I’m not sure how to bridge yet: why I believe it to be necessary hinges in part on deeper beliefs we may not share and that I haven’t figured out how to make explicit. That’s good to know, because it points towards something worth thinking about and addressing, so that both existing and new entrants to AI safety work may more readily accept it as important and useful work.