I never came back to this paper after I briefly posted about it, and this seems as good a place as any to say more and continue the conversation.
What I found weird about this paper is that it focuses heavily on something that seems largely irrelevant to me. I don’t expect there to be much for us to choose about how to aggregate values, because I expect most of the problem lies in figuring out how to specify or find values at all. I do expect some issues around aggregation will need resolving, but since we don’t yet know what we will be aggregating (that is, which abstractions we will need to aggregate over and resolve conflicts between), it’s hard to see how this kind of consideration is relevant yet.
To be fair, many may object that I am making the same mistake by worrying about what values even are and how we might verify whether an AI is aligned with ours, when we don’t yet know what AI powerful enough to need alignment will look like. So I wouldn’t want to see this kind of work not happen; it’s only that, for my taste, it seems premature, reasoning about things that may not be relevant, or not relevant in the way we expect, such that the work is of limited marginal value.
That said, I agree with you that this paper stands as an excellent signal that more mainstream AI researchers are taking value alignment seriously and are thinking about the kinds of problems that are, in my estimation, more likely to matter in the long term than short-term concerns such as narrow value learning.
It sure seems like if he really grokked the philosophical and technical challenge of getting an AGI agent to be net beneficial, he would have written a different paper. That first challenge sort of overshadows the task of dividing up the post-singularity pie.
But I’m not sure whether the overshadowing is merely a matter of the first problem being bigger (in which case this paper is still doing useful work), or whether we should expect solutions to the pie-dividing problems (e.g. weighing egalitarianism vs. utilitarianism) to fall out of whatever process lets the AI learn how to behave well in the first place.
If you buy a pizza cutter, but the pizza doesn’t arrive, then you’ve wasted your money.
(Technically this is incorrect if you ever buy a pizza again, or if there’s something else you can use it to split, but I understand the main reason people have expressed concern about AGI is the belief that if it goes horribly wrong there won’t be another chance to try again.)