Hey! Absolutely, I think a lot of this makes sense. I assume you were referring to this paragraph, the Reverse Engineering Roles and Norms one:
I want to be clear that I do not mean AI systems should go off and philosophize on their own until they implement the perfect moral theory without human consent. Rather, our goal should be to design them in such a way that this will be an interactive, collaborative process, so that we continue to have autonomy over our civilizational future[10].
For both points here, I guess what I was getting at with my questions was: how ought we to structure this collaborative process? What constitutes the feedback a machine sees in order to interactively improve with society? Who does the AI interact with? What constitutes a datapoint in the moral learning process? These seem like loaded questions, so let me be more concrete. In decisions where there is no unanimity about a moral fact, using simple majority rule, for example, could lead to a disastrously bad moral theory: you could align an AI with norms under which 60% of the public exploits the other 40% (for example, in an extreme case, if a majority deems it moral to exploit or under-provide for a minority). It strikes me that to prevent this kind of failure mode, there must be some baked-in context of “obviously wrong” beforehand. If you instead require total unanimity, then you will never get even a single datapoint: people will reasonably disagree (I would argue indefinitely, even after arbitrary amounts of reasonable debate) about basic moral facts due to differences in values.
I think this negotiation process is itself really important to get right if you advocate this kind of approach, and it has to be done without advancing any one moral view of the world. I certainly don’t think it’s impossible, just as it isn’t impossible to have a relatively well-functioning democracy. But I guess this is the point: are there any guarantees, in the limit, that society will agree after arbitrary lengths of deliberation? Has modern democracy / norm-setting historically arisen from mutual deliberation, or from the exertion of state power / arbitrary assertion of one norm over another? I honestly don’t have sufficient context to answer that, but it seems like a relevant empirical fact here.
Maybe another follow-up: what are your idealized conditions for “rational / mutually justifiable collective deliberation” here? It seems this phrase is implicitly doing a lot of heavy lifting for this framework, and I’m not quite sure myself what it would mean, even ideally.