Great post! I’m curious if you could elaborate on when you would feel comfortable allowing an agent to make some kind of “enlightened” decision, as opposed to one based more on “mere compliance”? Especially for an AI system that is perhaps not very interpretable, or that operates in very high-stakes applications, what sort of certificate / guarantee / piece of reasoning would you want from the system before allowing it to enact fundamental social changes? The nice thing about “mere compliance” is that there are benchmarks for ‘right’ and ‘wrong’ decisions. But here I would expect people to reasonably disagree on whether an AI system or community of systems has made a good decision, and therefore it seems harder to ever fully trust machines to make decisions at this level.
Also, despite this being mentioned, it strikes me that purely appealing to what IS done, as opposed to what OUGHT to be done directly, in the context of building moral machines, can still ultimately be quite regressive without this “enlightened compliance” component. It even strikes me that the point of technological progress itself is to incrementally, if slightly, modify and upend the roles we play in the economy and society more broadly, so it seems somewhat paradoxical to impose this ‘freezing’ criterion on tech and society in some way. Like we want AI to improve the conditions of society, but not fundamentally change its dynamics too much? I may be misunderstanding something about the argument here.
All of this is to say, it does feel somewhat unavoidable to me to advance some kind of claim about the precise contents of a superior moral framework for what systems ought to do, beyond just matching what people do (in Russell’s case) or what society does (in this post’s case). You mention the interaction of cooperating and working with machines in the “enlightened compliance” section, and never fully automating the decision. But what are we deciding, if not the contents of an eventually superior moral theory, or a superior social one? This seems to me the inevitable, and ultimate, desideratum of social progress.
It reminds me of a lot of Chomskyan commentary on law and justice: all norms and laws grope toward an idealized theory of justice that is somehow biological and emergent from human nature. And while we certainly know that we ought not let system designers, corporations, or states just dictate the content of such a moral theory directly, I don’t think we can purely lean on contractualism to avoid the question in the long run. It may be useful in the short run while we sort out the content of such a theory, but ultimately it seems we cannot avoid the question forever.
Idk, just my thinking, very thought-provoking post! Strong upvote, a conversation the community definitely ought to have.
But here I would expect people to reasonably disagree on whether an AI system or community of systems has made a good decision, and therefore it seems harder to ever fully trust machines to make decisions at this level.
I hope the above is at least partially addressed by the last paragraph of the section on Reverse Engineering Roles and Norms! I agree with the worry, and to address it I think we could design systems that mostly just propose revisions or extrapolations to our current rules, or highlight inconsistencies among them (e.g. conflicting laws), thereby aiding a collective-intelligence-like democratic process of updating our rules and norms (of the form described in the Collective Governance section), where AI systems facilitate but do not enact normative change.
Note that if AI systems represent uncertainty about the “correct” norms, this will often lead them to make queries to humans about how to extend/update the norms (a la active learning), instead of immediately acting under their best estimate of the extended norms. This could be further augmented by a meta-norm of (generally) requiring consent / approval from the relevant human decision-making body before revising or acting under new rules.
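To make that concrete, here is a minimal sketch of such an uncertainty-triggered query policy (purely my own illustration, not something specified in the post): the agent maintains a posterior over candidate norm extensions, defers to humans when that posterior is too uncertain, and even when confident only requests approval rather than acting unilaterally. The names `NormHypothesis`, `act_or_query`, and the entropy threshold are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class NormHypothesis:
    """One candidate extension/revision of the current norms (hypothetical)."""
    description: str
    posterior: float  # agent's credence that this is the extension humans would endorse

def posterior_entropy(hypotheses):
    """Shannon entropy (in bits) of the posterior over candidate extensions."""
    return -sum(h.posterior * math.log2(h.posterior) for h in hypotheses if h.posterior > 0)

def act_or_query(hypotheses, entropy_threshold=0.5, require_approval=True):
    """Query humans when the posterior is too uncertain; otherwise either
    request approval (the meta-norm) or act under the best-estimate norm."""
    best = max(hypotheses, key=lambda h: h.posterior)
    if posterior_entropy(hypotheses) > entropy_threshold:
        return ("query", "Which extension of the current norms applies here?")
    if require_approval:
        return ("request_approval", best.description)
    return ("act", best.description)

# Toy usage: two competing readings of an under-specified privacy rule.
candidates = [
    NormHypothesis("Extend the rule to cover inferred data", 0.55),
    NormHypothesis("Keep the rule limited to collected data", 0.45),
]
print(act_or_query(candidates))  # high entropy -> ('query', ...)
```

The skeleton is the point, not the particular uncertainty measure: any policy of this shape makes humans, not the agent, the ones who resolve normative ambiguity.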
All of this is to say, it does feel somewhat unavoidable to me to advance some kind of claim about the precise contents of a superior moral framework for what systems ought to do, beyond just matching what people do (in Russell’s case) or what society does (in this post’s case).
I’m not suggesting that AI systems should simply do what society does! Rather, the point of the contractualist framing is that AI systems should be aligned (in the limit) to what society would agree to after rational / mutually justifiable collective deliberation.
Current democratic systems approximate this ideal only very roughly, and I guess I hold out hope that under the right kinds of epistemic and social conditions (freedom of expression, equality of interlocutors, non-deluded thinking), the kind of “moral progress” we instinctively view as desirable will emerge from that form of collective deliberation. So my hope is that rather than specifying in great detail what the contents of a “superior moral theory” might look like, all we need to align AI systems with is the underlying meta-ethical framework that enables moral change. See Anderson on How Common Sense Can Be Self-Critical for a good discussion of what I think this meta-ethical framework looks like.
Hey! Absolutely, I think a lot of this makes sense. I assume you were referring to this paragraph from the Reverse Engineering Roles and Norms section:
I want to be clear that I do not mean AI systems should go off and philosophize on their own until they implement the perfect moral theory without human consent. Rather, our goal should be to design them in such a way that this will be an interactive, collaborative process, so that we continue to have autonomy over our civilizational future[10].
For both points here, I guess the question I was getting at by asking these is: how ought we structure this collaborative process? Like, what constitutes the feedback a machine sees in order to interactively improve with society? Who do AIs interact with? What constitutes a datapoint in the moral learning process? These seem like loaded questions, so let me be more concrete. In decisions without unanimity regarding a moral fact, using simple majority rule, for example, could lead to disastrously bad moral theory: you could align an AI with norms that result in 60% of the public exploiting the other 40% (for example, if a majority deems it moral to exploit / under-provide for a minority, in an extreme case). It strikes me that to prevent this kind of failure mode, there must be some baked-in context of “obviously wrong” beforehand. If you require total unanimity, well then, you will never get even a single datapoint: people will reasonably disagree (I would argue to infinity, after arbitrary amounts of reasonable debate) about basic moral facts due to differences in values.
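As a toy illustration of that failure mode (my own, with made-up payoffs and rule names, nothing from the post): a simple-majority aggregation rule happily adopts an exploitative norm that a maximin-style or veto rule would reject.

```python
# Payoff to each citizen if a candidate norm "exploit the minority" is adopted:
# 60 people gain a little, 40 people lose a lot (hypothetical numbers).
payoffs = [+1] * 60 + [-5] * 40

def majority_approves(payoffs):
    """Adopt the norm if more people gain than lose."""
    return sum(p > 0 for p in payoffs) > len(payoffs) / 2

def maximin_approves(payoffs, status_quo=0):
    """Adopt the norm only if the worst-off person is no worse off than the status quo."""
    return min(payoffs) >= status_quo

print(majority_approves(payoffs))  # True: 60 > 50, so simple majority passes the norm
print(maximin_approves(payoffs))   # False: the worst-off lose 5, so this rule vetoes it
```

The point is not that maximin is the right rule, only that the choice of aggregation rule is doing a lot of the moral work, which is exactly the thing I worry the framework leaves unspecified.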
I think this negotiation process is in itself really, really important to get right if you advocate this kind of approach, and to get right without advancing any one moral view of the world. I certainly don’t think it’s impossible, just as it isn’t impossible to have a relatively well-functioning democracy. But this is the point, I guess: are there any guarantees that society converges to agreement in the limit of arbitrarily long deliberation? Has modern democracy / norm-setting historically arisen from mutual deliberation, or from the exertion of state power / the arbitrary assertion of one norm over another? I honestly don’t have sufficient context to answer that, but it seems like a relevant empirical fact here.
Maybe another follow up: what are your idealized conditions for “rational / mutually justifiable collective deliberation” here? It seems this phrase implicitly does a lot of heavy lifting for this framework, and I’m not quite sure myself what this would mean, even ideally.