But here I would expect people to reasonably disagree on whether an AI system or community of systems has made a good decision, and therefore it seems harder to ever fully trust machines to make decisions at this level.
I hope the above is at least partially addressed by the last paragraph of the section on Reverse Engineering Roles and Norms! I agree with the worry, and to address it I think we could design systems that mostly just propose revisions or extrapolations to our current rules, or highlight inconsistencies among them (e.g. conflicting laws), thereby aiding a collective-intelligence-like democratic process of updating our rules and norms (of the form described in the Collective Governance section), where AI systems facilitate but do not enact normative change.
Note that if AI systems represent uncertainty about the “correct” norms, this will often lead them to make queries to humans about how to extend/update the norms (a la active learning), instead of immediately acting under their best estimate of the extended norms. This could be further augmented by a meta-norm of (generally) requiring consent / approval from the relevant human decision-making body before revising or acting under new rules.
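To make the active-learning picture a bit more concrete, here is a minimal sketch in Python; the entropy threshold, the norm names, and the exact decision rule are placeholders I’m inventing for illustration, not anything proposed in the post:

```python
import math

# Hypothetical sketch: the agent keeps a posterior over candidate norm
# extensions and only acts autonomously when that posterior is confident.

def entropy(posterior):
    """Shannon entropy (in nats) of a discrete distribution over norms."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0)

def decide_or_query(posterior, threshold=0.5):
    """Query the relevant human body when uncertain; otherwise act under
    the most probable norm (still subject to a consent meta-norm)."""
    if entropy(posterior) > threshold:
        return ("query_humans", None)      # active-learning step: defer to humans
    best = max(posterior, key=posterior.get)
    return ("act_under", best)

# Toy example: two candidate extensions of an existing rule.
posterior = {"extend_rule_A": 0.55, "extend_rule_B": 0.45}
print(decide_or_query(posterior))  # entropy ~0.69 nats -> ('query_humans', None)
```

The consent meta-norm would then sit on top of this: even the “act under” branch could require approval before any genuinely new rule is applied.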
All of this is to say, it does feel somewhat unavoidable to me to advance some kind of claim about the precise contents of a superior moral framework for what systems ought to do, beyond just matching what people do (in Russell’s case) or what society does (in this post’s case).
I’m not suggesting that AI systems should simply do what society does! Rather, the point of the contractualist framing is that AI systems should be aligned (in the limit) to what society would agree to after rational / mutually justifiable collective deliberation.
Current democratic systems approximate this ideal to a very rough degree, and I guess I hold out hope that under the right kinds of epistemic and social conditions (freedom of expression, equality of interlocutors, non-deluded thinking), the kind of “moral progress” we instinctively view as desirable will emerge from that form of collective deliberation. So my hope is that, rather than specifying in any great detail what the contents of a “superior moral theory” might look like, all we need to align AI systems with is the underlying meta-ethical framework that enables moral change. See Anderson on How Common Sense Can Be Self-Critical for a good discussion of what I think this meta-ethical framework looks like.
Hey! Absolutely, I think a lot of this makes sense. I assume this is the paragraph you meant from the Reverse Engineering Roles and Norms section:
I want to be clear that I do not mean AI systems should go off and philosophize on their own until they implement the perfect moral theory without human consent. Rather, our goal should be to design them in such a way that this will be an interactive, collaborative process, so that we continue to have autonomy over our civilizational future[10].
For both points here, I guess this is the question I was getting at by asking those: how ought we structure this collaborative process? Like, what constitutes the feedback a machine sees in order to interactively improve with society? Who do the AI systems interact with? What constitutes a datapoint in the moral learning process? These seem like loaded questions, so let me be more concrete. In decisions without unanimity regarding a moral fact, using simple majority rule, for example, could lead to disastrously bad moral theory: you could align an AI with norms that result in 60% of the public exploiting the other 40% (for example, if a majority deems it moral to exploit / under-provide for a minority, in an extreme case). It strikes me that to prevent this kind of failure mode, there must be some baked-in context of “obviously wrong” beforehand. If you require total unanimity, well then, you will never get even a single datapoint: people will reasonably disagree (I would argue indefinitely, after arbitrary amounts of reasonable debate) about basic moral facts due to differences in values.
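To make the two failure modes concrete, here is a toy illustration (the 60/40 judgment profile and the two aggregation rules are hypothetical, just to show the extremes):

```python
# Each entry is one person's judgment on the proposition
# "it is permissible to under-provide for the minority".
judgments = [True] * 60 + [False] * 40  # hypothetical 60/40 split

def simple_majority(votes):
    """Adopt the norm if more than half endorse it."""
    return sum(votes) > len(votes) / 2

def unanimity(votes):
    """Adopt the norm only if literally everyone endorses it."""
    return all(votes)

print(simple_majority(judgments))  # True  -> the exploitative norm is adopted
print(unanimity(judgments))        # False -> with any disagreement, nothing is ever adopted
```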
I think this negotiation process is in itself really, really important to get right if you advocate this kind of approach, rather than getting there by advancing any one moral view of the world. I certainly don’t think it’s impossible, just as it isn’t impossible to have a relatively well-functioning democracy. But this is the point, I guess: are there limit guarantees of society agreeing after arbitrary lengths of deliberation? Has modern democracy / norm-setting historically arisen from mutual deliberation, or from the exertion of state power / the arbitrary assertion of one norm over another? I honestly don’t have sufficient context to answer that, but it seems like a relevant empirical fact here.
Maybe another follow-up: what are your idealized conditions for “rational / mutually justifiable collective deliberation” here? It seems this phrase implicitly does a lot of heavy lifting for the framework, and I’m not quite sure myself what it would mean, even ideally.