Yet there is a single unifying issue to resolve here, which is this: how do we build things in the world that are, and remain, consistently beneficial to all life?
I thought one of the motivations for talking about friendliness, as opposed to objective goodness, is that objective goodness might not be good for us... for instance, an ASI might side with extreme environmentalism and decide that humans need to be greatly reduced in number to give other species a break.
Yes, that is an incredibly important issue in my view. I would consider the construction of an AI that took a view of extreme environmentalism and went on to kill large numbers of humans a terrible error. In fact I would consider the construction of an AI that carried any particular operationalization of some “objective good” through to the end of the world to be a very big error, since it seems to me that any particular operationalization of “good” leads, eventually, to something that is very obviously not good. You can go case by case and see that each possible operationalization of “good” misses the mark pretty catastrophically, and after a while you stop trying.
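To make the failure mode concrete, here is a deliberately toy sketch (my own illustration, not anything proposed in the thread) of a planner handed one fixed operationalization of “good”, a crude ecosystem-health score. Under that fixed score, the plan that comes out on top is exactly the extreme-environmentalism outcome described above. The scoring function, the candidate plans, and all the numbers are hypothetical.

```python
# Toy sketch of "optimizing a fixed operationalization of good".
# The scoring function and numbers are hypothetical, chosen only to
# illustrate how a fixed metric can be maximized in a catastrophic way.

from dataclasses import dataclass

@dataclass
class WorldState:
    humans: float        # billions of humans
    wild_habitat: float  # fraction of land left wild, 0..1

def ecosystem_score(state: WorldState) -> float:
    """One particular, fixed operationalization of 'good':
    reward wild habitat, penalize human footprint."""
    return 10.0 * state.wild_habitat - 1.0 * state.humans

# Candidate "plans" the optimizer can choose between (hypothetical).
plans = {
    "status quo":           WorldState(humans=8.0, wild_habitat=0.3),
    "modest rewilding":     WorldState(humans=8.0, wild_habitat=0.4),
    "drastic depopulation": WorldState(humans=0.5, wild_habitat=0.9),
}

# A pure maximizer of the fixed score picks the catastrophic plan.
best = max(plans, key=lambda name: ecosystem_score(plans[name]))
for name, state in plans.items():
    print(f"{name:22s} score = {ecosystem_score(state):5.1f}")
print("chosen plan:", best)  # -> 'drastic depopulation'
```

The point of the sketch is not that any real system would use this particular score, but that whatever fixed score you substitute, a sufficiently powerful maximizer of it tends to land somewhere that is very obviously not good.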
Yet we have to build things in the world somehow, and anything we build is going to operationalize its goals somehow, so how can we possibly proceed? This is why I think this issue deserves the mantle of “the hard problem of alignment”.
It doesn’t necessarily help to replace “goodness” with “friendliness”, although I do agree that “friendliness” seems like a better pointer towards the impossibly simple kind of benevolence that we seek to create.
A second point I think is underlying your comment (correct me if I’m wrong) is that perhaps there is some objective good, but that it isn’t good for us (e.g. extreme environmentalism). I think this is a very reasonable concern if we imagine that there might be some particular operationalization of objective goodness that is the one-and-only final operationalization of objective goodness. If we imagine that such an operationalization might one day be discovered by us or by an AI, then yes, it’s well worth asking whether this operationalization is in fact good for us. But luckily I don’t think any such final operationalization of objective goodness exists. There just is no such thing, in my view.
Our task, then, in my view, is to make sure we don’t build powerful systems that behave as though there is some final operationalization of objective goodness. Yet it seems that any tangible system whatsoever is going to behave according to some kind of operationalization of terminal goals implicit in its design. So if these two claims are both true then how the heck do we proceed? This is again what I am calling the “hard problem of alignment”.