7. Recall that alignment research motivated by the above points makes it easier to design AI that is controllable and whose goals are aligned with its designers’ goals.
As a consequence, bad actors might have an easier time using powerful, controllable AI to achieve their goals. (From 4 and 6)
8. Thus, even though AI alignment research improves the expected value of futures caused by uncontrolled AI, it reduces the expected value of futures caused by bad human actors using controlled AI to achieve their ends. (From 5 and 7)
This conclusion will seem more or less relevant depending on your beliefs about its different components.
It sounds to me like the claim you are making here is “the current AI Alignment paradigm might have a major hole, but also this hole might not be real”. But then the thrust of your post is something like “I am going to work on filling this hole”. You invoke epistemic and moral uncertainty in a somewhat handwavy way, which leaves me skeptical. It’s not clear to me what you believe, so it is hard for me to productively disagree or provide useful feedback. Assuming you are going to spend many hours working on this research direction, I think it’s worth spending a few hours determining whether this proposed problem is in fact a problem, including making some personal guesses about the value of various futures (maybe you’ve already done this privately).
You later write:
I do not claim that research towards building the independent AI thinkers of 2.1 above is the most effective AI alignment research intervention, nor that it is the most effective intervention for moral progress. I’ve only presented a problem with the main framework in AI alignment, and proposed an alternative that aims to avoid that problem.
To me, it’s not obvious that the thing you presented is actually a problem. My quick thoughts: extinction is quite bad, some types of galaxy-spanning civilizations are far worse than extinction, but many are better, including some of the ones I think would be created by current “bad actors”.
I’m furthermore unsure why the solution to this proposed problem is to try to design AIs to make moral progress; this seems possible but not obvious. One problem with bad actors is that they often don’t base their actions on what the philosophers think is good (e.g., dictators don’t seem concerned with this). On the other hand, perhaps the “bad actors” you are targeting are average Americans who eat a dozen farmed animals per year, and these are the values you are most worried about. Insofar as you want to avoid filling the universe with factory farming, you might want to investigate current approaches in Moral Progress or moral circle expansion; I suspect an AI approach to this problem won’t help much. There’s a similar story for impact here that looks like “get the Great Reflection started earlier”, which I am more optimistic about but expect to fail for other reasons. Not sure if this paragraph made sense; I’m gesturing at the fact that which “bad actors” you are targeting will affect which research directions are worth pursuing, and for the main class of bad actors that comes to mind with that word, moral progress seems unlikely to help.
Sorry for the late reply; I missed your comment.
It sounds to me like the claim you are making here is “the current AI Alignment paradigm might have a major hole, but also this hole might not be real”.
I didn’t write something like that because it is not what I meant. I gave an argument whose strength depends on other beliefs one has, and I just wanted to stress this fact. I also gave two examples (reported below), so I don’t think I mentioned epistemic and moral uncertainty “in a somewhat handwavy way”.
An example: if you think that futures shaped by malevolent actors using AI are many times more likely to happen than futures shaped by uncontrolled AI, the response will strike you as very important; and vice versa if you think the opposite.
Another example: if you think that extinction is way worse than dystopic futures lasting a long time, the response won’t affect you much—assuming that bad human actors are not fans of complete extinction.
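To make that dependence concrete, here is a toy expected-value sketch. The three-way split of futures, the probabilities, and the values are all placeholders I am introducing for illustration, not figures from the post; they only show how the sign of the comparison can flip with one’s beliefs:

```python
# Toy model, purely illustrative: every probability and value below is a placeholder.
# Futures are crudely split into three mutually exclusive types.

def expected_value(p_uncontrolled, v_uncontrolled,
                   p_bad_actor, v_bad_actor,
                   p_other, v_other):
    """Expected value over three mutually exclusive future types."""
    return (p_uncontrolled * v_uncontrolled
            + p_bad_actor * v_bad_actor
            + p_other * v_other)

# Belief set A: alignment research mostly moves probability mass from
# uncontrolled-AI futures into broadly good "other" futures.
baseline = expected_value(0.20, -100, 0.05, -80, 0.75, +50)
with_alignment_a = expected_value(0.05, -100, 0.10, -80, 0.85, +50)
print(with_alignment_a - baseline)   # roughly +16: net positive under these beliefs

# Belief set B: the same research mostly moves that mass into
# futures controlled by bad actors instead.
with_alignment_b = expected_value(0.05, -100, 0.25, -80, 0.70, +50)
print(with_alignment_b - baseline)   # roughly -3.5: net negative under these beliefs
```

None of these numbers are meant as real estimates; the point is only that the argument’s force tracks one’s credences about which futures are likely and how good or bad each one is.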
Maybe your scepticism is about my beliefs, i.e. you are saying that it is not clear, from the post, what my beliefs on the matter are. I think presenting the argument is more important than presenting my own beliefs: the argument can be used, or at least taken into consideration, by anyone who is interested in these topics, while my beliefs alone are useless if they are not backed up by evidence and/or arguments. In case you are curious: I do believe futures shaped by uncontrolled AI are unlikely to happen.
Now to the last part of your comment:
I’m furthermore unsure why the solution to this proposed problem is to try to design AIs to make moral progress; this seems possible but not obvious. One problem with bad actors is that they often don’t base their actions on what the philosophers think is good
I agree that bad actors won’t care. Actually, I think that even if we do manage to build some kind of AI that is considered superethical (better than humans at ethical reasoning) by a decent number of philosophers, very few people will care, especially at the beginning. But that doesn’t mean it will be useless: at some point in the past, very few people believed slavery was bad; now it is a common belief. How much would such an AI accelerate moral progress, compared to other approaches? Hard to tell, but I wouldn’t throw the idea in the bin.