[Question] What if Ethics is Provably Self-Contradictory?

I’ve been thinking lately about the Repugnant Conclusion. For those who aren’t already aware, it’s a problem in Population Ethics where one is seemingly forced to say that a world entirely populated with happy, well-off people is less preferable (all else being equal) than a world consisting of a much larger number of people who each experience a lower quality of life.

This doesn’t sound so bad at first (many philosophers would presumably be fine with reducing their quality of life on the condition that more babies with mild depression or something be brought into existence), until you realize that this trade can be applied iteratively. At some point, the larger (but less-well-off-per-individual) world is incredibly populous, but consists only of people whose lives are barely worth living. This world is “objectively” better than our first world, according to many formal ethical frameworks.[1]
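To make the arithmetic concrete, here’s a toy illustration with made-up numbers, assuming simple total utilitarianism (the formal results cited below rest on weaker assumptions, so treat this as a sketch of the flavor rather than the actual proof): a world A of a million people, each at welfare 100, is outscored by a world B of a hundred billion people, each at welfare 0.01, since

$$\underbrace{10^{6} \times 100}_{\text{world A}} \;=\; 10^{8} \;<\; 10^{9} \;=\; \underbrace{10^{11} \times 0.01}_{\text{world B}}.$$

Each step of the iteration trades a drop in per-person welfare for a more-than-compensating gain in population, so the total keeps climbing no matter how thin each individual life gets.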

Okay, but that isn’t so bad, is it? After all, “lives barely worth living” are still worth living! It’s not like we’re talking about a world full of suicidal people...right? Well, enter the so-called Very Repugnant Conclusion:

For any perfectly equal population with very high positive welfare, and for any number of lives with very negative welfare, there is a population consisting of the lives with negative welfare and lives with very low positive welfare which is better than the high welfare population, other things being equal.[2]

In other words, the Very Repugnant Conclusion considers a semi-hellish world. This world is populated by some people suffering so badly that they’d be better off not existing,[3] while everyone else has the same quality of life as the people at the end of the Repugnant Conclusion (i.e. lives only marginally worth living). Given a high enough population, this semi-hellish world somehow comes out better than one containing only extremely happy, well-off people.
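Again with made-up numbers, and with total utilitarianism standing in for the weaker axioms actually needed: a utopia of a million people at welfare +100 totals $10^{8}$, while a world of a million people at welfare -100 plus three hundred billion people at welfare +0.001 totals

$$\underbrace{10^{6}\times(-100)}_{\text{hellish lives}} \;+\; \underbrace{3\times 10^{11}\times 0.001}_{\text{barely-positive lives}} \;=\; -10^{8} + 3\times 10^{8} \;=\; 2\times 10^{8} \;>\; 10^{8},$$

so the semi-hellish world counts as “better” under straight totalism.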

The Very Repugnant Conclusion provably follows from a very small set of basic moral/logical axioms,[2] all of which seem intuitively, obviously true to many people. Therefore, anyone who wants a self-consistent ethical framework must either “bite the bullet” and accept the Very Repugnant Conclusion as correct, or reject one of the axioms it rests on; rejecting any of them would seemingly have far-reaching consequences for other basic moral intuitions.[1][4]


This probably hasn’t convinced you that formal ethics is a contradictory illusion, if you didn’t already think so. After all, perhaps there’s some clever way around the Very Repugnant Conclusion we haven’t discovered yet, or perhaps you’re simply willing to just bite the bullet on it and say “yeah sure, maybe my moral intuition is flawed here.”[5]

More generally, it seems intuitively plausible that a formal system could (in theory) be devised[6] which, if followed, would always lead one to choose the “most ethical” option available, or at least to avoid choosing an “ethical atrocity.”[7] Consider that creating an AI which understands human ethics[8] seems at least theoretically doable. We also know that neural networks are, at the end of the day, computations that an incredibly complex Turing machine could carry out, and aren’t our brains basically fancy neural nets as well? If AI can (presumably) do it, and brains can do it, what’s stopping philosophers from doing it and writing down a self-consistent, intuitively moral ethics? (Beyond the lack of paper and funding, of course...)


All sarcasm aside, I believe that the formal self-consistency (or lack thereof) of Ethics is quite possibly a fundamental problem for AI Alignment, among other fields. What would it mean for us if Ethics were fundamentally inconsistent?

Note that this is a very different question from asking whether Ethics is “objective” or not; at this point it seems pretty obvious that our Ethics is in large part a product of human psychology and culture.[9] However, just because it’s a largely subjective, “man-made” framework doesn’t mean it can’t have its own internally consistent logic.

Why is this distinction important? Well, if you’re trying to build an AI which is aligned with commonsense ethical values, I strongly suspect that the question of self-consistency will impact the sorts of challenges such a project will have to face.[10] I’m having trouble formulating exactly what the implications here are, which is why I’m turning to the community for help. I feel like there’s a really important insight somewhere around here, but it’s just out of reach...

  1. ^

    I’m not going to get into the details of the formal logic here, since I’m lazy and it isn’t necessary to understand my main point.

  2. ^
  3. ^

    By this I mean that it would be better had they never been brought into existence in the first place, not that they would (or necessarily should) choose to commit suicide once alive.

  4. ^

    The Repugnant Conclusion rests on fewer assumptions, but it is more acceptable to many people, so it makes for a less compelling case study.

  5. ^

    Or even “I don’t share the intuition that this is bad in the first place,” though I don’t know how many people would seriously say that about the Very Repugnant Conclusion.

  6. ^

    Keep in mind that such a system is allowed to be ad-hoc and super complex, as long as it’s self-consistent.

  7. ^

    From the perspective of at least one human possessing an ethical intuition. I’m leaving the quoted terms deliberately vague, so if you want to be pedantic here, mentally replace those quotes with whatever you think would make this post better.

  8. ^

    Even if it doesn’t follow said ethics itself; what matters is whether it can consistently reason about it.

  9. ^
  10. ^

    Whereas the question of the “objectivity” of human ethics is basically the Orthogonality thesis debate, which is a different can of worms entirely.