I’m concerned with this list because it doesn’t distinguish between the epistemological contexts of the questions it provides. For the purposes of this comment there are at least three categories.
First: Broadly considered completed questions with unsatisfactory answers. These are analogous to Gödelian incompleteness, and most questions of this type are subforms of the general question: do you/we/does one have purpose or meaning, and what does that tell us about the decisions you/we/one should make?
This question isn’t answered in any easy sense, but I think most naturalist philosophers have reached the conclusion that this is a completed area of study. Namely: there is no inherent purpose or meaning to life, and no argument for any particular ethical theory is rigorous enough, or could ever be rigorous enough, to conclude that it is Correct. That said, we can construct meaning that is real to us; we just have to be okay with it being turtles all the way down.
I realize that these statements are considered bold by many, but I think they are (a) true and (b) not the point of this comment anyway; they are only used to illustrate the epistemological category.
Second: Broadly considered answered questions with unsatisfactory answers that still have specific fruitful lines of investigation. These are analogous to P vs. NP problems in math/computation. By analogous to P vs. NP, I mean a question where the field has more or less come to a conclusion, and it’s not the one we were hoping for, but the hopeful still work toward “solving” it in general, and there is certainty that there are valuable sub-areas where good work can lead to powerful results.
Third: other questions.
The distinction between categories one and two is conceptually important because category one is not all that productive to mine, and asking for it to be mined leads to sad miners, either because they aren’t mining productively and that was their goal, or because they become frustrated with management for asking silly questions.
I’m afraid that much of the list is from category one (my comments in parentheses):
Decision theory for AI / AI designers (category two)
How to resolve standard debates in decision theory?
Logical counterfactuals
Open source game theory
Acausal game theory / reasoning about distant superintelligences
Infinite/multiversal/astronomical ethics (?)
Should we (or our AI) care much more about a universe that is capable of doing a lot more computations?
What kinds of (e.g. spatial-temporal) discounting are necessary and/or desirable?
Fair distribution of benefits (category one if philosophy, category two if policy analysis)
How should benefits from AGI be distributed?
For example, would it be fair to distribute it equally over all humans who currently exist, or according to how much AI services they can afford to buy?
What about people who existed or will exist at other times and in other places or universes?
Need for “metaphilosophical paternalism”?
However we distribute the benefits, if we let the beneficiaries decide what to do with their windfall using their own philosophical faculties, is that likely to lead to a good outcome?
Metaphilosophy (category one)
What is the nature of philosophy?
What constitutes correct philosophical reasoning?
How to specify this into an AI design?
Philosophical forecasting (category two if policy analysis)
How are various AI technologies and AI safety proposals likely to affect future philosophical progress (relative to other kinds of progress)?
Preference aggregation between AIs and between users
How should two AIs that want to merge with each other aggregate their preferences?
How should an AI aggregate preferences between its users?
Normativity for AI / AI designers (category one)
What is the nature of normativity? Do we need to make sure an AGI has a sufficient understanding of this?
Metaethical policing (?)
What are the implicit metaethical assumptions in a given AI alignment proposal (in case the authors didn’t spell them out)?
What are the implications of an AI design or alignment proposal under different metaethical assumptions?
Encouraging designs that make minimal metaethical assumptions or are likely to lead to good outcomes regardless of which metaethical theory turns out to be true.
(Nowadays AI alignment researchers seem to be generally good about not placing too much confidence in their own moral theories, but the same can’t always be said to be true with regard to their metaethical ideas.)