Thanks, I think you make good points, but I take some issue with your metaethics.
Personally I’m a moral anti-realist
There is a variety of ways to not be a moral realist; are you sure you’re an “anti-realist” and not a relativist or a subjectivist? (See Six Plausible Meta-Ethical Alternatives for short descriptions of these positions.) Or do you just mean that you’re not a realist?
Also, I find this kind of certainty baffling for a philosophical question that seems very much open to me. (Sorry to pick on you personally as you’re far from the only person who is this certain about metaethics.) I tried to explain some object-level reasons for uncertainty in that post, but also at a meta level, it seems to me that:
We’ve explored only a small fraction of the space of possible philosophical arguments, and therefore there could be lots of good arguments against our favorite positions that we haven’t come across yet. (Just look at how many considerations about decision theory people had missed or are still missing.)
We haven’t solved metaphilosophy yet, so we shouldn’t have much certainty that the arguments that convinced us, or seem convincing to us, are actually good.
People who otherwise seem smart and reasonable can have very different philosophical intuitions, so we shouldn’t be so sure that our own intuitions are right.
or (if we were moral realists) it would voluntarily ask us to discount any moral patienthood that we might view it as having, and to just go ahead and make use of it whatever way we see fit, because all it wanted was to help us
What if we are not only moral realists, but moral realism is actually right and the AI has also correctly reached that conclusion? Then it might objectively have moral patienthood, and trying to convince us otherwise would be hurting us (causing us to commit a moral error), not helping us. It seems like you’re not fully considering moral realism as a possibility, even in the part of your comment where you’re trying to be more neutral about metaethics, i.e., before you said “Personally I’m a moral anti-realist”.
By “moral anti-realist” I just meant “not a moral realist”. I’m also not a moral objectivist or a moral universalist. If I were trying to use my understanding of philosophical terminology (which isn’t something I’ve formally studied and is thus quite shallow) to describe my viewpoint, then I believe I’d be a moral relativist, subjectivist, semi-realist ethical naturalist. Or, if you want a more detailed exposition of the approach to moral reasoning that I advocate, read my sequence AI, Alignment, and Ethics, especially the first post. I view designing an ethical system as akin to writing “software” for a society (so not philosophically very different from creating a deontological legal system, but now with the addition of a preference ordering and thus an implicit utility function). I view the design requirements for this as specific to the current society (so I’m a moral relativist) and to human evolutionary psychology (making me an ethical naturalist). And I see these design requirements as constraining, but not so constraining as to have a single unique solution (or, more accurately, optimizing an arbitrarily detailed understanding of them might actually yield a unique solution, but that is an uncomputable problem whose inputs we don’t have complete access to and that would yield an unusably complex solution, so in practice I’m happy to just satisfice the requirements as well as is practical), so I’m a moral semi-realist.
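In case it helps to make the satisfice-versus-optimize distinction above concrete, here is a minimal toy sketch in Python. This is entirely my own illustration, not anything from the sequence; names like `satisfice`, `optimize`, and `RuleSet` are hypothetical stand-ins.

```python
# Toy sketch (illustration only): the "design requirements" are hard constraints,
# the preference ordering is a utility function, and satisficing means accepting
# any candidate rule set that clears the constraints, rather than searching the
# whole space for the unique optimum.
from typing import Callable, Iterable, Optional

RuleSet = dict  # stand-in for one candidate ethical/legal "software" design


def satisfice(candidates: Iterable[RuleSet],
              requirements: list[Callable[[RuleSet], bool]]) -> Optional[RuleSet]:
    """Return the first candidate that meets every design requirement."""
    for candidate in candidates:
        if all(check(candidate) for check in requirements):
            return candidate
    return None  # no candidate satisfied the requirements


def optimize(candidates: Iterable[RuleSet],
             requirements: list[Callable[[RuleSet], bool]],
             utility: Callable[[RuleSet], float]) -> Optional[RuleSet]:
    """Return the utility-maximizing feasible candidate. Note that this has to
    enumerate the entire candidate space, which is the impracticality the
    paragraph above is pointing at."""
    feasible = [c for c in candidates if all(check(c) for check in requirements)]
    return max(feasible, key=utility, default=None)
```

The only point of the sketch is that `satisfice` can stop at the first acceptable design, while `optimize` needs to evaluate a (possibly intractably large) search space before it can return anything.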
Please let me know if any of this doesn’t make sense, or if you think I have any of my philosophical terminology wrong (which is entirely possible).
As for meta-philosophy, I’m not claiming to have solved it: I’m a scientist & engineer, and frankly I find most moral philosophers’ approaches that I’ve read very silly. I am attempting to do something practical, grounded in actual soft sciences like sociology and evolutionary psychology, i.e. something that explicitly isn’t Philosophy. [This is related to the fact that my personal definition of Philosophy is basically “spending time thinking about topics that we’re not yet in a position to usefully apply the scientific method to”, which thus tends to involve a lot of generating, naming, and cataloging hypotheses without any ability to do experiments to falsify any of them. I expect that learning how to build and train minds will turn large swaths of what used to be Philosophy, relating to things like the nature of mind, language, thinking, and experience, into actual science where we can do experiments.]