[SEQ RERUN] Nonperson Predicates
Today’s post, Nonperson Predicates, was originally published on 27 December 2008. A summary (taken from the LW wiki):
An AI, trying to develop highly accurate models of the people it interacts with, may develop models which are conscious themselves. For ethical reasons, it would be preferable if the AI wasn’t creating and destroying people in the course of interpersonal interactions. Resolving this issue requires making some progress on the hard problem of conscious experience. We need some rule which definitely identifies all conscious minds as conscious. We can make do if it still identifies some nonconscious minds as conscious.
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we’ll be going through Eliezer Yudkowsky’s old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Devil’s Offers, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day’s sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.
Consider the intuitively simpler problem of “is something a universal Turing machine?” Consider further this list of things that are capable of being a universal Turing machine:
Computers.
Conway’s Game of Life.
Elementary cellular automata.
Lots of NAND gates.
Even a sufficiently complex shopping list might qualify. And it’s even worse, because knowing that A doesn’t have personhood and that B doesn’t have personhood doesn’t let us conclude that A+B doesn’t have personhood. A single transistor isn’t a computer, but 3510 transistors might be a 6502. If we want to be 100% safe, we have to rule out anything we can’t analyze, which means we pretty much have to rule out everything. We might as well make the function always answer “might be a person”.
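As a toy illustration of the composition point, here is a rough sketch of how a few of the NAND gates from the list above already combine into something with more capability than any single gate; the functions are only a small worked example, not anything from the original discussion:

```python
def nand(a, b):
    """NAND of two bits (0 or 1): the only primitive we allow ourselves."""
    return 1 - (a & b)

def xor(a, b):
    """XOR built from four NAND gates."""
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))

def half_adder(a, b):
    """Half adder (sum bit, carry bit) built entirely from NANDs."""
    return xor(a, b), nand(nand(a, b), nand(a, b))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", half_adder(a, b))
```

No individual gate does arithmetic, but a handful wired together do, and enough of them give you a 6502, which is the same worry in miniature.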
OK, as bad as that sounds, it just means we shouldn’t work too hard on solving the problem perfectly, because we know we’ll never be able to do so in a meaningful way. But perhaps we can solve the problem imperfectly. SpamAssassin faces a very similar kind of problem: how can we tell if a message is spam? The technique it uses is conceptually simple: pick a test that some messages pass and some fail, run the test on a corpus of messages classified as spam and a corpus classified as non-spam, and use the results to assign a probability that a message is spam given that it passes the test. In addition to the obvious advantage of “I can see how to do that for a non-person predicate test”, such a test could also give a score for “has some person-like properties”. Thus we can meaningfully approach the problem of A + B being a person even though A and B aren’t by themselves.
What kind of tests can we run? Beats me, but presumably we’ll have something before we can make an AI by design.
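A minimal sketch of the SpamAssassin-style calibration described above, assuming we somehow had boolean tests and two labelled corpora to calibrate them on; every name here (the tests, the corpora, the combination rule) is a hypothetical placeholder, since nobody yet knows what the actual tests would be:

```python
def calibrate_test(test, person_like_corpus, not_person_like_corpus, prior=0.5):
    """Estimate P(person-like | test passes) from two labelled corpora.

    `test` is any boolean function over candidate models; the corpora are
    hypothetical collections already labelled person-like / not person-like.
    """
    p_pass_given_pos = sum(map(test, person_like_corpus)) / len(person_like_corpus)
    p_pass_given_neg = sum(map(test, not_person_like_corpus)) / len(not_person_like_corpus)
    # Bayes' rule: P(pos | pass) = P(pass | pos) * P(pos) / P(pass)
    p_pass = p_pass_given_pos * prior + p_pass_given_neg * (1 - prior)
    return prior if p_pass == 0 else p_pass_given_pos * prior / p_pass

def person_likeness_score(candidate, calibrated_tests):
    """Crude combination: average the calibrated probabilities of the tests
    the candidate passes. A real classifier would combine the evidence more
    carefully (e.g. naive Bayes in log-odds); this only shows the shape."""
    passed = [p for test, p in calibrated_tests if test(candidate)]
    return sum(passed) / len(passed) if passed else 0.0
```

The point of the sketch is just that a graded score, unlike a yes/no predicate, can say something about A + B even when A and B individually score near zero.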
One problem with this approach is that it could be wrong. It might even be very wrong. Also, training the predicate function might be an evil process: that is, training may involve purposely creating things that pass.
Nitpick: strictly speaking, a computer is not a universal Turing machine, because it has a finite amount of memory and is therefore a finite state machine (and in particular can only run finitely many programs). When we say that computers are universal Turing machines, we are talking about an idealized version of a computer that acquires more memory as necessary.
Regarding the problem of determining whether a Turing machine is universal: this is undecidable by Rice’s theorem, which asserts more generally that any nontrivial semantic property of Turing machines (nontrivial in the sense that at least one Turing machine has it and at least one Turing machine doesn’t) is undecidable. The best you can do is an algorithm which returns “is a UTM” on some Turing machines, returns “is not a UTM” on others, and on the rest either doesn’t halt or outputs “unknown”. For example, it might search through proofs that a given Turing machine is or is not universal (possibly up to some upper bound).
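A minimal sketch of that bounded proof search, returning one of three verdicts; `enumerate_proofs`, `proves_universal`, and `proves_not_universal` are hypothetical stand-ins for a real proof system, not existing library calls:

```python
from enum import Enum

class Verdict(Enum):
    IS_UTM = "is a UTM"
    NOT_UTM = "is not a UTM"
    UNKNOWN = "unknown"

def classify_machine(machine, enumerate_proofs, proves_universal,
                     proves_not_universal, max_proofs=100_000):
    """Bounded proof search: return a verdict only when a proof is found.

    enumerate_proofs() yields candidate formal proofs in some fixed order;
    proves_universal / proves_not_universal check whether a candidate proof
    establishes the corresponding claim about `machine`.
    """
    for _, proof in zip(range(max_proofs), enumerate_proofs()):
        if proves_universal(proof, machine):
            return Verdict.IS_UTM
        if proves_not_universal(proof, machine):
            return Verdict.NOT_UTM
    return Verdict.UNKNOWN  # the honest answer once the search bound runs out
```

Rice’s theorem guarantees that the UNKNOWN branch (or non-termination, if the bound is removed) can never be eliminated entirely.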
Yes, that was sort of the point: you can’t make a function for “is a universal Turing machine” that works in all cases, and you can’t make an “is a non-person” function that works in all cases either. Further, the set of things you can rule out with 100% certainty is too small to be useful.
I don’t see how that relates to my suggestion of a probabilistic answer, though. Has anyone proven that you can’t make a statistically valid statement about the “is a universal Turing machine” question?
The problem with training isn’t purposely creating things that pass. It’s purposely creating things that don’t. In order to figure out what doesn’t pass, we need a predicate function. Once we’ve figured out how to find things that won’t pass, we’ve already found the answer.
Doesn’t follow. Consider:
I accept that in order to classify something, we need to be able to classify it.
I’m suggesting there might be a function that classifies some things incorrectly, and is still useful.
Depending on what you mean by “capable”, I’d add “a bunch of silicon and germanium atoms” to the list.
Then millions, billions, or trillions of people die. That’s a lot in comparison to what a human normally deals with. That upper bound is more than the total number of people who have ever lived, but it’s still nothing in comparison to the astronomical waste of waiting even a second to consider the problem. Not unless you really, really hate death.
How long does someone have to live before their life is worth living? Before all the joy they feel balances out the sadness of their death? I haven’t bothered running the numbers, but I’m sure that if it’s less than a billion years, the sacrifice is more than worth it.