Be Wary of Thinking Like a FAI
I recently realized that, encouraged by LessWrong, I had been using a heuristic in my philosophical reasoning that I now think is suspect. I’m not accusing anybody else of falling into the same trap; I’m just recounting my own situation for the benefit of all.
I actually am not 100% sure that the heuristic is wrong. I hope that this discussion about it generalizes into a conversation about intuition and the relationship between FAI epistemology and our own epistemology.
The heuristic is this: If the ideal FAI would think a certain way, then I should think that way as well. At least in epistemic matters, I should strive to be like an ideal FAI.
Examples of the heuristic in use are:
--The ideal FAI wouldn’t care about its personal identity over time; it would have no problem copying itself and deleting the original as the need arose. So I should (a) not care about personal identity over time, even if it exists, and (b) stop believing that it exists.
--The ideal FAI wouldn’t care about its personal identity at a given time either; if it were proven that 99% of all observers with its total information set were in fact Boltzmann Brains, it would continue to act as if it were not a Boltzmann Brain, since that’s what maximizes utility (see the expected-utility sketch after this list). So I should (a) act as if I’m not a BB even if I am one, and (b) stop thinking it is even a meaningful possibility.
--The ideal FAI would think that the specific architecture it is implemented on (brains, computers, nanomachines, giant look-up tables) is irrelevant except for practical reasons like resource efficiency. So, following its example, I should stop worrying about whether e.g. a simulated brain would be conscious.
--The ideal FAI would NOT think that it was a “unified subject of experience” or an “irreducible substance,” or that it was experiencing “ineffable, irreducible qualia,” because believing in those things would only distract it from understanding and improving its inner workings. Therefore, I should think that I, too, am nothing but a physical mechanism and/or an algorithm implemented somewhere but capable of being implemented elsewhere.
--The ideal FAI would use UDT/TDT/etc. Therefore I should too.
--The ideal FAI would ignore uncomputable possibilities. Therefore I should too.
...
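To make the Boltzmann Brain example concrete, here is a minimal sketch of the expected-utility comparison it rests on. The probability and payoff numbers are invented purely for illustration; the point is only that if you are a BB you dissipate almost immediately, so every action pays off the same in that branch, and the decision is settled entirely by the low-probability “not a BB” branch:

```python
# Illustrative sketch with made-up numbers: why an agent that is 99% likely
# to be a Boltzmann Brain still acts as if it were not one. In the BB branch
# you dissipate almost instantly, so all actions have the same negligible payoff.

p_bb = 0.99  # assumed probability of being a Boltzmann Brain

payoff_if_bb = {"act_normally": 0.0, "give_up": 0.0}      # nothing you do matters
payoff_if_real = {"act_normally": 100.0, "give_up": 1.0}  # actions matter a lot

def expected_utility(action):
    return p_bb * payoff_if_bb[action] + (1 - p_bb) * payoff_if_real[action]

for action in ("act_normally", "give_up"):
    print(action, expected_utility(action))
# "act_normally" wins (about 1.0 vs 0.01): the choice is driven entirely by the
# 1% "not a BB" branch, which is why the FAI ignores the BB possibility in practice.
```

Nothing here depends on the specific numbers; any setup in which the BB branch is action-independent gives the same ranking.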
Arguably, most, if not all, of the conclusions I drew above are actually correct. However, I think the heuristic is questionable, for the following reasons:
(1) Sometimes what we think of as the ideal FAI isn’t actually ideal. Case in point: the final bullet above about uncomputable possibilities. We intuitively think that uncomputable possibilities ought to be countenanced, so rather than overriding our intuition when presented with an attractive theory of the ideal FAI (in this case AIXI), perhaps we should keep looking for an ideal that better matches our intuitions.
(2) The FAI is a tool for serving our wishes; if we start to think of ourselves as being fundamentally the same sort of thing as the FAI, our values may end up drifting badly. For simplicity, let’s suppose the FAI is designed to maximize happy human life-years. The problem is, we don’t know how to define a human. Do simulated brains count? What about patterns found inside rocks? What about souls, if they exist? Suppose we have the intuition that humans are indivisible entities that persist across time. If we reason using the heuristic I am talking about, we would decide that, since the FAI doesn’t think it is an indivisible entity that persists across time, we shouldn’t think we are either. So we would then proceed to tell the FAI “Humans are naught but a certain kind of functional structure,” and (if our overruled intuition was correct) all get killed.
Thoughts?
...
Note 1: “Intuitions” can (I suspect) be thought of as another word for “Priors.”
Note 2: We humans are NOT Solomonoff-induction approximators, as far as I can tell. This bodes ill for FAI, I think.
AFAICT you are not an ideal FAI, so your model of what an ideal FAI would do is always suspect.
The fact that your post was upvoted so much makes me take it seriously; I want to understand it better. Currently I see your post as merely a general skeptical worry. Sure, maybe we should never be very confident in our FAI-predictions, but to the extent that we are confident, we can allow that confidence to influence our other beliefs and decisions, and we should be confident in some things to some extent at least (the alternative, complete and paralyzing skepticism, is absurd). Could you explain more what you meant, or explain what you think my mistake is in the above reasoning?
Of course, Bayesians want to be Mr. degrees-of-belief Carneades, not Mr. know-nothing Berkeley. Far be it from me to suggest that we ought to stop making models. It just worried me that you were so willing to adjust your behavior based on inherently untrustworthy predictions.
An acquaintance of mine liked to claim that superhuman intelligence was one of Superman’s powers. The idea immediately struck me as contradictory: nothing Superman does will ever be indicative of superhuman intelligence as long as the scriptwriter is human. My point here is the same: Your model of an ideal FAI will fall short of accurately simulating the ideal just as much as your own mind falls short of being ideal.
This is a really good point.
It is easier to determine whether you are doing “better” than your current self than it is to determine how well you line up with a perceived ideal being. So perhaps the lesson to take away is to try to just be better rather than be perfect.
Really? That doesn’t seem obvious to me. Could you justify that claim?
An “ideal” being is many layers of “better” than you are, whereas something that is simply better is only one layer better. To get to ideal, you would have to imagine someone better, then imagine what that person would consider better, and so forth, until you hit a state where there are no further improvements to be made.
In the picture you just drew, the ideal being is derived from a series of better beings, thus it is (trivially) easier to imagine a better being than to imagine an ideal being.
I see it differently: The ideal being maximizes all good qualities, whereas imperfect beings have differing levels of the various good qualities. Thus to compare a non-ideal being to an ideal being, we only need to recognize how the ideal being does better than the non-ideal being in each good quality. But to compare two non-ideal beings, we need to evaluate trade-offs between their various attributes (unless one is strictly greater than the other).
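A toy sketch of the comparison problem I have in mind (the attribute names and scores are invented for illustration): checking a candidate against an ideal that maxes out every quality is a quality-by-quality comparison, whereas two non-ideal candidates generally require weighing trade-offs unless one strictly dominates the other:

```python
# Toy illustration (invented attributes and scores): comparing against an ideal
# that maxes out every quality needs no weighing, but comparing two non-ideal
# profiles does, unless one strictly dominates the other.

def dominates(a, b):
    """True if a is at least as good as b on every quality and strictly better on some."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

ideal = {"honesty": 10, "patience": 10, "foresight": 10}
alice = {"honesty": 7, "patience": 9, "foresight": 4}
bob   = {"honesty": 8, "patience": 5, "foresight": 6}

print(dominates(ideal, alice))                        # True: no trade-offs to weigh
print(dominates(alice, bob), dominates(bob, alice))   # False False: trade-offs required
```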
Thinking about it more, I am not happy with either of the above models. One question that arises is: Does the same reasoning extend to other cases as well? That is, are we better off thinking about incremental improvements than about the ideal society? Are we better off thinking about incremental improvements than about the ideal chess algorithm?
I think in some cases maybe we are, but in some cases we aren’t; ideals are useful sometimes. I’d go further and say that some aspects of many ideals must be arrived at by iterating, but other aspects can be concluded more directly. An uninteresting conclusion, but one that supports my overall point: I wasn’t claiming that I knew everything about the ideal FAI, just that I had justified high confidence in some things.
So… a LW version of WWJD? I suspect it would have very similar issues.
Worse, probably, since an ideal FAI would just optimize hard for utility, rather than actively trying to set a moral example.
The examples given seem questionable even as applications of the heuristic. It is not clear to me that an ideal FAI should do those things, nor that applying the same principle to myself implies the things you say it does.
But I agree with your reason (2), and would also propose a third reason: some things that really are good ideas for ideal agents are very bad ideas for non-ideal agents. This also applies between agents with merely differing levels of imperfection: “I’m a trained professional. Don’t try this at home”.
Hmm, okay. I’d be interested to hear your thoughts on the particular cases then. Are there any examples that you would endorse?
I agree with the points about Boltzmann Brains and mind substrates. In those cases, though, I’m not sure the FAI heuristic saves you any work, compared to just directly asking what the right answer is.
Almost certainly not true if taken verbatim; one of the defining requirements of an FAI (as opposed to a regular AGI) is that certain properties remain stable under self-improvement. An FAI would care very strongly about certain kinds of changes. But with a less literal reading, I can see what you’re going for here: yes, an ideal FAI might be indifferent to copying/deletion except to the extent that those help or hinder its goals.
I’m not sure how that belief, applied to oneself, cashes out to anything at all, at least not with current technology. I also don’t see any reason to go from “the FAI doesn’t care about identity” to “I shouldn’t think identity exists.”
(Disclaimer: I am not a decision theorist. This part is especially likely to be nonsense.)
You should use which one?
The less snappy version is that TDT and UDT both have problem cases. We don’t really know yet what an ideal decision theory looks like.
Second, I doubt any human can actually implement a formal decision theory all the time, and doing it only part-time could get you “valley of bad rationality”-type problems.
Third, I suspect you could easily run into problems like what you might get by saying “an ideal reasoner would use Solomonoff Induction, so I should too”. That’s a wonderful idea, except that even approximating it is computationally insane, and in practice you won’t get to use any of the advantages that make Solomonoff Induction theoretically optimal.
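To give a rough sense of the scale (a back-of-the-envelope sketch, not anything resembling a real implementation): even a crudely length-bounded Solomonoff-style approximation has to weigh every program up to the bound, and the number of candidates doubles with each added bit, before you even hit the problem that you cannot tell which of them halt:

```python
# Back-of-the-envelope sketch: how many binary programs a length-bounded
# Solomonoff-style approximation would have to consider. There are 2**L programs
# of exactly L bits, so roughly 2**(L+1) of length <= L, and each one would also
# have to be run under some arbitrary time cutoff (halting being undecidable).

for length_bits in (10, 20, 40, 80):
    candidates = 2 ** (length_bits + 1)
    print(f"programs up to {length_bits} bits: ~{candidates:.2e}")
# Even at 80 bits (a trivially short program) that is ~2.4e24 candidates, which is
# why the theoretical optimality never transfers to anything a human
# (or any realistic machine) can actually run.
```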
If you instead mean things like “an ideal FAI would cooperate in PD-like scenarios given certain conditions”, then sure. But again, I’m not sure the FAI heuristic is saving you any work.
A factual FAI might, for mere practical reasons. I don’t see why an ideal FAI normatively should ignore them, though.
Ok, thanks.
I don’t either, now that I think about it. What motivated me to make this post is that I realized that I had been making that leap, thanks to applying the heuristic. We both agree the heuristic is bad.
Why are we talking about a bad heuristic? Well, my past self would have benefited from reading this post, so perhaps other people would as well. Also, I wanted to explore the space of applications of this heuristic, to see if I had been unconsciously applying it in other cases without realizing it. Talking with you has helped me with that.
What’s the alternative? The question isn’t whether this moral heuristic is flawed (it is), but whether it’s advantageous.
That, in turn, will depend on more than your expectation of FAI behavior. How do those FAI expectations differ from your existing moral framework?
Why not? If it’s trying to maximize human values, humans consider death to have negative value, and humans consider the FAI to be alive, then the FAI would try to die as little as possible, presumably by cloning itself less. It might clone itself a bunch early on so that it can prevent other people from dying and otherwise do enough good to make the sacrifice worth it, but it would still care about its personal identity over time.
You’re equivocating. Humans consider death of humans to have negative value. If the humans that create the FAI don’t assign negative value to AI death, then the FAI won’t either.
It’s not clear that humans wouldn’t assign negative value to AI death. An AI is certainly intelligent; it’s not entirely clear what other requirements there are for a being’s death to matter to us, or which of them an AI would fulfill.
That sounds like a thought-stopper. What is the utility of the belief itself? What predictions can we make if personal identity exists? What is the maximum set of incremental changes you can make to yourself until you stop being “you”? What is the utility of being “current you” as opposed to “optimized you”, and which “you” gets to decide? What is the utility of being “you five years ago” as opposed to “current you”, and which “you” gets to decide?
You have to think about uncomputable possibilities to know what is computable.
For new users, I’m guessing FAI stands for “friendly AI”, specifically a self-aware, self-modifying/improving AI that’s also “friendly” and so won’t kill everyone.
An egoist answer: it is I who judge decisions, and that is prior to the form of any judgment. It is I who says thinking like an FAI is good or bad. Good or bad for whom? Me.
As a study of identity, AI researchers and disputants might benefit from egoism—the philosophy of the individual.
I think what you are really looking for is a model above the fray. By trying to mold your mind into that of an FAI, you are trying to decide what to leave behind and what to keep. This problem is deeply spiritual. By divesting yourself of identity you are allowing a new idea of who you are to develop. What does that look like? By wanting to improve your life and make it as happy and long as possible, you are doing what you can to make this life count. That is a question of what your life looks like. How can you manifest that? Although you are more than a physical mechanism, self-improvement and improving the world around you are noble and decent goals. I think if you seek to become as human and as incarnated as possible, you will improve.
Surely if you provably know what the ideal FAI would do in many situations, a giant step forward has been made in FAI theory?
Keeping the tone positive and conducive to discussion... It’s not at all clear to me what a representative friendly AI would do in any situation.
Since when has provability been considered a necessary condition for decision making? For instance, before you posted your comment, did you prove that your comment would show up on the discussion board, or did you just find it likely enough to justify the effort?
Do you not know what the word “heuristic” means?