A Simple Solution to the FAI Problem
I suggest this because it is extremely simple, and in the spirit that simple solutions should be considered early rather than late. While the suggestion itself is simple, its implementation might be harder than that of any other approach. This is intended as a discussion, and part of what I want you to do as my reader is to identify the premises that are most problematic.
My first argument aims at the conclusion that producing an AGI ‘off the assembly line’ is impossible. This is a particular case of a more general claim that for anything to be recognizable as an intelligence, it must have a familiarity with a shared world. So if we had a brain in a vat, hooked up to a simulated universe, we could only understand the brain’s activity to be thinking if we were familiar with the world the brain was thinking about. This is itself a case of the principle that in order to understand the meaning of a proposition, we have to be able to understand its truth-conditions. If I say “Schnee ist weiss” and you don’t know what I mean, I can explain the meaning of this sentence completely by pointing out that it is true if snow is white. I cannot explain its meaning without doing this.
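For readers who want the schema behind the “Schnee ist weiss” example spelled out, here is the familiar Tarskian form it appeals to (my formalization, added for reference; it is not part of the original argument):

```latex
% Tarski's T-schema: a truth predicate of the meta-language applied to a
% named object-language sentence, instantiated with the example above.
\[
  \mathrm{True}(\ulcorner \varphi \urcorner) \;\leftrightarrow\; \varphi
\]
\[
  \mathrm{True}(\ulcorner \text{Schnee ist weiss} \urcorner)
  \;\leftrightarrow\; \text{snow is white}
\]
```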
Thus, we cannot recognize any proposition as meaningful unless we can recognize its truth conditions. And recognizing the truth conditions of a proposition requires a shared familiarity with the world on the part of the speaker and the listener. If we can’t recognize any of the propositions of a speaker as meaningful, then we can’t recognize the speaker as a speaker, nor even as a thinker. Recognizing something as intelligent requires a shared familiarity with the world.
So in order for an AGI to be recognized as intelligent, it would have to share with us a familiarity with the world. It is impossible to program this in, or in any way assemble such familiarity. It is achieved only by experience. Thus, in order to create an AGI, we would have to create a machine capable of thinking (in the way babies are) and then let it go about experiencing the world.
So while an AGI is, of course, entirely possible, programming an AGI is not. We’d have to teach it to think in the way we teach people to think.
Thus, it is of course also impossible to program an AGI to be an FAI (though it should be possible to program a potential thinker so as to make it more likely to develop friendliness). Friendliness is a way of thinking, a way of being rational, and so like all thinking it would have to be the result of experience and education. Making an AGI think, and making it friendly, are unavoidably problems of intellectual and ethical education, not in principle different from the problems of intellectual and ethical education we face with children.
Thus, the one and only way to produce an FAI is to teach it to be good in the way we teach children to be good. And if I may speak somewhat metaphorically, the problem of the singularity doesn’t seem to be radically different from the problem of having children: they will be smarter and better educated than we are, and they will produce yet smarter and even better educated children themselves, so much so that the future is opaque to us. The friendliness of children is a matter of the survival of our species, and they could easily destroy us all if they developed in a generally unfriendly way. Yet we have managed, thus far, to teach many people to be good.
My solution is thus extremely simple, and one which many, many people are presently competent to accomplish. On the other hand, trying to make one’s children into good people is more complicated and difficult, I think, than any present approach to FAI. AGI might be a greater challenge in the sense that it might be a more powerful and unruly child than any we’ve had to deal with. We might have to become better ethical teachers, and be able to teach more quickly, but the problem isn’t fundamentally different.
The problem here is that you have a central theme that, without sufficient justification, becomes more and more extravagant over a few paragraphs. So what start out as true-ish statements about a modest claim end up as false statements about an extreme claim.
What am I talking about?
Modest claim: “we cannot recognize any proposition as meaningful unless we can recognize its truth conditions.”
Getting less modest: “So in order for an AGI to be recognized as intelligent, it would have to share with us a familiarity with the world.”
More extreme claim: “Thus, in order to create an AGI, we would have to create a machine capable of thinking (in the way babies are) and then let it go about experiencing the world.”
Extravagant claim: “Thus, the one and only way to produce an FAI is to teach it to be good in the way we teach children to be good.”
A more restrained line of reasoning would go like this:
We cannot recognize any proposition as meaningful unless we can recognize that it has meaningful truth conditions.
So in order to know that an AI is thinking meaningful thoughts, we have to know that it has some reference for truth conditions that we would find meaningful.
Thus, in order to create an AGI, we would have to create a machine capable of experiencing something we recognize as meaningful.
Given that we won’t be totally certain about the contents of friendliness when programming an FAI, we will want our AI to have meaningful thoughts about the concept of Friendliness itself.
Thus, we will need any FAI to be able to experience, directly or indirectly, a set of things meaningful across the full range of human desires and ethics.
What does the second ‘meaningful’ here refer to? I have in mind something like ‘truth conditions articulated in a Tarskian meta-language’.
Pretty much the same thing as “can recognize” means in your sentence—“meaningful to humans.” If you want a short definition, tough; humans are complicated. I should also say that I think the premise is false, even as I restated it, but at least it’s close to true.
Can you explain why you think it’s false? I understand the burden of proof is on me here, but I could use some help thinking this through if you’re willing to grant me the time.
Around here I don’t think we worry too much about the burden of proof :P
Anyhow, the objections mostly stem from the fact that we don’t live in a world of formal logic; we live in a world of probabilistic logic (on our good days). For example, if you know that gorblax has a 50% chance of meaning “robin,” and I say “look, a gorblax,” my statement is neither completely meaningful nor completely meaningless.
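As a minimal sketch of that idea (my illustration; the word “gorblax,” the candidate meanings, and all the numbers are hypothetical), partial meaning can be modelled as a probability distribution over candidate referents that gets updated by observation:

```python
# Toy model of a word whose meaning is only probabilistically known: keep a
# distribution over candidate referents and update it on observation.
# The word, the candidates, and the numbers are all made up for illustration.

# Prior belief about what "gorblax" means.
prior = {"robin": 0.5, "sparrow": 0.3, "mailbox": 0.2}

# Assumed likelihood of the observed scene (the speaker points at a robin)
# under each hypothesis about what "gorblax" means.
likelihood = {"robin": 0.9, "sparrow": 0.05, "mailbox": 0.05}

def update(prior, likelihood):
    """Bayesian update: posterior is proportional to prior times likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

posterior = update(prior, likelihood)
print(posterior)  # "robin" now dominates, but the meaning was never all-or-nothing
```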
But how would we come to an estimate of its meaning in the first place? I suppose by understanding how the utterance constrains the expectations of the utterer, no? What is this other than knowing the truth conditions of the utterance? And truth conditions have to be just flatly something that can satisfy an anticipation. If we reraise the problem of meaning here (why should we?) we’ll run into a regress.
Focus on whether the AGI is intelligent, not on whether it is recognizable as intelligent. From intelligence, recognizability as intelligent should follow, though not necessarily in the way you describe (by evaluating the truth of its expressed propositions); perhaps instead by seeing it successfully rearrange the universe, assuming we survive the process.
I take it that the latter question settles the former, in the sense that it would be empirically meaningless to talk about intelligence over and above our ability to recognize it as such.
Is this a sign of intelligence? It seems to me we could easily imagine (or discover, and perhaps we have already discovered) something which has rearranged our world without being intelligent. Ice-ages, for example.
The point is that your particular example of how to recognize intelligence is not exhaustive, and that you are just opening yourself up to confusion by introducing an unnecessary layer of indirection.
It is certainly positive evidence for intelligence. It is very strong evidence if the universe is rearranged in a manner that maximizes a simple utility function.
But if we had a proto-AGI ready to activate, one that we could predict would rearrange the universe in a particular way, would considerations of whether it is “intelligent” have any bearing on whether we should turn it on, once we know what it would do? (Though we would have used an understanding of its intelligence to predict what it would do. The question is: does it matter whether the processes so analyzed are really “intelligence”?)
It matters for the purposes of my argument, but not for yours. So point taken. I’m exclusively discussing real intelligence and something we can recognize as such. An AI such as you describe would seem to me to be a more powerful version of something that exists presently, however.
Given that our arguments are meant to describe the same reality, it should matter the same for both of them. How is your notion of “real intelligence” actually important?
Well, I’m assuming the project of FAI is to produce an artificial person who is ethical. If the project is described in weaker terms, say that of creating a machine that behaves in some predictable way, then my argument may just not be relevant.
That assumption is incorrect.
Ah! This is the article I needed to read; thanks for pointing me to it.
NO! How do you know what’s impossible? How can you be sure that a program given a million books and a million hours of home videos could not have “familiarity with the world,” despite never having interacted with it? Remember that your argument about intelligence doesn’t require exactly the type of “familiarity with the world” that humans have.
You’re just giving us a flat assertion about a complex technical problem, without any argument at all!
This would be a familiarity with the world by my standards, though in order for an AI to understand the language in the books, it would have to have a familiarity with a world shared with the author. The trouble, in short, would be teaching it to read.
That’s not hard at all. Give it a big corpus of stuff to read, make it look for patterns and meaning. It would figure it out very quickly.
Have you seen “That Alien Message”?
Well, I can see how an AI could come up with patterns and quickly compose hypotheses about the meaning of those books... but that (and the example the article discussed) is a case of translation. Learning a first language and translating a new one into an old one are very different problems. One way to put my point is that it can look for patterns and meaning, but only because it is capable of meaning things of its own. And it is not possible to program this into something; it has to be got by experience. So it would be very easy to teach the AI to read English (I assume this could be taught in less time than I have words for), but I’m talking about teaching it to read, full stop.
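For concreteness, here is a minimal sketch of the kind of “pattern finding” under discussion: counting which words occur in similar contexts in a corpus. The toy corpus is invented for the example, and nothing in the sketch settles whether such statistics amount to understanding.

```python
from collections import defaultdict, Counter

# Toy corpus, invented for the example.
corpus = [
    "snow is white",
    "milk is white",
    "coal is black",
    "snow is cold",
]

# For each word, count the words that appear immediately next to it.
contexts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in (i - 1, i + 1):          # immediate neighbours only
            if 0 <= j < len(words):
                contexts[word][words[j]] += 1

# "snow" and "milk" end up with overlapping context counts, the kind of
# statistical regularity that "look for patterns" appeals to; whether that
# amounts to meaning is exactly what is in dispute above.
print(dict(contexts["snow"]), dict(contexts["milk"]))
```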
How do you suppose children learn language if recognizing patterns and meanings does not qualify? What non-pattern recognizing, non-meaning assigning experiences in particular are indispensable?
I suppose I should say first that I don’t think we have any good idea about how it is that someone first learns a language. Suppose a baby’s first word is ‘mama’. Is this yet a word? Can the baby mean anything by it? Probably not. When do we reach that point where a baby becomes capable of meaning? I expect there is no hard black line. At some point, we recognize someone as a language user.
That said, I think ‘recognizing patterns and meanings’ may well be a fine description of language-learning. I’m not saying that it’s incorrect, just that it’s not programmable. I’m saying that this kind of recognition requires a familiarity with a shared world.
Again, why? We had a program capable of understanding simple sentences forty years ago, SHRDLU:
I don’t see why it would be impossible to make something much better by “just programming it in”. Is there some kind of reading level that no algorithm can surpass if it doesn’t learn by experience?
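To make the SHRDLU reference concrete: the sketch below is not SHRDLU (whose actual grammar and world model were far richer), just a toy in its spirit, showing roughly what “understanding simple sentences” amounted to in that style of program. The vocabulary and world contents are invented for the example.

```python
# A fixed world of named blocks and a rigid command pattern, in the spirit of
# (but much simpler than) SHRDLU's blocks world.
world = {"red block": "table", "green block": "table", "blue block": "table"}

def handle(command):
    command = command.lower().rstrip(".?!")
    if command.startswith("put the ") and " on the " in command:
        thing, destination = command[len("put the "):].split(" on the ", 1)
        if thing in world and destination in world:
            world[thing] = destination
            return "OK."
        return "I don't know that object."
    if command.startswith("where is the "):
        thing = command[len("where is the "):]
        return f"The {thing} is on the {world[thing]}." if thing in world else "I don't know."
    return "I don't understand."

print(handle("Put the red block on the green block"))  # OK.
print(handle("Where is the red block?"))               # The red block is on the green block.
```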
I guess this comes down to the very complicated question of what ‘understanding a language’ amounts to. I take it we can agree that SHRDLU wasn’t thinking in any sense comparable to a human being or an AGI (since I take it we agree that SHRDLU wasn’t an AGI). But also notice that if your example is one of language-learning, you’ve picked a case where the learning thing already knows (some substantial part of) a language.
And lastly, I wouldn’t consider this a counterexample to the claim that learning a language requires a familiarity with a shared world. The machine you describe is obviously making reference to a shared world in its conversation.
Familiarity with the world is just a certain pattern of bits inside the AI’s hard drive. Any pattern of bits can, in principle, be programmed into the AI. Doing this may be difficult, but if we’re just talking about experience of the world here, you could just copy that experience from a human brain (assuming the technology, etc.).
Well, we could only recognize these bits as being thoughts pertaining to a shared world by actually sharing that world. So even if we try to program familiarity with the world into the machine, it could only ‘count’ for the purposes of our ability to recognize thoughts once the AI has spent time operating in our world. The upshot of this being that nothing can come off the assembly line as a thinker. Thinking is something it has to develop. This places no restrictions on how fast it would develop intelligence, just restrictions on the extent to which we can assemble intelligence before experience.
I don’t see how this could be right. Suppose I have an AI that’s spent a long time being and thinking in the world. It’s been trained. Next, I copy it, seven thousand times. We can copy software precisely, so the copies will be indistinguishable from the original, and therefore qualify as thinking. But they will also be factory fresh.
You might want to say “a new copy of an old AI is an old AI”. But there are lots of tweaks we can make to a program+data that will differentiate it, in small ways that don’t materially affect its behavior. Would that not qualify as a new AI?
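A minimal sketch of the copying point, with a deliberately trivial and hypothetical stand-in for a “trained” system: whatever has been learned lives in data, and exact copies of that data behave exactly like the original.

```python
import copy

class TrainedAgent:
    def __init__(self, learned_associations):
        # Stands in for whatever a long history of "experience" left behind.
        self.learned = learned_associations

    def answer(self, question):
        return self.learned.get(question, "I don't know.")

original = TrainedAgent({"what colour is snow?": "white"})

# Seven thousand "factory fresh" copies, none of which re-experienced anything.
copies = [copy.deepcopy(original) for _ in range(7000)]

assert all(c.answer("what colour is snow?") == original.answer("what colour is snow?")
           for c in copies)
```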
That’s a good argument, and while I don’t want to say that ‘a new copy of an AI is an old AI’, I do think I should say that your only strong evidence for the intelligence of your copies would be their resemblance to your educated original. You’d have to see them operating to see them as intelligent. And I take it that it’s a corollary of the idea of uploading one’s neural activity that the ‘copy’ isn’t a new being with no experiences or education.
We know these thoughts pertain to a shared world before we program them into the AI because of where we got them from.
No no no no no.
Also, your proposal of teaching it to be “good” won’t guarantee that it will remain “good” after many self-modifications.
http://wiki.lesswrong.com/wiki/Detached_lever_fallacy
I used to subscribe to a similar belief myself but the above article persuaded me otherwise.
To clarify my original comment, the hard part is designing an AGI that is provably friendly, not one that “sorta kinda oughta be friendly,” and you did not suggest anything like a proof.
Disclaimer: I have my doubts that a provably friendly AGI is possible, but that is, nevertheless, the task EY set out to do.
What is it you think may not be possible? I imagine that proofs for being self-reflectively coherent, actually optimizing its goal, stable through rewrites and ontological revolutions, and so on are totally doable. The only fuzziness is whether CEV is the best goal, or even a friendly one. I think the “provably friendly” thing is about having proved all those nice strong properties plus giving it the best goal system we have (which is currently CEV).
In the Gödel sense. For example, it might be independent of whatever mathematical framework EY would find it reasonable to start from.
http://lesswrong.com/lw/rn/no_universally_compelling_arguments/
Consider that nurture is a kind of argument: you can try to explain something about morality to a child, but only because they are already in a state to accept it… and this is not at all necessarily true for an AGI.
Could you help me understand the relation of this article to mine? My theory doesn’t involve anything like a universally compelling argument, or any universally compelling moral theory. I’m saying, after all, that the basic competence involved in making an intelligent thing friendly was in place several thousand years ago and has nothing to do with moral theorizing (at least not in the sense of philosophical moral theorizing).
You’re right that children are in a state to accept moral instruction, but this is because they’ve been given some prior moral instruction, and so on and so on. There’s no ‘first moment’ of education.
The relation is that you’ve assumed that the AI will accept some kind of moral teaching, or any teaching at all for that matter. You cannot nurture a rock into being moral or logical; it has to be created already in motion.
If you are able to create an AI that could be nurtured at all like a human, you would have had to create the basic machinery of morality already, and that’s the hard part. If you had an AI that had the required properties of FAI, except a moral goal, you would just say “go do CEV” and it would go do it. (Maybe it would be a bit more complex, but it would be the easy part.)
You are way out of your depth with this post. Go read the sequences thoroughly, read the current understanding of the FAI problem, including all the arguments against proposals like yours. If you still think you have something to add then, please do.
No argument there, though I’ve spent a fair amount of time with the sequences. I just found myself with a lot of unanswered questions. I figured a decent way to get those answered would be to post a view and allow people to respond. So while I appreciate your comment, I would appreciate even more links to the specific sequences you have in mind, and some discussion as to their meaning and the quality of their argumentation. This is a great demand on your time, of course, so I couldn’t expect you to humor me in this way. But that’s what the discussion section of this site is for, no?
A premise here is that human beings come with some basic machinery of morality/rationality. I don’t doubt this, but what sort of machinery do you have in mind exactly?
Good points, asking questions about confusing stuff is the first step in the direction of figuring it out!
By the way, I’m not sure what the proper place on LW is to ask questions like “I have a theory, I’m not sure it’s correct, pls comment”… things you wouldn’t post as top level, but still think it would make an interesting discussion.
(Note: your post also started interesting discussions, confusing stuff was hopefully cleared up, but downvotes are still there for you. Monthly open threads seemed to be OK for that purpose, but (at least for me) they are a little chaotic compared to nice headlines in Discussion. Maybe there should be a “newcomers” section, with downvotes only for attitude but not for content?)
That’s a nice idea, though the downvotes for content do amount to worthwhile feedback.
See the metaethics stuff, lawful intelligence. If you must choose one, understand the metaethics. I can’t think of any others you have a pressing need to read.
A loving and nurturing home is not always enough even for humans, some are just born psychopaths. A flaw in the AGI design (“genetic makeup”) may result in a psychopathic AGI, and even if the probability of it is low, it is probably not something to chance, given the potential ramifications.
I don’t know if I’d use that term; it implies that if we are just a bit more careful, we can avoid it.
It is almost certain that any AI we create is going to be a psychopath (even a friendly one), the trick is how to make it our psychopath.
I take it the idea here is that this is easier than creating a person and teaching them to be ethical?
Not necessarily easier, but certainly more promising. The easiest way to get a human-based AI is to upload a human, which is rather simple compared to building a rational, provably goal-following AI. Uploading a human will just get you a superpowerful dictator that gets corrupted by power.
Why more promising? It seems to me that we either accept the risk of trying to teach an artificial person to be good, or we accept the risk of uploading someone we expect to remain good, or of letting someone we hope to be good build a helpful psychopath. After all, if that programmer has a faulty conception of the human good then they’ll create a monster, even if it is a ‘friendly’ one. In every case, we have to rely on the uncertain integrity of the ethical person.
It’s not a person, it’s an optimisation process. Don’t anthropomorphise AI. You are right that the risk is large.
Which we know to be nearly impossible with few ways to improve the chances.
You are not familiar with the current plan and the reasoning behind it. Go read CEV. Also metaethics, because you seem to take it as possible that a human could have a good enough conception of value to program it.
Fallacy of grey. Some methods are much more promising than others, even if all are uncertain.
I certainly agree with you there. I have some familiarity with CEV (though it’s quite technical and I don’t have much background in decision theory), but on the basis of that familiarity, I’m under the impression that creating an artificial person and teaching it to be ethical is the safest and most reliable way to accomplish our goal of surviving the singularity. But I haven’t argued for that, of course.
Lost me at
It’s probably extremely infeasible, sure, but impossibility is really a stronger claim in computer science. So now I’m thinking whether you actually think it’s literally impossible to write some types of programs just because they are likely to be quite long and complex, or whether you’re just using sloppy language to proceed to arguments of questionable integrity (“Thus, it is of course also impossible...”).
Doesn’t seem really insightful otherwise either. Assumes the genetic complexity in humans with all sorts of built-in propensity for social dynamics is equivalent to a blank-slate AGI, when that part seems to be where the difficulty of the FAI problem lies. Once a working AGI is up and running at the newborn baby analogue stage, you’d already better have a pretty deep and correct friendliness solution done and implemented in it. A newborn Clippy will just politely observe all your extremely skilled pedagogy for raising human children into ethical adults, play along as long as it needs to, and then kill you and grind your body up for the 3.5 paperclips it can make from the trace iron in it without a second thought.
Though it should be noted that since the trace metals in a human body are actually rare chemical isotopes that will cause any paperclips made of them to quickly and irreversibly decay into a fine mist of non-paperclip particles, any Clippy that actually considered grinding up humans for paperclip elements would obviously be a very naive and foolish Clippy.