This discussion of “Friendly AI” is hopelessly anthropomorphic. It seems to be an attempt to imagine what an FAI optimizing the world to a given person’s values will do, and such attempts are destined to fail once you bring up specific details, which you do. An FAI is a system expected to do something good, not a specific good thing known in advance. You won’t see it coming.
(More generally, see the Fun theory sequence.)
Yes, I think that this is right. An FAI would try to create a world in which we are all better off. That doesn’t mean that the world would be as any one of us considers perfect. Perhaps each of us would still consider it to be very suboptimal. But all of us would still desire it over the present world.
In other words, let’s grant, for the sake of argument, that most of us are doomed to continue to suffer from losing zero-sum status games forever. Nonetheless, there are possible worlds within which we all, even the losers, would much rather play these games. So there is still a lot that an FAI could do for us.
You can’t think about what specifically FAI will do, period. It seems quite likely there will be no recognizable humans in a world rebuilt by FAI. Any assumption is suspect, even the ones following from the most reliable moral heuristics.
Is it correct to call it FAI, then? Do you see a world with “no recognizable humans” as a very likely thing for the human race (or its extrapolated volition) to collectively want?
I’m considering the case of FAI, that is, humanity’s preference correctly rendered.
The status quo has no power. So the question shouldn’t be whether “no recognizable humans” is the particular thing humanity wants, but rather whether “preserving recognizable humans” happens to be the particular thing that humanity wants. And I’m not sure there are strong enough reasons to expect “a world with recognizable humans” to be the optimal thing to do with the matter. It might be, but I’m not convinced we know enough to locate this particular hypothesis. The default assumption that humans want humans seems to stem from a cached moral intuition promoted by availability in the current situation, but reconstructing the optimal situation from preference is a very indirect process that won’t respect the historical accidents of humanity’s natural development, only humanity’s values.
“Specifically” is relative. By some standards, we have never thought specifically about anything at all. (I have never traced precisely the path of every atom involved in any action.)
Nonetheless, one can think more or less specifically, and to think at all is to think a thought that is specific to some extent. To think, as you wrote above, that an “FAI is a system expected to do something good” is to think something more specific than one might, if one were committed to thinking nothing specific, period. (This is assuming that your words have any meaning whatsoever.)
ETA: In other words, as Eliezer wrote in his Coming of Age sequence, you must be thinking something that is specific to some extent, for otherwise you couldn’t even pose the problem of FAI to yourself.
Sure. The specific thing you say is that the outcome is “good”, but what that means exactly is very hard to decipher, and in particular hard or impossible to decipher in the form of a story, with people, their experiences, and social constructions. It is the story that can’t be specific.
[ETA: I wrote the following when your comment read simply “Sure, why?”. I can see the plausibility of your claim that narrative moral imaginings can contribute nothing to the development of FAI, though it’s not self-evidently obvious to me. ]
Perhaps I missed the point of your previous comment.
I presumed that you thought that I was being too specific. I read you as expressing this thought by saying that one should not think specifically, “period”. I was pointing out the impossibility or meaninglessness of that injunction, at least in its extreme form. I was implicitly encouraging you to indicate the non-extreme meaning that you had intended.
The post has five bullet points at the end, and this does not respond to any of them. The post explores the nature of values that humans have, and values in general; Vladimir’s comment is to the effect that we can’t investigate values, and must design a Friendly AI without understanding the problem domain it will face.
We can’t investigate the content of human values in a way that is useful for constructing Friendly AI, and we can’t investigate what specifically Friendly AI will do. We can investigate values for the purpose of choosing better human-designed policies.
Do you want to qualify that in some way? I interpret it as meaning that learning about values has no relevance to constructing an AI whose purpose is to preserve values. It’s almost an anti-tautology.
The classical analogy is that if you need to run another instance of a given program on a faster computer, figuring out what the program does is of no relevance; you only need to correctly copy its machine code and correctly interpret it on the new machine.
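A minimal sketch of that analogy in Python (the instruction set and program below are invented for illustration, not any real machine code): a faithful interpreter needs only the semantics of each opcode, and never any understanding of what the program is for.

```python
# Toy illustration of "copy the machine code and interpret it correctly":
# the interpreter preserves the program's behavior without knowing its purpose.

def run(program, memory):
    """Execute a list of (opcode, args) pairs against a dict-based memory."""
    pc = 0
    while pc < len(program):
        op, args = program[pc]
        if op == "set":        # store a constant in a register
            memory[args[0]] = args[1]
        elif op == "add":      # add one register into another
            memory[args[0]] += memory[args[1]]
        elif op == "jnz":      # jump to an address if a register is non-zero
            if memory[args[0]] != 0:
                pc = args[1]
                continue
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return memory

# An opaque program: what it "does" in human terms never needs to be deciphered.
opaque_program = [
    ("set", ("x", 3)),
    ("set", ("acc", 0)),
    ("add", ("acc", "x")),
    ("set", ("x", 0)),
]
print(run(list(opaque_program), {}))  # {'x': 0, 'acc': 3}
```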
If you need to run another instance of a given program on a faster computer, but you don’t know what an algorithm is, or what part of the thing in front of you is a “computer” and what part is a “computer program”, and you have not as yet discovered the concept of universal computation, nor are certain whether the computer hardware, or even arithmetic itself, operates deterministically -- then you should take some time to study the thing in front of you and figure out what you’re talking about.
You’d probably need to study how these “computers” work in general, not how to change the background color in documents opened with a word processor that runs on the thing. A better analogy in the direction you took is uploading: we need to study neurons, not the beliefs that a brain holds.
You seem to think that values are just a content problem, and that we can build a mechanism now and fill the content in later. But the whole endeavor is full of unjustified assumptions about what values are, and what values we should pursue. We have to learn a lot more about what values are, what values are possible, what values humans have, and why they have them, before we can decide what we ought to try to do in the first place.
Of course. Only the finer detail is a content problem.
Not that I know of. On the contrary, the assumption is that one shouldn’t posit statements about which values humans actually have, and what kind of mathematical structure values are is an open problem.
The discussion of human preferences has to be anthropomorphic, because human preferences are human. Phil is not anthropomorphizing the AI, he’s anthropomorphizing the humans it serves, which is OK.
FAI doesn’t serve humans. It serves human preference, which is an altogether different kind of thing, and not even something humans have had experience with.
An analogy: the atomic structure of a spoon is not spoon-morphic, just because the atomic structure of a spoon is of a spoon.
I disagree that humans have no experience with human preference. It is true that “formal preference” is not identical to the verbalized statements that people pragmatically pursue in their lives, but I also think that the difference between the two is somewhat bounded by various factors, including the bound of personal identity: if you diverge too much from what you are today, you are effectively dead.
Formal preference and verbalized preference are completely different kinds of objects, with almost nothing in common. Verbalized preference talks about natural categories, clusters of situations found in actual human experience. You don’t have any verbalized preference about novel configurations of atoms that can’t be seen as instances of the usual things, modified by the usual verbs. Formal preference, on the other hand, talks about all possible configurations of matter.
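A toy sketch of this contrast (the categories, encodings, and scoring rule below are all invented for illustration): verbalized preference is a partial judgement over familiar categories, while a formal preference would have to assign a value to every possible configuration, familiar or not.

```python
# Hypothetical contrast between verbalized and formal preference.

# Verbalized preference: partial, defined only over familiar, natural categories.
VERBALIZED_JUDGEMENTS = {
    "friends having dinner": +1,
    "being insulted in public": -1,
}

def verbalized_preference(situation: str) -> int:
    if situation not in VERBALIZED_JUDGEMENTS:
        raise KeyError("no verbalized judgement for this novel situation")
    return VERBALIZED_JUDGEMENTS[situation]

# Formal preference: total over all configurations (stand-in byte strings here),
# even ones that match no familiar category. The scoring rule is a placeholder.
def formal_preference(configuration_of_matter: bytes) -> float:
    return sum(configuration_of_matter) / (255 * max(len(configuration_of_matter), 1))

print(verbalized_preference("friends having dinner"))  # 1
print(formal_preference(b"\x00\xff\x10"))              # defined for any configuration
# verbalized_preference(...) on anything unfamiliar raises KeyError.
```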
Personal identity, as I discussed earlier, is a referent in the world sought by our moral intuition, a concept in terms of which a significant part of our moral intuition is implemented. When the me-in-the-future concept fails to find a referent, this is a failure of verbalized preference, not formal preference. You’ll get a gap on your map, an inability to estimate the moral worth of future situations that lack you-in-the-future in many important respects. But this gap on the map doesn’t correspond to a gap on the moral territory, to these configurations automatically having equal or no moral worth.
This is also a point of difference between formal preference and verbalized preference: formal preference refers to a definition that determines the truth about the moral worth of all situations, that establishes the moral territory, while verbalized preference is merely a non-rigorous human-level attempt to glimpse the shape of this territory. Of course, even in an FAI, formal preference doesn’t yield all the answers, but it is the criterion for the truth of the imperfect answers that the FAI will be able to find.
The following sounds worryingly like moral realism:
formal preference refers to a definition that determines the truth about the moral worth of all situations, that establishes the moral territory, while verbalized preference is merely a non-rigorous human-level attempt to glimpse the shape of this territory
Of course, if you meant it in an antirealist sense, then
verbalized preference is merely a non-rigorous human-level attempt to glimpse the shape of this territory
is problematic, because the map (verbalized preference) contributes causally to determining the territory: (the brainware that creates) your verbalized preferences determines (in part) your formal preference.
Compare with your beliefs being implemented as patterns in your brain, which is a part of the territory. That the fact of your beliefs being a certain way, apart from their meaning and the truth of that meaning, is a truth in its own right doesn’t shatter the conceptual framework of the map-territory distinction. You’d just need to be careful about what subject matter you are currently considering.
I don’t know what you read into the realist/anti-realist distinction; for me, there is a subject matter, and truths about that subject matter, in all questions. The questions of how that subject matter came to be established, in what way it is being considered, and who considers it, are irrelevant to the correctness of statements about the subject matter itself. Here, we consider “formal preference”. How it is defined, and whether the way it came to be defined was influenced by verbalized preference, is irrelevant to what it actually asserts, once it’s established what we are talking about.
If I consider “the program that was written in file1.c this morning”, this subject matter doesn’t change if the file was renamed in the afternoon, or was deleted without anyone knowing its contents, or modified, even perhaps self-modified by compiling and running the program determined by that file. The fact of “contents of file1.c” is a trajectory, a history of change, but it’s a fact separate from “contents of file1.c this morning”, and neither fact can be changed, though the former fact (the trajectory of change of the content of the file) can be determined by one’s actions, through doing something with the file.
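A small sketch of the time-indexed-fact point (the file name and contents are invented): the snapshot “contents of file1.c this morning” stays fixed no matter what later happens to the file, while the trajectory of the file’s contents is a separate, changing fact.

```python
# Hypothetical illustration: a time-indexed fact about a file is fixed,
# even though the file itself keeps changing afterwards.

file_system = {"file1.c": "int main(void) { return 0; }"}

# Record the fact "contents of file1.c this morning".
contents_this_morning = file_system["file1.c"]

# Later the file is renamed and modified; the trajectory of its contents changes...
file_system["renamed.c"] = file_system.pop("file1.c") + "  /* edited in the afternoon */"

# ...but the fact about this morning is untouched.
assert contents_this_morning == "int main(void) { return 0; }"
```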
You said:
I think that one should add that verbalized preference is both an attempt to glimpse the “territory” that is formalized preference, and the thing that causes formalized preferences to exist in the first place. It is at one and the same time a map and a foundation.
On the other hand, your beliefs are only a map of how the world is. The world remains even if you don’t have beliefs about it.
Verbalized preferences don’t specify formal preference. People use their formal preference by arriving at moral intuitions in specific situations they understand, and can form verbalized preferences as heuristic rules describing which moral intuitions are observed to appear when considering which situations. Verbalized preferences are plain summaries of observations, a common-sense understanding of the hidden territory of the machinery in the brain that produces moral intuitions in an opaque manner. While verbalized preferences are able to capture the important dimensions of what formal preference is, they no more determine formal preference than Newton’s laws, as written in a textbook, determine the way the real world operates. They merely describe.
EDIT: And the conclusion to draw from this is that we can use our axiological intuitions to predict our formal preferences, in certain cases. In some cases, we might even predict perfectly: if you verbalize that you have some very simple preference, such as “I want there to be a brick on this table, and that’s all I want”, then your formal preference is just that.
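A toy sketch of the “brick on this table” case (the world-state encoding below is invented): when the verbalized preference really is this simple, the corresponding formal preference can plausibly be written down directly.

```python
# Toy formal preference for the verbalized preference
# "I want there to be a brick on this table, and that's all I want".
# World states are hypothetical dictionaries of facts; the encoding is made up.

def brick_preference(world_state: dict) -> float:
    """Utility is 1.0 in any world where the brick is on the table, 0.0 otherwise;
    everything else about the world is a matter of indifference."""
    return 1.0 if world_state.get("brick_on_table", False) else 0.0

print(brick_preference({"brick_on_table": True, "weather": "rain"}))  # 1.0
print(brick_preference({"brick_on_table": False}))                    # 0.0
```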
Human preferences are too big and unwieldy to predict this simply. We each have many, many preferences, and they interact with each other in complex ways. But I still claim that we can make educated guesses. As I said, verbalized preferences are the foundation for formal preferences. A foundation does not, in any simple way, determine the building on it. But if you see a 12-foot-by-12-foot foundation, you can probably guess that the building on top of it is not going to be the Eiffel Tower.
This is closer. Still, verbalized preference is observation, not the reality of formal preference itself. The goal of preference theory is basically to come up with a better experimental set-up than moral intuition for studying formal preference. This is like moving on from studying physics by making naked-eye observations of natural phenomena to lab experiments with rulers, clocks, microscopes and so on. Moral intuition, as experienced by humans, is too fuzzy and limited an experimental apparatus, even if you use it to observe the outcomes of carefully constructed experiments.
Your beliefs shape the world, though, if you allow high-level concepts to affect low-level ones. Aside from being made of the world in the first place, your actions will follow in part from your beliefs. If you don’t allow high-level concepts to affect low-level ones, then verbalized preference does not cause formalized preferences to exist.
I agree; I can almost hear Eliezer saying (correctly) that it’s daft to try to tell the FAI what to do; you just give it its values and let it rip. (If you knew what to do, you wouldn’t need the AI.) All this post has brought up is a problem that an AI might potentially have to solve. And sure, it looks like a difficult problem, but it doesn’t feel like one that can’t be solved at all. I can think of several rubbish ways to make a bunch of humans think that they all have high status; a brain the size of a planet would think of an excellent one.
Isn’t that what society currently does? Come up with numerous ways to blur and obscure the reality of where exactly you fall in the ranking, yet let you plausibly believe you’re higher than you really are?
Isn’t it that we each care about a particular status hierarchy? The WOW gamer doesn’t care about the status hierarchy defined by physical strength and good looks. It’s all about his 10 level-80 characters with maxed-out gear, and his awesome computer with an Intel Core i7 975 quad-core 3.33GHz CPU, 12GB of tri-channel DDR3, dual SLIed GeForce GTX 260 graphics cards, two 1TB hard drives, Blu-ray, and liquid cooling.
This issue came up on crookedtimber.org before in reply to a claim by Will Wilkinson that free market societies decrease conflict by having numerous different hierarchies so that everyone can be near the top in one of them. (Someone google-fu this?)
The CT.org people replied that these different hierarchies actually exist within a meta-hierarchy that flattens it all out and retains a universal ranking for everyone, dashing the hope that everyone can have high status. The #1 WOW player, in other words, is still below the #100 tennis player.
Despite the ideological distance I have from them, I have to side with the CT.org folks on this one :-/
ETA: Holy Shi-ite! That discussion was from October ’06! Should I be worried or encouraged by the fact that I can remember things like this from so long ago?
The crooked timber post is here. On first glance it seems like a matter of degree: to the extent that there is such a universal ranking, it only fully defeats Wilkinson’s point if the universal ranking and its consequences are the only ranking anyone cares about. As long as different people care differently about (the consequences of) different rankings, which it seems to me is often the case, everyone can rise in their favorite ranking and benefit more than others are harmed.
ETA: though maybe the more hierarchies there are, the less good it feels to be #100 on any of them.
Okay, to substantiate my position (per a few requests), I dispute that you can actually achieve the state where people only care about a few particular hierarchies, or even that people have significant choice in which hierarchies they care about. We’re hardwired to care about status; this drive is not “up for grabs”, and if you could turn off your caring for part of the status ranking, why couldn’t you turn it all off?
Furthermore, I’m highly skeptical that e.g. the WOW superstar is actually fully content to remain in the position that being #1 in WOW affords him; rather, he’s doing the best he can given his abilities, and this narrow focus on WOW is a kind of resignation. In a way I can kind of relate: in high school, I used to dominate German competitions and classes involving math or science. While that was great, it just shifted my attention to the orchestra classes and math/debate competitions that I couldn’t dominate.
Now, you can dull the social influence on yourself that makes you care about status by staying away from the things that will make you compare yourself to the broader (e.g. non-WoW) society, but this is a devil’s bargain: it has the same kind of effect on you as solitary confinement, just of a lesser magnitude. (And I can relate there too, if anyone’s interested.)
I think the WOW superstar would, if he could, trade his position for one comparable to the #100 tennis player in a heartbeat. And how many mistresses does #1 in Wow convert to?
I don’t know about in-game WoW superstars, but I knew an admin of an “unofficial” Russian server of a major AAA MMORPG, and he said that basically all female players of that server he met in real life wanted to go to bed with him. This might have been an exaggeration, but I can confirm at least one date. BTW, I wouldn’t rate the guy as attractive.
In the 1990s I happened upon a game of Vampire (a live-action role-playing game) being played outdoors at night on the campus of UC Berkeley. After the game, I happened to be sitting around at Durant Food Court (a cluster of restaurants near campus) when I overheard one of the female players throw herself at one of the organizers: “How many experience points would I need to go to bed with you?” she asked playfully. (The organizer threw me a juicy grin on the side a few moments later, which I took as confirmation that the offer was genuine.)
I am guessing that in the environment of evolutionary adaptation, political success and political advantage consisted largely of things very much like being able to get a dozen people to spend an evening in some organized activity that you run.
ADDED. Now that I have had time to reflect, what she probably said is, “how many experience points do I get for . . .”, which is a wittier come-on than the one I originally wrote and which jibes with the fact that one of the organizer’s jobs during the game is to award experience points to players.
Interesting; I guess I underestimated the position of unofficial Russian WoW server admins in the meta-hierarchy—in part because I didn’t expect as many desirable Russian women to play WoW.
If the server population is a couple thousand players, and 5% of them are female, that leaves you with about 100 females, 10 of whom will likely be attractive to you—and if you run a dozen servers or so, that’s definitely not a bad deal if you ask me :)
Take a less extreme version of the position you are arguing against: the WOWer cares about more than the WOW hierarchy, but the meta-hierarchy he sets up is still slightly different from the meta-hierarchy that the 100th-best tennis player sets up. The tennis player would rank (1st in tennis, 2nd in WOW) higher than (2nd in tennis, 1st in WOW), but the WOWer would flip the ranking. Do you find this scenario all that implausible?
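A minimal sketch of this scenario (the weights below are invented): both agents care about both hierarchies, but slightly different meta-hierarchy weights are enough to flip the ranking of the same two outcomes.

```python
# Two hypothetical meta-hierarchies over (tennis rank, WOW rank) outcomes.
# Lower rank numbers are better; each agent weights the two hierarchies differently.

OUTCOMES = {
    "A": {"tennis": 1, "wow": 2},  # 1st in tennis, 2nd in WOW
    "B": {"tennis": 2, "wow": 1},  # 2nd in tennis, 1st in WOW
}

def meta_score(outcome, tennis_weight, wow_weight):
    """Smaller is better: a weighted sum of the two rank positions."""
    return tennis_weight * outcome["tennis"] + wow_weight * outcome["wow"]

def favorite(weights):
    return min(OUTCOMES, key=lambda name: meta_score(OUTCOMES[name], **weights))

tennis_player_weights = {"tennis_weight": 0.6, "wow_weight": 0.4}
wower_weights = {"tennis_weight": 0.4, "wow_weight": 0.6}

print(favorite(tennis_player_weights))  # 'A': (1st tennis, 2nd WOW) wins
print(favorite(wower_weights))          # 'B': (2nd tennis, 1st WOW) wins
```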
It’s plausible, but irrelevant. The appropriate comparison is how the WoWer would regard a position comparable [in status] to the #100 tennis player.
If he doesn’t yearn for a high ranking in tennis, it’s because of the particulars of tennis, not out of a lack of interest in a higher ranking in the meta-hierarchy.
Well, it’s not relevant if the WOWer would still rather be the 100th-best tennis player and suck at WOW than keep his current position—which is plausible, but there are probably situations where this sort of preference does matter.
He’s certainly interested in the meta-hierarchy, but why can’t he value the status gained from WOW slightly higher than the status gained from tennis, irrespective of how much he likes tennis and WOW in themselves?
Yes, I get that someone might plausibly not care about tennis per se. That’s irrelevant. What’s relevant is whether he’d trade his current position for one with a meta-hierarchy position near the #100 tennis player’s -- not necessarily involving tennis! -- while also being something he has some interest in anyway.
What I dispute is that people can genuinely not care about moving up in the meta-hierarchy, since it’s so hardwired. You can achieve some level of contentedness, sure, but not total satisfaction. The characterization steven gave of the #1 WoW player’s state of mind is not realistic.
But we’re probably also wired to care mostly about the hierarchies of people with whom we interact frequently. In the EEA, those were pretty much the only people who mattered. [ETA: I mean that they were the only people to whom your status mattered. Distant tribes might matter because they could come and kick you off your land, but they wouldn’t care what your intra-tribe status was.]
The #1 WOW player probably considers other WOW players to be much more real, in some psychologically powerful way, than are professional tennis players and their fans. It would therefore be natural for him to care much more about what those other WOW players think.
But like I said earlier, that’s like saying, “If you live in solitary confinement [i.e. no interaction even with guards], you’re at the top of your hierarchy so obviously that must make you the happiest possible.”
You can’t selectively ignore segments of society without taking on a big psychological burden.
You can’t have high status if no other people are around. But high status is still a local phenomenon. Your brain wants to be in a tribe and to be respected by that tribe. But the brain’s idea of a tribe corresponds to what was a healthy situation in the EEA. That meant that you shouldn’t be in solitary confinement, but it also meant that your society didn’t include distant people with whom you had no personal interaction.
But from the perspective of an EEA mind, online interaction with other WoWers is identical (or at least extremely similar) to solitary confinement in that you don’t get the signals the brain needs to recognize “okay, high status now”. (This would include in-person gazes, smells, sounds, etc.) This is why I dispute that the WoW player actually can consider the other WoW players to be so psychologically real.
Ah—I’d been misreading this because I imagined the #1 WoW player would interact socially with other WoW players (“in real life”) like all of the WoW players I know do.
Wouldn’t the #1 WoW player be spending most of his waking hours on a computer instead of socializing?
Well so far I’ve just been assuming ‘#1 WoW player’ is meaningful. As I understand it, there isn’t much to gain at the margins once you spend most of your time playing. Also, who says you can’t be on a computer and socializing? There’s plenty of time to look away from the computer while playing WoW, and you can play it practically anywhere.
Human psychology.
Your body can tell the difference between computer interaction and in-person interaction. Intermittently “socializing” while you try to play is still a very limited form of socializing.
What sort of thing did you have in mind? (Am I missing out?)
What in-person-socializing/WoW-playing hybrid did you have in mind? Because I’m missing out!
I hang out with several people who play WoW at my place when they’re over. Other WoW players will spend time geeking out over their characters’ stats, gear, appearance, etc., and presumably our imaginary #1 would have less-dedicated groupies who would be interested in that sort of thing while he’s playing. Due to the amount of time spent travelling or waiting in queues, there is also a lot of time for traditional sorts of socialization—eating food next to other humans, throwing things at each other, whatever it is humans do. And WoW isn’t all that concentration-intensive, so it’s entirely possible to have a conversation while playing. And you can even play in the same room as other people who are in your group, and talk about the game in person while you’re doing it.
LAN party
In fairness, you also “knew” that half the folks playing Magic: the Gathering are female, and knew that was true of RPG conventions as well.
So I tend not to weight your personal experiences heavily. Please understand.
Forget the WOWer then, how about the M:tG fanatic?
Implementation issue. Oops, wrong cop-out! :-P
Seriously: the Magic: the Gathering fanatic has social contact, but the lack of females in that social network has basically the same effect, in that it’s a more limited kind of social interaction that can’t replicate our EEA-wired desires.
I’m interested. How can you relate? What was your situation?
Without going into too many personal details (PM or email me if you’re interested in that), for a while I lived a lifestyle where my in-person socialization was limited, as were most of my links to the broader society (e.g. no TV), though I made a lot of money (at least relative to the surrounding community).
I also found myself frequently sad, which was very strange, as I felt all of my needs and wants were being met. It was only after a long time that I noticed the correlation between “being around other people” and “not being sad”—and I’m an introvert!
Here is the article you are looking for
er… why?
ETA: my counterpoint would be essentially what steven said, but you didn’t seem to give an argument.
See my reply to steven.
Create a lot of human-seeming robots = Give everyone a volcano = Fool the humans = Build the Matrix.
To quote myself:
In other words, it isn’t valid to analyze the sensations that people get when their higher status is affirmed by others, and then recreate those sensations directly in everyone, without anyone needing to have low status. If you did that, I can think of only 3 possible interpretations of what you would have done, and I find none of them acceptable:
1. Consciousness is not dependent on computational structure (this leads to vitalism); or
2. You have changed the computational structure their behaviors and values are part of, and therefore changed their conscious experience and their values; or
3. You have embedded them each within their own Matrix, in which they perceive themselves as performing isomorphic computations.
I agree that these are all rubbish ideas, which is why we let the AI solve the problem. Because it’s smarter than us. If this post were about how we should make the world a better place on our own, then these issues would indeed be a (small) problem, but since it was framed in terms of FAI, it’s asking the wrong questions.
You’re missing the main point of the post. Note the bullet points are ranked in order of increasing importance. See the last bullet point.
BTW, how do you let the AI solve the problem of what kind of AI to build?
What kind of AI to be. That’s the essence of being a computationally complex algorithm, and a decision-making algorithm in particular: you always learn something new about what you should do, and what you’ll actually do, and not just learn it, but make it so.
...or more likely, this won’t be a natural problem-category to consider at all.