Not really. An AI that didn’t have a specific desire to be friendly to mankind would want to kill us to cut down on unnecessary entropy increases.
As you get closer to the mark, with AGIs whose utility functions roughly resemble what we would want but are still wrong, the end results are most likely worse than death, especially since there should be many more near misses than exact hits. For example, an AGI that doesn’t want to let you die, regardless of what you go through, and has little regard for your well-being otherwise, would be closer to an FAI than a paperclip maximizer that would just plain kill you. As you get closer to the core of Friendliness, you get all sorts of weird AGIs that want to do something that twistedly resembles something good, but is somehow missing something, or is somehow altered so that the end result is not at all what you wanted.
As you get closer to the core of Friendliness, you get all sorts of weird AGIs that want to do something that twistedly resembles something good, but is somehow missing something, or is somehow altered so that the end result is not at all what you wanted.
Is this true or is this a useful assumption to protect us from doing something stupid?
Is it true that Friendliness is not an attractor or is it that we cannot count on such a property unless it is absolutely proven to be the case?
My idea there was that if it’s not Friendly, then it’s not Friendly; ergo, it is doing something that you would not want an AI to be doing (if you thought faster and knew more and all that). That’s the core of the quote you had there. A random intelligent agent would simply transform us into something it values, so we would most likely die very quickly. However, as you get closer to Friendliness, the AI is no longer totally indifferent to us but is instead maximizing something that could involve living humans. Now, if you take an AI that wants there to be living humans around, but is not known for sure to be Friendly, what could go wrong? My answer: many things, as what humans prefer to be doing is a rather complex set of things, and even quite small changes could make us really, really unsatisfied with the end result. At least, that’s the idea I’ve gotten from posts here like Value is Fragile.
When you ask if Friendliness is an attractor, do you mean to ask whether intelligences near Friendly ones in the design space tend to transform into Friendly ones? This seems rather unlikely, as AIs of that sort are most likely capable of preserving their utility functions, and the direction of this transformation is not “natural”. For these reasons, arriving at Friendliness is not easy, and thus I’d say you have to have some way to ascertain Friendliness before you can trust an AI to be just that.
Ants and daffodils might, by some definitions, have preferences—but it wouldn’t be necessary for a FAI to explicitly consider their preferences, as long as their preferences constitute some part of humanity’s CEV, which seems likely: I think an intact Earth ecosystem would be rather nice to retain, if at all possible.
The entropic contribution of ants and daffodils would doubtless make them candidates for early destruction by a UFAI, if such a step even needed to be explicitly taken alongside destroying humanity.
An AI that had a botched or badly preserved Friendliness, or that was unFriendly but had been initialized with supergoals involving humans, may well have specific, unpleasant, non-extermination plans for humans.
Imagine an AGI with the opposite utility function of an FAI: it minimizes the Friendly utility function, which would involve doing things far worse than killing us. If you are not putting effort into choosing a utility function, building this AGI seems as likely as building an FAI, along with lots of other possibilities in the space of AGIs whose utility functions refer to humans, some of which would keep us alive, though not all in ways we would appreciate.
The reason I would expect an AGI in this space to be somewhat close to Friendly is this: just hitting the space of utility functions that refer to humans is hard, so if it happens, it is likely because a human deliberately hit it, and this should indicate that the human has the skill and motivation to optimize further within that space to build an actual Friendly AGI.
If you stipulate that the programmer did not make this effort, and hitting the space of AGIs that keep humans alive only occurred in tiny quantum branches, then you have screened off the argument of a skilled FAI developer, and it seems unlikely that the AGI within this space would be Friendly.
If you are not putting effort into choosing a utility function, building this AGI seems as likely as building an FAI
You’ve made a lot of good comments in this thread, but I disagree with this. As likely?
It seems you are assuming that every possible point in AI mind space is equally likely, regardless of history, context, or programmer intent. This is like saying that, if someone writes a routine to sort numbers numerically, it’s just as likely to sort them phonetically.
It seems likely to me that this belief, that the probability distribution over AI mindspace is flat, has become popular on LessWrong, not because there is any logic to support it, but because it makes the Scary Idea even scarier.
Yes, my predictions of what will happen when you don’t put effort into choosing a utility function are inaccurate in the case where you do put effort into choosing a utility function.
This is like saying that, if someone writes a routine to sort numbers numerically, it’s just as likely to sort them phonetically.
Well, let’s suppose someone wants a routine to sort numbers numerically, but doesn’t know how to do this, and tries a bunch of stuff without understanding. Conditional on the programmer miraculously achieving some sort of sorting routine, what should we expect about it? Sorting phonetically would add extra complication over sorting numerically, as the information about the names of numbers would have to be embedded within the program, so that would seem less likely. But a routine that sorts numerically ascending is just as likely as a routine that sorts numerically descending, as these routines have a complexity-preserving one-to-one correspondence by interchanging “greater than” with “less than”.
And the utility functions I claimed were equally likely before have the same complexity-preserving one-to-one correspondence.
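The correspondence can be made concrete with a sketch. In the toy sorting routines below (the function names are illustrative, not from any library), swapping a single comparison operator maps the ascending routine onto the descending one, so neither is more complex than the other:

```python
# Two sorting routines that differ only in the direction of one comparison.
# Interchanging "greater than" with "less than" maps one onto the other,
# so neither program is more complex than its mirror image.

def sort_ascending(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] < xs[i]:       # "less than"
                xs[i], xs[j] = xs[j], xs[i]
    return xs

def sort_descending(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] > xs[i]:       # "greater than": the only change
                xs[i], xs[j] = xs[j], xs[i]
    return xs

print(sort_ascending([3, 1, 2]))   # [1, 2, 3]
print(sort_descending([3, 1, 2]))  # [3, 2, 1]
```

The same kind of mirror argument applies to a utility function and its negation: flipping the sign preserves the complexity of the program exactly.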
Would it? Though we do contribute to entropy, things like, say, stars do so at a much faster pace. Admittedly this is logically distinct from the AI’s decision to destroy humanity, but I don’t see why it would immediately jump to the conclusion that we should be wiped out when the main sources of entropy are elsewhere.
More to the point, not all unFriendly AIs would necessarily care about entropy.
For almost any objective an AI had, it could better accomplish it the more free energy the AI had. The AI would likely go after entropy losses from both stars and people. The AI couldn’t afford to wait to kill people until after it had dealt with nearby stars because by then humans would have likely created another AI god.
If we want an AI to be friendly, the thing is to make sure that its utility function includes things that only humans can provide. That way, the AI will have to trade us what we want in order to get what it wants. The possibilities are endless. Give it a taste for romance novels, or cricket, or Jerry Springer. Stand-up comedy, postmodern deconstructionism, or lolcats. Electric power is one intriguing possibility.
The nice thing about having it give us what we want in trade, rather than simply giving us what it was programmed to believe we want, is that we are then permitted to change our minds about what we want, after we have already had a taste of material abundance and immortality. I certainly expect that my values will become revised after a few centuries of that, in ways that I am not yet ready to extrapolate or to have extrapolated for me.
Please stop commenting on this topic until you have understood more of what has been written about it on LW and elsewhere. Unsubstantiated proposals harm LW as a community. LW deals with some topics that look crazy on surface examination; you don’t want people who dig deeper to stumble on comments like this and find actual crazy.
This idea is in fact crazy. However, I share your concerns and believe similar lines of thinking may be fruitful. In particular, I’m not convinced there aren’t ways to secure an AI through clever implementations of its utility function. I made a specific proposal along those lines in this comment.
If we want an AI to be friendly, the thing is to make sure that its utility function includes things that only humans can provide. That way, the AI will have to trade us what we want in order to get what it wants. The possibilities are endless. Give it a taste for… postmodern deconstructionism...
Like many people, I don’t think this idea will work. But I voted it up, because I vote on a comment’s expected value. On a topic that is critical to solve, and for which there are no good ideas, entertaining crazy ideas is worthwhile. So I’d rather hear one crazy idea that a good Yudkowskian would consider sacrilege than ten well-reasoned points that are already overrepresented on LessWrong. It’s analogous to the way that the optimal mutation rate is high when your current best solution is very sub-optimal, and the optimal selection strength (reproduction probability as a function of fitness) is low when your population is nearly homogeneous (as ideas about FAI on LessWrong are).
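The mutation-rate half of the analogy can be sketched in a few lines (a toy bit-string mutation operator with made-up parameters, not a claim about any particular evolutionary algorithm):

```python
import random

def mutate(bits, rate):
    """Flip each bit independently with probability `rate` (illustrative)."""
    return [b ^ (random.random() < rate) for b in bits]

# When the best known solution is far from the optimum, a high mutation
# rate explores the search space aggressively; near the optimum, a low
# rate preserves what has already been found.
random.seed(0)
poor_solution = [0] * 20          # far from the all-ones optimum
good_solution = [1] * 20          # already at the optimum

explored = mutate(poor_solution, rate=0.2)   # many flips: broad search
refined  = mutate(good_solution, rate=0.01)  # few flips: local search
```

In the analogy, upvoting a crazy idea on an under-solved topic plays the role of a high mutation rate: it is how a nearly homogeneous population of ideas gets new variation to select from.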
I must admit that I was surprised by just how severely this posting got downvoted. It is always dangerous to mix playfulness with discussion of serious and important issues. My examples of the products of human culture which someone or something might wish to preserve for eternity apparently pushed some buttons here in this community of rationalists.
Back around the year 1800, Napoleon invaded Egypt, carrying in his train a collection of scientific folks who considered themselves version 1.0 rationalists. This contact of enlightenment with antiquity led to a Western fascination with things Egyptian which lasted roughly two centuries before it degenerated into Lara Croft and sharpened razor blades. But it did lead the French, and later the British, to disassemble and transport to their own capitals examples of one of the more bizarre aspects of ancient Egyptian monumental architecture: obelisks.
Of course, we rationalist Americans saw the opportunity to show our superiority over the “old world”. We didn’t steal an authentic ancient Egyptian obelisk to decorate our capital city. We built a newer, bigger, and better one! Yep, we’re Americans. Anything anyone else can do, we can do better. Same applies to our FAIs. They won’t fall into the fallacy of “authenticity”. Show them a romance novel, or a stupid joke, or a schmaltzy photograph and they will build something better themselves. Not bodice rippers, but corset-slicing scalpels. Not moron jokes, but jokes about rocks. Not kittens playing with balls of yarn, but sentient crickets playing baseball.
I cannot be the only person here who thinks there is some value in preserving things simply to preserve them—things like endangered species, human languages, and aspects of human culture. Is it really so insane to think that we could instill the same respect-for-the-authentic-but-less-than-perfect in a machine that we create?
Is it really so insane to think that we could instill the same respect-for-the-authentic-but-less-than-perfect in a machine that we create?
We could. But should we? (And how is it even relevant to your original comment? This seems to be a separate argument for roughly the same conclusion. What about the original argument? Do you agree it’s flawed (that is, an AI can in fact out-native the natives)?)
See also discussion of Waser’s post, in particular second paragraph of my comment here:
If you consider a single top-level goal, then disclaimers about subgoals are unnecessary. Instead of saying “Don’t overly optimize any given subgoal (at the expense of the other subgoals)”, just say “Optimize the top-level goal”. This is simpler and tells you what to do, as opposed to what not to do, with the latter suffering from all the problems of nonapples.
This seems to be a separate argument for roughly the same conclusion. What about the original argument? Do you agree it’s flawed (that is, an AI can in fact out-native the natives)?
I thought I had just made a pretty direct argument that there is one way in which an AI cannot out-native the natives—authenticity. Sorry if it was less than clear.
See also discussion of Waser’s post, my comment here. Edit: second paragraph, not the first one.
I have no idea which second paragraph you refer to. May I suggest that you remove all reference to Waser and simply say what you wish to say about what I wrote.
You don’t want to elevate not optimizing something too much as a goal (and it’s difficult to say what that would mean), while just working on optimizing the top-level goal unpacks this impulse as appropriate. Authenticity could be an instrumental goal, but is of little relevance when we discuss values or decision-making in sufficiently general context (i.e. not specifically the environments where we have revealed preference for authenticity despite it not being a component of top-level goal).
you don’t want to elevate not optimizing something too much as a goal (and it’s difficult to say what that would mean), while just working on optimizing the top-level goal unpacks this impulse as appropriate.
For example, do I parse it as “to elevate not optimizing something too much” or as “don’t want … too much”. And what impulse is “this impulse”?
There is valid intuition (“impulse”) that in certain contexts, some sub-goals, such as “replace old buildings with better new ones” shouldn’t be given too much power, as that would lead to bad consequences according to other aspects of their evaluation (e.g. we lose an architectural masterpiece).
To unpack, or cash out, an intuition means to create a more explicit model of the reasons behind its validity (to the extent it’s valid). Modeling the above intuition as “optimizing too strongly is undesirable” is incorrect, and so one shouldn’t embrace this principle of not optimizing things too much with high priority (“elevate”).
Instead, just trying to figure out what the top-level goal asks for, and optimizing for the overall top-level goal without ever forgetting what it is, is the way to go. Acting exclusively for the top-level goal explains the intuition as well: if you optimize a given sub-goal too much, it probably indicates that you forgot the overall goal and are working on something different instead, and that shouldn’t be done.
Conflicts between subgoals indicate premature fixation on alternative solutions. The alternatives shouldn’t be prioritized as goals in and of themselves. The other aspects of their evaluation would fit better as goals or subgoals to be optimized. A goal should give you guidance for choosing between alternatives.
In your example, one might ask what goal can one optimize to help make good decisions between policies like “replace old buildings with better ones” and “don’t lose architectural masterpieces”?
I am puzzled by many things here. One is how we two managed to make this thread so incoherent. A second is just what all this talk of sub-goals and “replacing old buildings with better new ones” and over-optimization has to do with anything I wrote.
I thought that I was discussing the idea of instilling top level values into an AI that would be analogous to those human values which lead us to value the preservation of biological diversity and human cultural diversity. The values which cause us to create museums. The values which lead us to send anthropologists out to learn something about primitive cultures.
The concept of over-optimizing never entered my mind. I know of no downside to over-optimizing other than a possible waste of cognitive resources. If optimizing leads to bad things, it is because we are optimizing on the wrong values rather than optimizing too much on the good ones.
ETA: Ah. I get it now. My phrase “respect for the authentic but less than perfect”. You saw it as an intuition in favor of not “overdoing” the optimizing. Believe me. It wasn’t.
What a comedy of errors. May I suggest that we delete this entire conversation?
If you keep stuff in a museum, instead of using its atoms for something else, you are in effect avoiding optimization of that stuff. There could be a valid reason for that (the stuff in the museum remaining where it is happens to be optimal in context), or a wrong one (preserving stuff is valuable in itself).
One idea similar to what I guess you are talking about which I believe to hold some water is sympathy/altruism. If human values are such that we value well-being of sufficiently human-like persons, then any such person will receive a comparatively huge chunk of resources from a rich human-valued agent, compared to what it’d get only for game-theoretic reasons (where one option is to get disassembled if you are weak), for use according to their own values that are different from our agent’s. This possibly could be made real, although it’s rather sketchy at this point.
Meta:
I am puzzled by many things here. One is how we two managed to make this thread so incoherent.
Of the events I did understand, there was one miscommunication, my fault for not making my reference clearer. It’s now edited out. Other questions are still open.
Ah. I get it now. My phrase “respect for the authentic but less than perfect”. You saw it as an intuition in favor of not “overdoing” the optimizing. Believe me. It wasn’t.
You and I have had several conversations and each time I formed the impression that you were not making enough effort to explain yourself. You are apparently a very smart person, and you seem to think that this means that you are a good communicator. It does not. In my opinion, you are one of the worst communicators here. You tend to be terse to the point of incomprehensibility. You tend to seize upon interpretations of what other people say that can be both bizarre and unshakable. Conversing with you is simply no fun.
You and I have had several conversations and each time I formed the impression that you were not making enough effort to explain yourself. You are apparently a very smart person, and you seem to think that this means that you are a good communicator. It does not. In my opinion, you are one of the worst communicators here. You tend to be terse to the point of incomprehensibility. You tend to seize upon interpretations of what other people say that can be both bizarre and unshakable. Conversing with you is simply no fun.
That’s quite interesting. I rarely have an issue understanding Vladimir, and when I do, a few minutes of thought generally allows me to reconstruct what he is saying. On the other hand, I seem to find you to be a poor communicator not in communicating your own ideas but in understanding what other people are trying to say. So I have to wonder how much of this is on your end rather than his. Moreover, even if that’s not the situation, it seems highly probable to me that some people will have naturally different styles and modes of communication, and will perceive people who use similar modes as good communicators and people who use very different modes as poor communicators. So it may simply be that Vladimir and I share similar modes and you are of a different mode. I’m not completely sure how to test this sort of hypothesis. If it is correct, I’d expect LWians to clump in their opinions about how good various people are at communicating. But that could happen for other reasons as well, such as social reasons. So it might be better to test whether, given anonymized prose from different LWians, LWians clump in their evaluations.
Thank you for this feedback. I had expected to receive something of the sort from VN, but if it was encoded in his last paragraph, I have yet to decipher it.
I seem to find you to be a poor communicator not in communicating your ideas but in understanding what other people are trying to say. So I have to wonder how much of this is on your end rather than his end.
It certainly felt like at least some of the problem was on my end yesterday, particularly when AdeleneDawner apparently responded meaningfully to the VN paragraph which I had been unable to parse. The thing is, while I was able to understand her sentences, and how they were responses to VN’s sentences, and hence at least something of what VN apparently meant, I still have no understanding of how any of it is relevant in the context of the conversation VN and I were having.
I was missing some piece of context, which VN was apparently assuming would be common knowledge. It may be because I don’t yet understand the local jargon. I’ve only read maybe 2⁄3 of the sequences and find myself in sympathy with only a fraction of what I have read.
some people will have naturally different styles and modes of communication, and will perceive people who use similar modes as being good communicators and perceive people who use very different modes as being poor communicators.
A good observation. My calling Vladimir a poor communicator is an instance of mind-projection. He is not objectively poor at communicating—only poor at communicating with me.
I’m not completely sure how to test this sort of hypothesis. If it is correct, I’d expect LWians to clump in their opinions about how good various people are at communicating. But that could happen for other reasons as well, such as social reasons. So it might be better to test whether, given anonymized prose from different LWians, LWians clump in their evaluations.
Might be interesting to collect the data and find the clusters. I’m sure it is easiest to communicate with those who are at the least cognitive distance. And still relatively easy at some distance as long as you can accurately locate your interlocutor in cognitive space. The problems usually arise when both parties are confused about where the other is “coming from”. But do not notice that they are confused. Or do not announce that they have noticed.
You are apparently a very smart person, and you seem to think that this means that you are a good communicator. It does not. In my opinion, you are one of the worst communicators here. You tend to be terse to the point of incomprehensibility. You tend to seize upon interpretations of what other people say that can be both bizarre and unshakable. Conversing with you is simply no fun.
I generally agree with this characterization (except for the self-deception part). I’m a bad writer, somewhat terse and annoying, and I don’t like the sound of my own more substantive writings (such as blog posts). I compensate by striving to understand what I’m talking about, so that further detail or clarification can generally be called up, accumulated across multiple comments, or, as is the case for this particular comment, dumped in redundant quantity without regard for resulting style. I like practicing “hyper-analytical” conversation, and would like more people to do that, although I understand that most people won’t like it. I’m worse than average (on my level) at quickly grasping things that are not clearly presented (my intuition is unreliable), but I’m good at systematically settling on correct understanding eventually, discarding previous positions easily, as long as the consciously driven process of figuring out doesn’t terminate prematurely.
Since people are often wrong, assuming a particular mistake is not always that far off as a hypothesis (given available information), but the person suspected of error will often notice the false positives more saliently than they deserve, instead of making a correction, as a purely technical step, and moving forward.
I compensate by striving to understand what I’m talking about
Well, that is unquestionably a good thing, and I have no reason to doubt you that you do in fact tend to understand quite a large number of things that you talk about. I wish more people had that trait.
I like practicing “hyper-analytical” conversation
I’m not sure exactly what is meant here. An example (with analysis) might help.
I’m good at systematically settling on correct understanding eventually, discarding previous positions easily, as long as the consciously driven process of figuring out doesn’t terminate prematurely.
If that is the case, then I misinterpreted this exchange:
Me: Ah. I get it now. My phrase “respect for the authentic but less than perfect”. You saw it as an intuition in favor of not “overdoing” the optimizing. Believe me. It wasn’t.
You: I can’t believe what I don’t understand.
Perhaps the reason for my confusion is that it struck me as a premature termination. If you wish to understand something, you should perhaps ask a question, not make a comment of the kind that might be uttered by a Zen master.
Since people are often wrong, assuming a particular mistake is not always that far off as a hypothesis (given available information), but the person suspected of error will often notice the false positives more saliently than they deserve, instead of making a correction, as a purely technical step, and moving forward.
Here we go again. …
I don’t understand that comment. Sorry. I don’t understand the context to which it is intended to be applicable, nor how to parse it. There are apparently two people involved in the scenario being discussed, but I don’t understand who does what, who makes what mistake, nor who should make a correction and move forward.
You are welcome to clarify, but quite frankly I am coming to believe that it is just not worth it.
I’m not sure exactly what is meant here. An example (with analysis) might help.
I basically mean permitting unpacking of any concept, including the “obvious” and “any good person would know that” and “are you mad?” ones, and staying on a specific topic even if the previous one was much more important in context, or if there are seductive formalizations that nonetheless have little to do with the original informally-referred-to concepts. See for example here.
P.: Ah. I get it now. My phrase “respect for the authentic but less than perfect”. You saw it as an intuition in favor of not “overdoing” the optimizing. Believe me. It wasn’t.
VN: I can’t believe what I don’t understand.
Perhaps the reason for my confusion is that it struck me as a premature termination.
I simply meant that I don’t understand what you referred to in your suggestion to believe something. You said that “[It’s not] an intuition in favor of not “overdoing” the optimizing”, but I’m not sure what it is then, or whether on further examination it will turn out to be what I’d refer to with the same words. Finally, I won’t believe something just because you say I should; a better alternative to discussing your past beliefs (which I don’t have further access to and so can’t form a much better understanding of) would be to start discussing statements (not necessarily beliefs!) you name at present.
Since people are often wrong, assuming a particular mistake is not always that far off as a hypothesis (given available information), but the person suspected of error will often notice the false positives more saliently than they deserve, instead of making a correction, as a purely technical step, and moving forward.
Here we go again. …
Consider person K. That person K happens to be wrong on any given topic won’t be shocking. People are often wrong. When person K says something confusing, explaining the confusingness of that statement by person K being wrong is not a bad hypothesis, even if the other possibility is that what K said was simply not expressed clearly and can be amended. When person V says to person K “I think you’re wrong”, and it turns out that person K was not wrong in this particular situation, that constitutes a “false positive”: V decided that K was wrong, but it was not the case. In the aftermath, K will remember V being wrong on this count as a personal attack, and will focus too much on pointing out how wrong it was to assume K’s wrongness when in fact it’s V who can’t understand anything K is saying. Instead, K could have just stated a clarifying statement that falsifies V’s hypothesis, so that the conversation would go on efficiently, without undue notice of the hypothesis of being wrong.
(You see why I’m trying to be succinct: writing it up in more detail is too long and no fun. I’ve been busy for the last days, and replied to other comments that felt less like work, but not this one.)
K says A meaning X. V thinks A means Y. V disagrees with Y.
So if V says “If by ‘A’ you mean ‘Y’, then I have to disagree,” then everything is fine. K corrects the misconception and they both move on. On the other hand, if V says “I disagree with ‘Y’”, things become confused, because K never said ‘Y’. If V says “I disagree with ‘A’”, things become even more confused. K has been given no clue of the existence of the misinterpretation ‘Y’ - reconstructing it from the reasons V offers for disputing ‘A’ will take a lot of work.
But if V likes to be succinct, he may simply reply “I disagree” to a long comment and then (succinctly) provide reasons. Then K is left with the hopeless task of deciding whether V is disagreeing with ‘A’, ‘B’, or ‘C’ - all of which statements were made in the original posting. The task is hopeless, because the disagreement is with ‘Y’ and neither party has even mentioned ‘Y’.
I believe that AdeleneDawner makes the same point.
(You see why I’m trying to be succinct: writing it up in more detail is too long and no fun. I’ve been busy for the last days, and replied to other comments that felt less like work, but not this one.)
I suspect that you would find yourself with even less tedious work to do if you refrained from making cryptic comments in the first place. That way, neither you nor your victims have to work at transforming what you write into something that can be understood.
I suspect that you would find yourself with even less tedious work to do if you refrained from making cryptic comments in the first place.
I like commenting the way I do, it’s not tedious.
That way, neither you nor your victims has to work at transforming what you write into something that can be understood.
Since some people will be able to understand what I wrote, even when it’s not the person I reply to, some amount of good can come of it. Also, the general policy of ignoring everything I write allows one to avoid the harm completely.
As a meta remark, your attitude expressed in the parent comment seems to be in conflict with attitude expressed in this comment. Which one more accurately reflects your views? Have they changed since then? From the past comment:
A good observation. My calling Vladimir a poor communicator is an instance of mind-projection. He is not objectively poor at communicating—only poor at communicating with me.
Both reflect my views. Why do you think there is a conflict?
Because the recent comment assumes that one relevant consequence of my not writing comments would be the relief of the victimized people who read them. But if we assume that there are also people not in that group, the consequence of their not benefiting from my comments would balance out the consequence you pointed out, making it filtered evidence and hence not worth mentioning on its own. If you won’t use filtered evidence this way, it follows that your recent comment assumes this non-victimized group to be insignificant, while the earlier comment didn’t. (No rhetorical questions in this thread.)
Since people are often wrong, assuming a particular mistake is not always that far off as a hypothesis (given available information), but the person suspected of error will often notice the false positives more saliently than they deserve, instead of making a correction, as a purely technical step, and moving forward.
The observation that people are often wrong applies similarly to both the hypothesis that a specific error is present and the hypothesis that a specific correction is optimal. Expecting a conversation partner to take either of those as given is incorrect in a very similar way to expecting a conversational partner to take a particular hypothesis’s truth as given. Clear communication of the logic behind a hypothesis (including a hypothesis about wrongness or correction) is generally necessary in such situations before that hypothesis is accepted as likely-true.
That’s awfully convenient.
Not really. An AI that didn’t have a specific desire to be friendly to mankind would want to kill us to cut down on unnecessary entropy increases.
As you get closer to the mark, with AGIs whose utility functions roughly resemble what we would want but are still wrong, the end results are most likely worse than death, especially since there should be many more near misses than exact hits. For example, an AGI that refuses to let you die, regardless of what you go through, and with little regard for your well-being otherwise, would be closer to an FAI than a paperclip maximizer that would just plain kill you. As you get closer to the core of Friendliness, you get all sorts of weird AGIs that want to do something that twistedly resembles something good, but is missing something, or is somehow altered, so that the end result is not at all what you wanted.
Is this true or is this a useful assumption to protect us from doing something stupid?
Is it true that Friendliness is not an attractor or is it that we cannot count on such a property unless it is absolutely proven to be the case?
My idea there was that if it’s not Friendly, then it’s not Friendly, ergo it is doing something that you would not want an AI to be doing (if you thought faster and knew more and all that). That’s the core of the quote you had there. A random intelligent agent would simply transform us into something it values, so we would most likely die very quickly. However, as you get closer to Friendliness, the AI is no longer totally indifferent to us; rather, it is maximizing something that could involve living humans. Now, if you take an AI that wants there to be living humans around, but is not known for sure to be Friendly, what could go wrong? My answer: many things, since what humans prefer to be doing is a rather complex set of stuff, and even quite small changes could make us really, really unsatisfied with the end result. At least, that’s the idea I’ve gotten from posts here like Value is Fragile.
When you ask if Friendliness is an attractor, do you mean to ask whether intelligences near Friendly ones in the design space tend to transform into Friendly ones? This seems rather unlikely, as such AIs are most likely capable of preserving their utility functions, and the direction of this transformation is not “natural”. For these reasons, arriving at Friendliness is not easy, and thus I’d say you need some way to ascertain Friendliness before you can trust an AI to be just that.
Is this also true if you replace “mankind” with “ants” or “daffodils”?
Ants and daffodils might, by some definitions, have preferences—but it wouldn’t be necessary for an FAI to explicitly consider their preferences, as long as their preferences constitute some part of humanity’s CEV, which seems likely: I think an intact Earth ecosystem would be rather nice to retain, if at all possible.
The entropic contribution of ants and daffodils would doubtless make them candidates for early destruction by a UFAI, if such a step even needed to be explicitly taken alongside destroying humanity.
An AI that had a botched or badly preserved Friendliness, or that was unFriendly but had been initialized with supergoals involving humans, may well have specific, unpleasant, non-extermination plans for humans.
As in, “I have no mouth and I must scream”.
Imagine an AGI with the opposite utility function of an FAI: it minimizes the Friendly utility function, which would involve doing things far worse than killing us. If you are not putting effort into choosing a utility function, building this AGI seems as likely as building an FAI, along with lots of other possibilities in the space of AGIs whose utility functions refer to humans, some of which would keep us alive, not all in ways we would appreciate.
The reason I would expect an AGI in this space to be somewhat close to Friendly, is: just hitting the space of utility functions that refer to humans is hard, if it happens it is likely because a human deliberately hit it, and this should indicate that the human has the skill and motivation to optimize further within that space to build an actual Friendly AGI.
If you stipulate that the programmer did not make this effort, and hitting the space of AGIs that keep humans alive only occurred in tiny quantum branches, then you have screened off the argument of a skilled FAI developer, and it seems unlikely that an AGI within this space would be Friendly.
You’ve made a lot of good comments in this thread, but I disagree with this. As likely?
It seems you are assuming that every possible point in AI mind space is equally likely, regardless of history, context, or programmer intent. This is like saying that, if someone writes a routine to sort numbers numerically, it’s just as likely to sort them phonetically.
It seems likely to me that this belief, that the probability distribution over AI mindspace is flat, has become popular on LessWrong, not because there is any logic to support it, but because it makes the Scary Idea even scarier.
Yes, my predictions of what will happen when you don’t put effort into choosing a utility function are inaccurate in the case where you do put effort into choosing a utility function.
Well, let’s suppose someone wants a routine to sort numbers numerically, but doesn’t know how to do this, and tries a bunch of stuff without understanding. Conditional on the programmer miraculously achieving some sort of sorting routine, what should we expect about it? Sorting phonetically would add extra complication over sorting numerically, as the information about the names of numbers would have to be embedded within the program, so that would seem less likely. But a routine that sorts numerically ascending is just as likely as a routine that sorts numerically descending, as these routines have a complexity-preserving one-to-one correspondence given by interchanging “greater than” with “less than”.
And the utility functions I claimed were equally likely before have the same complexity-preserving one-to-one correspondence.
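For what it’s worth, the symmetry argument above can be made concrete in a few lines of Python. This is only an illustrative sketch (the table of number names is a made-up stand-in): ascending and descending sorts differ by a single flipped comparison, while a phonetic sort must carry extra embedded data.

```python
# Ascending and descending numeric sorts differ only by flipping the
# comparison direction, so they are equally complex as programs.

def sort_ascending(xs):
    return sorted(xs)                # compares with "less than"

def sort_descending(xs):
    return sorted(xs, reverse=True)  # same routine, comparison flipped

# A phonetic sort must embed a table of number names -- strictly more
# information than either numeric sort carries. (Hypothetical example
# table, covering only the digits used below.)
NAMES = {1: "one", 2: "two", 3: "three", 4: "four"}

def sort_phonetically(xs):
    return sorted(xs, key=lambda n: NAMES[n])

print(sort_ascending([3, 1, 4, 2]))    # [1, 2, 3, 4]
print(sort_descending([3, 1, 4, 2]))   # [4, 3, 2, 1]
print(sort_phonetically([3, 1, 4, 2])) # [4, 1, 3, 2] ("four", "one", "three", "two")
```

The one-to-one correspondence is visible in the code: mapping `sort_ascending` to `sort_descending` changes nothing about program length or structure, whereas `sort_phonetically` cannot exist without the extra `NAMES` data.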
Would it? Though we do contribute to entropy, things like, say, stars do so at a much faster pace. Admittedly this is logically distinct from the AI’s decision to destroy humanity, but I don’t see why it would immediately jump to the conclusion that we should be wiped out when the main sources of entropy are elsewhere.
More to the point, not all unFriendly AIs would necessarily care about entropy.
It’s kind of a moot question though since shutting off the sun would also be a very effective means of killing people.
For almost any objective an AI had, it could better accomplish it the more free energy the AI had. The AI would likely go after entropy losses from both stars and people. The AI couldn’t afford to wait to kill people until after it had dealt with nearby stars because by then humans would have likely created another AI god.
Assuming that by “AI” you mean something that maximizes a utility function, as opposed to a dumb apocalypse like a grey-goo or energy virus scenario.
I can see how a “dumb apocalypse like a grey-goo or energy virus” would be Artificial, but why would you call it Intelligent?
On this site, unless otherwise specified, AI usually means “at least as smart as a very smart human”.
Yeah, that makes sense. I was going to suggest “smart enough to kill us”, but that’s a pretty low bar.
If we want an AI to be friendly, the thing is to make sure that its utility function includes things that only humans can provide. That way, the AI will have to trade us what we want in order to get what it wants. The possibilities are endless. Give it a taste for romance novels, or cricket, or Jerry Springer. Stand-up comedy, postmodern deconstructionism, or lolcats. Electric power is one intriguing possibility.
The nice thing about having it give us what we want in trade, rather than simply giving us what it was programmed to believe we want, is that we are then permitted to change our minds about what we want, after we have already had a taste of material abundance and immortality. I certainly expect that my values will become revised after a few centuries of that, in ways that I am not yet ready to extrapolate or to have extrapolated for me.
Even supposing an AGI couldn’t figure out how to produce those things itself, I don’t want it to optimize us to produce those things.
Please stop commenting on this topic until you have understood more of what has been written about it on LW and elsewhere. Unsubstantiated proposals harm LW as a community. LW deals with some topics that look crazy on surface examination; you don’t want people who dig deeper to stumble on comments like this and find actual crazy.
You’re kidding. You want us to substantiate all our proposals? Are you giving out grants?
Surely, only grants can save people from generating nonsense without restraint.
Clearly, you are unfamiliar with the controversy regarding the National Endowment for the Arts.
Someone is missing someone’s sarcasm. The first “someone” might be me.
My usual policy when someone says “I was being ironic” is to reply, “Oh, I thought you were feeding me a straight line.”
I hope you enjoy romance-novel-writing slavery.
Sounds like a good plot for a romance novel.
This idea is in fact crazy. However, I share your concerns and believe similar lines of thinking may be fruitful. In particular, I’m not convinced there aren’t ways to secure an AI through clever implementations of its utility function. I made a specific proposal along those lines in this comment.
The whole point of AI, AGI, FAI, etc is that anything we can do, it can do better.
Won’t work.
(Which suggests that this is probably not a good Friendliness strategy in general.)
Like many people, I don’t think this idea will work. But I voted it up, because I vote on comment expected value. On a topic that is critical to solve, and for which there are no good ideas, entertaining crazy ideas is worthwhile. So I’d rather hear one crazy idea that a good Yudkowskian would consider sacrilege, than ten well-reasoned points that are already overrepresented on LessWrong. It’s analogous to the way that optimal mutation rate is high when your current best solution is very sub-optimal, and optimal selection strength (reproduction probability as a function of fitness) is low when your population is nearly homogenous (as ideas about FAI on LessWrong are).
I must admit that I was surprised by just how severely this posting got downvoted. It is always dangerous to mix playfulness with discussion of serious and important issues. My examples of the products of human culture which someone or something might wish to preserve for eternity apparently pushed some buttons here in this community of rationalists.
Back around the year 1800, Napoleon invaded Egypt, carrying in his train a collection of scientific folks who considered themselves version 1.0 rationalists. This contact of enlightenment with antiquity led to a Western fascination with things Egyptian which lasted roughly two centuries before it degenerated into Lara Croft and sharpened razor blades. But it did lead the French, and later the British, to disassemble and transport to their own capitals examples of one of the more bizarre aspects of ancient Egyptian monumental architecture. Obelisks.
Of course, we rationalist Americans saw the opportunity to show our superiority over the “old world”. We didn’t steal an authentic ancient Egyptian obelisk to decorate our capital city. We built a newer, bigger, and better one! Yep, we’re Americans. Anything anyone else can do, we can do better. Same applies to our FAIs. They won’t fall into the fallacy of “authenticity”. Show them a romance novel, or a stupid joke, or a schmaltzy photograph and they will build something better themselves. Not bodice rippers, but corset-slicing scalpels. Not moron jokes, but jokes about rocks. Not kittens playing with balls of yarn, but sentient crickets playing baseball.
I cannot be the only person here who thinks there is some value in preserving things simply to preserve them—things like endangered species, human languages, and aspects of human culture. Is it really so insane to think that we could instill the same respect-for-the-authentic-but-less-than-perfect in a machine that we create?
We could. But should we? (And how is it even relevant to your original comment? This seems to be a separate argument for roughly the same conclusion. What about the original argument? Do you agree it’s flawed (that is, that an AI can in fact out-native the natives)?)
See also discussion of Waser’s post, in particular second paragraph of my comment here:
I thought I had just made a pretty direct argument that there is one way in which an AI cannot out-native the natives—authenticity. Sorry if it was less than clear.
I have no idea which second paragraph you refer to. May I suggest that you remove all reference to Waser and simply say what you wish to say about what I wrote.
You don’t want to elevate not optimizing something too much as a goal (and it’s difficult to say what that would mean), while just working on optimizing the top-level goal unpacks this impulse as appropriate. Authenticity could be an instrumental goal, but is of little relevance when we discuss values or decision-making in sufficiently general context (i.e. not specifically the environments where we have revealed preference for authenticity despite it not being a component of top-level goal).
I’m sorry. I don’t understand what you just wrote. At all.
For example, do I parse it as “to elevate not optimizing something too much” or as “don’t want … too much”. And what impulse is “this impulse”?
Second paragraph of your comment or of Waser’s?
ETA: If you can clarify, I’ll just delete this comment.
There is valid intuition (“impulse”) that in certain contexts, some sub-goals, such as “replace old buildings with better new ones” shouldn’t be given too much power, as that would lead to bad consequences according to other aspects of their evaluation (e.g. we lose an architectural masterpiece).
To unpack, or cash out an intuition means to create a more explicit model of the reasons behind its validity (to the extent it’s valid). Modeling the above intuition as “optimizing too strongly is undesirable” is incorrect, and so one shouldn’t embrace this principle of not optimizing things too much with high-priority (“elevate”).
Instead, just trying to figure out what top-level goal asks for, and optimizing for the overall top-level goal without ever forgetting what it is, is the way to go. Acting exclusively for top-level goal explains the intuition as well: if you optimize a given sub-goal too much, it probably indicates that you forgot the overall goal, working on something different instead, and that shouldn’t be done.
Conflicts between subgoals indicate premature fixation on alternative solutions. The alternatives shouldn’t be prioritized as goals in and of themselves. The other aspects of their evaluation would fit better as goals or subgoals to be optimized. A goal should give you guidance for choosing between alternatives.
In your example, one might ask what goal can one optimize to help make good decisions between policies like “replace old buildings with better ones” and “don’t lose architectural masterpieces”?
I am puzzled by many things here. One is how we two managed to make this thread so incoherent. A second is just what all this talk of sub-goals and “replacing old buildings with better new ones” and over-optimization has to do with anything I wrote.
I thought that I was discussing the idea of instilling top level values into an AI that would be analogous to those human values which lead us to value the preservation of biological diversity and human cultural diversity. The values which cause us to create museums. The values which lead us to send anthropologists out to learn something about primitive cultures.
The concept of over-optimizing never entered my mind. I know of no downside to over-optimizing other than a possible waste of cognitive resources. If optimizing leads to bad things, it is because we are optimizing on the wrong values rather than optimizing too much on the good ones.
ETA: Ah. I get it now. My phrase “respect for the authentic but less than perfect”. You saw it as an intuition in favor of not “overdoing” the optimizing. Believe me. It wasn’t.
What a comedy of errors. May I suggest that we delete this entire conversation?
If you keep stuff in a museum, instead of using its atoms for something else, you are in effect avoiding optimization of that stuff. There could be a valid reason for that (the stuff in the museum remaining where it is happens to be optimal in context), or a wrong one (preserving stuff is valuable in itself).
One idea similar to what I guess you are talking about which I believe to hold some water is sympathy/altruism. If human values are such that we value well-being of sufficiently human-like persons, then any such person will receive a comparatively huge chunk of resources from a rich human-valued agent, compared to what it’d get only for game-theoretic reasons (where one option is to get disassembled if you are weak), for use according to their own values that are different from our agent’s. This possibly could be made real, although it’s rather sketchy at this point.
Meta:
Of the events I did understand, there was one miscommunication, my fault for not making my reference clearer. It’s now edited out. Other questions are still open.
I can’t believe what I don’t understand.
And I should stop responding to comments that I don’t understand. Sorry we wasted each other’s time here.
Talking more generally improves understanding.
I find that listening often works better. But it depends on whom you listen to.
If conversation stops, there is nothing more to listen to. If conversation continues, even inefficient communication eventually succeeds.
Ok, lets have the meta-discussion.
You and I have had several conversations and each time I formed the impression that you were not making enough effort to explain yourself. You are apparently a very smart person, and you seem to think that this means that you are a good communicator. It does not. In my opinion, you are one of the worst communicators here. You tend to be terse to the point of incomprehensibility. You tend to seize upon interpretations of what other people say that can be both bizarre and unshakable. Conversing with you is simply no fun.
Ok. Your turn.
You said about Vladimir:
That’s quite interesting. I rarely have an issue understanding Vladimir. And when I do, a few minutes of thought generally allows me to reconstruct what he is saying. On the other hand, I seem to find you to be a poor communicator not in communicating your ideas but in understanding what other people are trying to say. So I have to wonder how much of this is on your end rather than his end. Moreover, even if that’s not the situation, it would seem highly probable to me that some people will have naturally different styles and modes of communication, and will perceive people who use similar modes as being good communicators and perceive people who use very different modes as being poor communicators. So it may simply be that Vladimir and I are of similar modes and you are of a different mode. I’m not completely sure how to test this sort of hypothesis. If it is correct, I’d expect LWians to clump in their opinions about how good various people are at communicating. But that could happen for other reasons as well, such as social reasons. So it might be better to test whether, given anonymized prose from different LWians, their evaluations clump.
Thank you for this feedback. I had expected to receive something of the sort from VN, but if it was encoded in his last paragraph, I have yet to decipher it.
It certainly felt like at least some of the problem was on my end yesterday, particularly when AdeleneDawner apparently responded meaningfully to the VN paragraph which I had been unable to parse. The thing is, while I was able to understand her sentences, and how they were responses to VN’s sentences, and hence at least something of what VN apparently meant, I still have no understanding of how any of it is relevant in the context of the conversation VN and I were having.
I was missing some piece of context, which VN was apparently assuming would be common knowledge. It may be because I don’t yet understand the local jargon. I’ve only read maybe 2⁄3 of the sequences and find myself in sympathy with only a fraction of what I have read.
A good observation. My calling Vladimir a poor communicator is an instance of mind-projection. He is not objectively poor at communicating—only poor at communicating with me.
Might be interesting to collect the data and find the clusters. I’m sure it is easiest to communicate with those who are at the least cognitive distance. And still relatively easy at some distance as long as you can accurately locate your interlocutor in cognitive space. The problems usually arise when both parties are confused about where the other is “coming from”. But do not notice that they are confused. Or do not announce that they have noticed.
I generally agree with this characterization (except for self-deception part). I’m a bad writer, somewhat terse and annoying, and I don’t like the sound of my own more substantive writings (such as blog posts). I compensate by striving to understand what I’m talking about, so that further detail or clarification can be generally called up, accumulated across multiple comments, or, as is the case for this particular comment, dumped in redundant quantity without regard for resulting style. I like practicing “hyper-analytical” conversation, and would like more people to do that, although I understand that most people won’t like that. I’m worse than average (on my level) at quickly grasping things that are not clearly presented (my intuition is unreliable), but I’m good at systematically settling on correct understanding eventually, discarding previous positions easily, as long as the consciously driven process of figuring out doesn’t terminate prematurely.
Since people are often wrong, assuming a particular mistake is not always that much off a hypothesis (given available information), but the person suspected of error will often notice the false positives more saliently than they deserve, instead of making a correction, as a purely technical step, and moving forward.
Well, that is unquestionably a good thing, and I have no reason to doubt you that you do in fact tend to understand quite a large number of things that you talk about. I wish more people had that trait.
I’m not sure exactly what is meant here. An example (with analysis) might help.
If that is the case, then I misinterpreted this exchange:
Perhaps the reason for my confusion is that it struck me as a premature termination. If you wish to understand something, you should perhaps ask a question, not make a comment of the kind that might be uttered by a Zen master.
Here we go again. …
I don’t understand that comment. Sorry. I don’t understand the context to which it is intended to be applicable, nor how to parse it. There are apparently two people involved in the scenario being discussed, but I don’t understand who does what, who makes what mistake, nor who should make a correction and move forward.
You are welcome to clarify, but quite frankly I am coming to believe that it is just not worth it.
I basically mean permitting unpacking of any concept, including the “obvious” and “any good person would know that” and “are you mad?” ones, and staying on a specific topic even if the previous one was much more important in context, or if there are seductive formalizations that nonetheless have little to do with the original informally-referred-to concepts. See for example here.
I simply meant that I don’t understand what you referred to in your suggestion to believe something. You said that “[It’s not] an intuition in favor of not “overdoing” the optimizing”, but I’m not sure what it is then, and whether on further look it’ll turn out to be what I’d refer to with the same words. Finally, I won’t believe something just because you say I should; a better alternative to discussing your past beliefs (which I don’t have further access to and so can’t form much better understanding of) would be to start discussing statements (not necessarily beliefs!) you name at present.
Consider person K. That person K happens to be wrong on any given topic won’t be shocking. People are often wrong. When person K says something confusing, trying to explain the confusingness of that statement by person K being wrong is not a bad hypothesis, even if the other possibility is that what K said was not expressed clearly, and can be amended. When person V says to person K “I think you’re wrong”, and it turns out that person K was not wrong in this particular situation, that constitutes a “false positive”: V decided that K is wrong, but it’s not the case. In the aftermath, K will remember V being wrong on this count as a personal attack, and will focus too much on pointing out how wrong it was to assume K’s wrongness when in fact it’s V who can’t understand anything K is saying. Instead, K could’ve just stated a clarifying statement that falsifies V’s hypothesis, so that the conversation would go on efficiently, without undue notice to the hypothesis of being wrong.
(You see why I’m trying to be succinct: writing it up in more detail is too long and no fun. I’ve been busy for the last days, and replied to other comments that felt less like work, but not this one.)
K says A meaning X. V thinks A means Y. V disagrees with Y.
So if V says “If by ‘A’ you mean ‘Y’, then I have to disagree,” then everything is fine. K corrects the misconception and they both move on. On the other hand, if V says “I disagree with ‘Y’”, things become confused, because K never said ‘Y’. If V says “I disagree with ‘A’”, things become even more confused. K has been given no clue of the existence of the misinterpretation ‘Y’ - reconstructing it from the reasons V offers for disputing ‘A’ will take a lot of work.
But if V likes to be succinct, he may simply reply “I disagree” to a long comment and then (succinctly) provide reasons. Then K is left with the hopeless task of deciding whether V is disagreeing with ‘A’, ‘B’, or ‘C’ - all of which statements were made in the original posting. The task is hopeless, because the disagreement is with ‘Y’ and neither party has even mentioned ‘Y’.
I believe that AdeleneDawner makes the same point.
I suspect that you would find yourself with even less tedious work to do if you refrained from making cryptic comments in the first place. That way, neither you nor your victims has to work at transforming what you write into something that can be understood.
I like commenting the way I do; it’s not tedious.
Since some people will be able to understand what I wrote, even when it’s not the person I reply to, some amount of good can come out of it. Also, the general policy of ignoring everything I write allows one to avoid the harm completely.
As a meta remark, your attitude expressed in the parent comment seems to be in conflict with the attitude expressed in this comment. Which one more accurately reflects your views? Have they changed since then? From the past comment:
Both reflect my views. Why do you think there is a conflict? I wrote:
It seems to me that this advice is good, even if you choose to operationalize the word ‘cryptic’ to mean ‘comments directed at Perplexed’.
Writing not tedious, so advice not good.
Because the recent comment assumes that one of the relevant consequences of me not writing comments would be relief of victimized people that read my comments, while if we assume that there are also people not included in the group, the consequence of them not benefiting from my comments would balance out the consequence you pointed out, making it filtered evidence and hence not worth mentioning on its own. If you won’t use filtered evidence this way, it follows that your recent comment assumes this non-victimized group to be insignificant, while the earlier comment didn’t. (No rhetorical questions in this thread.)
The observation that people are often wrong applies similarly to both the hypothesis that a specific error is present and the hypothesis that a specific correction is optimal. Expecting a conversation partner to take either of those as given is incorrect in a very similar way to expecting a conversational partner to take a particular hypothesis’s truth as given. Clear communication of the logic behind a hypothesis (including a hypothesis about wrongness or correction) is generally necessary in such situations before that hypothesis is accepted as likely-true.