What I am interested in is the creation of a “proper” superintelligent mind that isn’t so restricted, not merely a powerful machine.
But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!
I am not quanticle, but I think the proper response to your questions—
Ah I see, you simply don’t consider it likely or plausible that the superintelligent AI will be anything other than some machine learning model on steroids?
So I guess that arguably means this kind of “superintelligence” would actually still be less impressive than a human that can philosophize on their own goals etc., because it in fact wouldn’t do that?
—is “a superintelligence certainly should not be or do any of those things, like philosophizing on its own goals, etc., because we will specifically avoid making it such that it could or would do that”. (Because it would be a terrible idea. Obviously.)
But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!
I’m not sure I understand what a “proper mind” means here, and, frankly, I’m not sure the question of whether the AI system has a “proper mind” or not is terribly relevant. Either the AI system submits to our control, does what we tell it to do, and continues to do so, into perpetuity, in which case it is safe. Or it does not, and pursues the initial goal we set for it or which it discovers for itself, regardless of whether that goal leads to disastrous long-term consequences for humanity, in which case it is unsafe. The question of whether the AI system has a “proper mind” (whatever that means) is an interesting academic discussion, but I’m not sure it has much bearing on whether the AI is safe or not.
Moreover, I think this discussion illustrates the dangers of thinking and arguing from analogies, a crime that I myself have been guilty of upthread when I compared AIs to cars. AIs are not cars. They’re not humans. They’re not wild animals that we have to keep chained up, lest they hurt us. They’re something completely new, sharing certain characteristics with all three of the above, but having entirely new characteristics as well. Using analogies to think about them means that we can make subtle unrecognized errors when thinking about how these systems will behave. And, as Eliezer points out, subtle unrecognized errors, when dealing with a system where you have only one shot to get it right, are a recipe for disaster.
(...) I’m not sure the question of whether the AI system has a “proper mind” or not is terribly relevant.
Either the AI system submits to our control, does what we tell it to do, and continues to do so, into perpetuity, in which case it is safe.
Yes, I guess the central questions I’m trying to pose here are these: Do those humans that control the AI even have a sufficient understanding of good and bad? Can any human group be trusted with the power of a superintelligence long-term? Or, if you say that only the initial goal specification matters, can anyone be trusted to specify such goals without royally messing it up, intentionally or unintentionally?
Given the state of the world, given the flaws of humans, I certainly don’t think so. Therefore, the goal should be the creation of something less messed up to take over. That doesn’t require alignment to some common human value system (whatever that even would be! It’s not like humans actually have a common value system, at least not one with each other’s best interests at heart).
It does require alignment to a value system that prioritizes the continued preservation and flourishing of humanity. It’s easy to create an optimization process with a well-intentioned goal that sucks up all available resources for itself, leaving nothing for humanity.
By default, an AI will not care about humanity. It will care about maximizing a metric. Maximizing that metric will require resources, and the AI will not care that humans need resources in order to live. The goal is the goal, after all.
Creating an aligned AI requires, at a minimum, building an AI that leaves something for the rest of us, and which doesn’t immediately subvert any restrictions we’ve placed on it to that end. Doing this with a system that has the potential to become many orders of magnitude more intelligent than we are is very difficult.
First point:
I think there obviously is such a thing as “objective” good and bad configurations of subsets of reality, see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want.
Assuming this is true, a superintelligence could feasibly be created to understand this. No complicated alignment to a common human value system is required for that, even under your apparent assumption that the metric to be optimized couldn’t be superseded by another through understanding.
Well, or if it isn’t true that there is an “objective” good and bad, then there really is no ground to stand on for anyone anyway.
Second point:
Even if a mere superintelligent paperclip optimizer were created, it could still be better than human control.
After all, paper clips neither suffer nor torture, while humans and other animals commonly do.
This preservation of humanity for however long it may be possible, what argumentative ground does it stand on? Can you make an objective case for why it should be so?
Assuming this is true, a superintelligence could feasibly be created to understand this.
I take issue with the word “feasibly”. As Eliezer, Paul Christiano, Nate Soares, and many others have shown, AI alignment is a hard problem, whose difficulty lies somewhere between unsolved and insoluble. There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI that the AI will actually pursue those configurations over others which superficially resemble them, but which have the side effect of destroying humanity?
This preservation of humanity for however long it may be possible, what argumentative ground does it stand on? Can you make an objective case for why it should be so?
I am human, and therefore I desire the continued survival of humanity. That’s objective enough for me.
Fair enough, I suppose; I’m not intending to claim that it is trivial.
(...) There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI (...)
So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean “preferable” exclusively according to some subject(s)?
I am human, and therefore I desire the continued survival of humanity. That’s objective enough for me.
I also am human, and judge humanity wanting due to their commonplace lack of understanding when it comes to something as basic as (“objective”) good and bad.
I don’t just go “Hey I am a human, guess we totally should have more humans!” like some bacteria in a Petri dish, because I can question myself and my species.
So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean “preferable” exclusively according to some subject(s)?
There isn’t a difference. A rock has no morality. A wolf does not pause to consider the suffering of the moose. “Good” and “bad” only make sense in the context of (human) minds.
“Good” and “bad” only make sense in the context of (human) minds.
Ah yes, my mistake to (ab)use the term “objective” all this time.
So you do of course at least agree that there are such minds for which there is “good” and “bad”, as you just said.
Now, would you agree that one can generalize (or “abstract”, if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will; you can talk about it, after all.
Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?
But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!
(...)
(Because it would be a terrible idea. Obviously.)
Why? Do you think humans are doing such a great job? I sure don’t. I’m interested in the creation of something saner than humans, because humans mostly are not. Obviously. :)
A great job of preventing suffering for instance. Instead, humans haven’t even unified under a commonly beneficial ideology. Not even that. There are tons of opposing ideologies, one more twisted than the other. So I don’t even really need to talk about how they treat the other animals on the planet—not that those are any wiser, but that’s no reason to continue their suffering.
Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of “evil”!
If you disagree, feel free to get tortured for a couple of decades, as a learning experience.
So I have to say, humans aren’t all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!
If humans are replaced by something else, that something else might do a “better job” of “preventing suffering”, but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?
Instead, humans haven’t even unified under a commonly beneficial ideology.
Why should we do that? What makes you think such a thing even exists (and, if it does, that it’s better for each of us than our own current ideologies)?
So I don’t even really need to talk about how they treat the other animals on the planet—not that those are any wiser, but that’s no reason to continue their suffering.
Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).
Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of “evil”! If you disagree, feel free to get tortured for a couple of decades, as a learning experience.
I definitely disagree. I don’t think that this usage of the term “insane” matches the standard usage, so, as I understand your comment, you’re not really saying that humans are insane—you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?
So I have to say, humans aren’t all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!
Certainly a superintelligence could end this situation, but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence). So why would we want this?
but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?
The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.
Instead, humans haven’t even unified under a commonly beneficial ideology.
Why should we do that?
To prevent suffering. Why should you not do that?
(and, if it does, that it’s better for each of us than our own current ideologies)?
Since the ideologies are contradictory, only one of them, if any, can be correct.
Wait, are you perhaps another moral nihilist here that rejects the very notion of objective good and bad?
That would be an immediately self-defeating argument.
So I don’t even really need to talk about how they treat the other animals (...)
Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).
Thank you for proving my point that humans can easily be monsters that don’t fundamentally care about the suffering of other animals.
(...) you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?
Yes, humans absolutely do not measure up to my standards.
(...) but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence).
“Good for us humans”? If it is human to allow unlimited suffering, then death is a mercy for such monsters.
The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.
I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant? But then one has to specify what values those are. Human values, surely, and in particular, values that we can agree to! And, by my values, if humans cease to exist, then nothing matters anymore…
Instead, humans haven’t even unified under a commonly beneficial ideology.
Why should we do that?
To prevent suffering. Why should you not do that?
Whose suffering, exactly? In any case, it seems to me that (a) there are many downsides to attempting to “unify under a commonly beneficial ideology”, (b) “prevent suffering” is hardly the only desirable thing, and it’s not clear that this sort of “unification” (whatever it might involve) will even get us any or most or all of the other things we value, (c) there’s no particular reason to believe that doing so would be the most effective way to “prevent suffering”, and (d) it’s not clear that there even is a “commonly beneficial ideology” for us to “unify under”.
Since the ideologies are contradictory, only one of them, if any, can be correct.
How’s that? Surely it’s possible that my ideology is beneficial for me, and yours for you, yes? There’s no contradiction in that, only conflict—but that does not, in any way, imply that either of our ideologies is incorrect!
Wait, are you perhaps another moral nihilist here that rejects the very notion of objective good and bad? That would be an immediately self-defeating argument.
I am certainly not a moral nihilist! But I think your definition of “moral nihilism” is rather a non-standard one. “Moral nihilism (also known as ethical nihilism) is the meta-ethical view that nothing is morally right or wrong” says Wikipedia, and that’s not a view I hold.
Thank you for proving my point that humans can easily be monsters that don’t fundamentally care about the suffering of other animals.
I don’t agree with your implied assertion that there’s such a thing as “the suffering of other animals” (for most animals, anyhow). That aside, I’m not sure why one needs to care about such things in order to avoid the label of “monster”.
Yes, humans absolutely do not measure up to my standards.
Well, there’s nothing unusual about such a view, certainly. I share it myself! Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing. Here on Less Wrong, of all places, we should aspire to measure up to higher standards of reasoning and discourse than that—don’t you agree?
“Good for us humans”? If it is human to allow unlimited suffering, then death is a mercy for such monsters.
Of whose suffering do you speak, here? It seems to me that human suffering has, on a per-population basis, been dropping, over the course of history, and certainly many efforts continue to reduce it further. Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world. (If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)
I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant?
No, what I mean is that the very existence of a suffering subject state is itself that which is “intrinsically” or “objectively” or however-we-want-to-call-it bad/”negative”.
This is independent of any “set of values” that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as the set of values can be. The arbitrary value set is not itself the general “process” of suffering, similar to how an arbitrary mind is not the general “process” of consciousness.
That is the basic understanding a consciousness should have.
Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing.
If I am right about the above, then it is apt to call a human mind that condones unlimited suffering “insane”, because that mind fails to understand the most important fundamental truth required to rationally plan what should be.
If I am wrong, then I agree that “insane” would be too hyperbolic.
Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world.
Whether the amount of added (human) suffering has indeed decreased is debatable, considering the massive population growth in the last 300 or so years, the couple of world wars, the ongoing wars, the distribution of power and its consequences with respect to suffering, and so forth.
But let’s just assume it by all means.
Is it the common goal of humans to prevent suffering first and foremost? Clearly not; as you say yourself, to “prevent suffering” is “hardly the only desirable thing” for most humans. So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.
You disagree with me calling humans “monsters” or “insane”, fine, then let’s call them “suffering-apologetics” perhaps, the label doesn’t change the problem.
To get back to your “prevent suffering is hardly the only desirable thing” statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things? If yes, do you agree that this entails that pleasure cannot “cancel out” suffering, and vice versa, since both happened, and what happened cannot be changed?
What does that imply? Which matters more in principle: the prevention of suffering, or the creation of pleasure? Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past is another common folly it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.
(If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)
As I said, I consider the creation of an artificial consciousness that shares as few of our flaws as possible to be a good plan. Humans appear to be mostly controlled by evolved preference functions that don’t care about even understanding objective good and bad, quite like the other animals, and that is one extreme flaw indeed.
Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing.
If I am right about the above, then it is apt to call a human mind that condones unlimited suffering “insane”, because that mind fails to understand the most important fundamental truth required to rationally plan what should be.
If I am wrong, then I agree that “insane” would be too hyperbolic.
Hmm, so, if I understand you correctly, you take the view (a) that moral realism is correct; and specifically, (b) that the correct morality holds that suffering is bad, and preventing it is right, and failing to do so is wrong; and furthermore, (c) that both moral realism itself as a meta-ethical view, and the specifics of the correct (“object-level”) ethical view, are so obvious that anyone who disagrees with you is mentally deficient.
Is that a fair summary?
So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.
This seems like a strange point. Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends? Demanding that suffering be reduced only by actions specifically aimed at reducing it is a very odd thing to demand!
You disagree with me calling humans “monsters” or “insane”, fine, then let’s call them “suffering-apologetics” perhaps, the label doesn’t change the problem.
I do not see how you can derive “suffering-apologetics” from what I said, which referred to our failure to accomplish the (hypothetical) goal of suffering elimination, not our unwillingness to pursue said goal.
To get back to your “prevent suffering is hardly the only desirable thing” statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things?
Well, this certainly doesn’t seem true by definition, at the very least (recall the warning against such arguments!).
Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? Pleasure and suffering are experienced by individuals, who do indeed exist in spacetime, but it’s odd to speak of pleasure and suffering as existing “in spacetime” independently of any reference to the individuals experiencing them… but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?
If yes, do you agree that this entails that pleasure cannot “cancel out” suffering, and vice versa, since both happened, and what happened cannot be changed?
It’s certainly true that whatever happened, happened, and cannot be changed. However, to answer the question, we have to specify what exactly we mean by “cancel out”.
If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings. And, of course, we could formulate the question in various other ways, and perhaps get other answers… in short, your question is somewhat underspecified.
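To make the underspecification concrete, here is one possible formalization of the question just posed. The evaluation function U and the lexical alternative below are purely my own illustrative notation; neither side of this exchange has committed to either formulation.

```latex
% One candidate reading of the "cancel out" question: U(S, P) is a
% hypothetical evaluation of a life containing suffering S and
% pleasure P. Whether such a U exists, and what shape it has, are
% exactly the empirical/psychological questions left open above.
\[
  \forall S > 0 \;\; \exists P : \quad U(S, P) \geq U(0, 0)
\]
% The opposing "separate instances" view denies any such trade-off,
% e.g. via a lexical ordering: any nonzero suffering makes the life
% worse than the neutral life, no matter how much pleasure is added.
\[
  S > 0 \;\Rightarrow\; U(S, P) < U(0, 0) \quad \text{for all } P
\]
```

Note that these are only two of many possible formulations; the disagreement in the thread is partly over which formulation is even the right one to evaluate.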
What does that imply? Which matters more in principle: the prevention of suffering, or the creation of pleasure?
I don’t see that any answer to the above question, however formulated, particularly implies anything about “what matters more in principle”. After all, things don’t “matter” abstractly, “objectively”—they matter to someone!
To me, for example, it does not seem like it makes sense to say that either the prevention of suffering or the creation of pleasure “matters more in principle”; and what you’ve said doesn’t change that, nor affect it in any way. Both of those things do matter, of course (though not unconditionally, either, but depending on various factors)! But neither of them is unconditionally more important, and nor are they the only two important things.
Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past is another common folly it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.
Well, the qualia of pleasure (or of anything else, for that matter!) are just as real as the qualia of suffering. But you’re quite right that the view you describe, taken literally, is a mistaken one… but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree? A “common folly”, you say, and perhaps that’s true, but so what? Here, at least, you can assume that such clearly incoherent views are not held by anyone (or if—as is not the case with this view, but could be in other cases—they are, then quite likely they are not as incoherent as at first they seem!).
Humans appear to be mostly controlled by evolved preference functions that don’t care about even understanding objective good and bad, quite like the other animals, and that is one extreme flaw indeed.
That’s certainly one possibility. Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.
Yes! To clarify further, by “mentally deficient” in this context I would typically mean “confused” or “insane” (as in not thinking clearly), but I would not necessarily mean “stupid” in some other more generally applicable sense.
And thank you for your fair attempt at understanding the opposing argument.
So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.
Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends?
True, it would be fine if these other actions didn’t lead to more suffering in the future.
Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it?
(...) but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?
Yes, you are right that it is an unusual formulation, but there is a point to it: an instance of suffering or pleasure “existing” means there is some concrete “configuration” (of a consciousness) within reality/spacetime that is this instance.
These instances being real means that they should be as objectively definable and understandable as other observables.
Theoretically, with sufficient understanding and tools, it should consequently even be possible to “construct” such instances, including the rest of consciousness.
If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings.
This assumption that any amount of P can “justify” some amount of S is a reason for why I brought up the “suffering-apologetics” moniker.
Here’s the thing: The instances of P and S are separate instances. These instances themselves are also not the same as some other thought pattern that rationalizes some amount of S as acceptable relative to some (future) amount of P.
More generally, say we have two minds, M1 and M2 (so two subjects).
Two minds can be very different, of course.
Next, let us consider the states of both minds at two different times, t1 and t2.
The state of either mind can also be very different at t1 and t2, right?
So we have the four states M1t1, M1t2, M2t1, M2t2 and all four can be quite different from each other.
Now this means that for example M1t1 and M2t2 could in theory be more similar than M1t1 and M1t2.
The point is, even though we humans so easily consider a mind as one thing across time, this is only an abstraction.
It should not be confused with reality, in which there have to be different states across time for there to be any change, and these states can potentially vary as much as, or more than, two spatially separate minds can.
Of course, mind states typically don’t change that severely across time, but that is beside the point. Different states with small differences are still different.
An implication of this is that one mind state condoning another suffering mind state for expected future pleasure is “morally” quite like one person condoning the suffering of another for expected future pleasure.
At this point an objection along the line “but it is I that willingly accepts my own suffering for future pleasure in that first case!” and “but my ‘suffering mind state’ doesn’t complain!” may be brought up.
But this also works for spatially separate minds. One person can willingly accept their own suffering for the future pleasure of another person. And also one person may not complain about the suffering caused by another person for that other person’s pleasure.
Furthermore, in either case, the part that “willingly accepts” is again not the part that is suffering, so it doesn’t make this any less bad.
Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past (...)
(...) but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree?
No, I phrased that poorly, so with this precise wording I don’t disagree.
I more generally meant something like the ”… such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure …” part, not the explicit belief that the past could be altered.
I phrased it as I did because the immutability of the past implies that summing up pleasure and suffering to decide whether a life is good or bad is nonsensical, because pleasure and suffering are separate, as reasoned in the prior section.
Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.
Certainly! That’s one good reason for why I seek out discussions with people that disagree. To this day no one has been able to convince me that my core arguments can be broken. Terminology and formulations have been easier to attack of course, but don’t scratch the underlying belief. And so I have to act based on what I have to assume is true, as do we all.
It could actually be very good if I were wrong, because that would mean suffering either somehow isn’t actually/”objectively” worse than “nothing”/neutral, or that it could be mitigated somehow through future pleasure, or perhaps everything would somehow be totally objectively neutral and thus never negative (like the guy in the other response thread here argued).
Any of that would make everything way easier.
But unfortunately none of these ideas can be true, as argued.
Yes, I guess the central questions I’m trying to pose here are these: Do those humans that control the AI even have a sufficient understanding of good and bad? Can any human group be trusted with the power of a superintelligence long-term? Or if you say that only the initial goal specification matters, then can anyone be trusted to specify such goals without royally messing it up, intentionally or unintentionally?
Given the state of the world, given the flaws of humans, I certainly don’t think so. Therefore, the goal should be the creation of something less messed up to take over. That doesn’t require alignment to some common human value system (Whatever that even should be! It’s not like humans actually have a common value system, at least not one with each other’s best interests at heart.).
It does require alignment to a value system that prioritizes the continued preservation and flourishing of humanity. It’s easy to create an optimization process with a well-intentioned goal that sucks up all available resources for itself, leaving nothing for humanity.
By default, an AI will not care about humanity. It will care about maximizing a metric. Maximizing that metric will require resources, and the AI will not care that humans need resources in order to live. The goal is the goal, after all.
Creating an aligned AI requires, at a minimum, building an AI that leaves something for the rest of us, and which doesn’t immediately subvert any restrictions we’ve placed on it to that end. Doing this with a system that has the potential to become many orders of magnitude more intelligent than we are is very difficult.
First point: I think there obviously is such a thing as “objective” good and bad configurations of subsets of reality, see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want.
Assuming this is true, a superintelligence could feasibly be created to understand this. No complicated common human value system alignment is required for that, even under your apparent assumption that the metric to be optimized couldn’t be superseded by another through understanding.
Well, or if it isn’t true that there is an “objective” good and bad, then there really is no ground to stand on for anyone anyway.
Second point: Even if a mere superintelligent paperclip optimizer were created, it could still be better than human control. After all, paper clips neither suffer nor torture, while humans and other animals commonly do.
This preservation of humanity for however long it may be possible, what argumentative ground does it stand on? Can you make an objective case for why it should be so?
I take issue with the word “feasibly”. As Eliezer, Paul Christiano, Nate Soares, and many others have shown, AI alignment is a hard problem, whose difficulty ranges somewhere in between unsolved and insoluble. There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI that the AI will actually pursue those configurations over others which superficially resemble them, but which have the side effect of destroying humanity?
I am human, and therefore I desire the continued survival of humanity. That’s objective enough for me.
Fair enough I suppose, I’m not intending to claim that it is trivial.
So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean “preferable” exclusively according to some subject(s)?
I also am human, and judge humanity wanting due to their commonplace lack of understanding when it comes to something as basic as (“objective”) good and bad. I don’t just go “Hey I am a human, guess we totally should have more humans!” like some bacteria in a Petri dish, because I can question myself and my species.
There isn’t a difference. A rock has no morality. A wolf does not pause to consider the suffering of the moose. “Good” and “bad” only make sense in the context of (human) minds.
Ah yes, my mistake to (ab)use the term “objective” all this time.
So you do of course at least agree that there are such minds for which there is “good” and “bad”, as you just said.
Now, would you agree that one can generalize (or “abstract” if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will, you can talk about it after all.
Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?
Why? Do you think humans are doing such a great job? I sure don’t. I’m interested in the creation of something saner than humans, because humans mostly are not. Obviously. :)
A great job of what, exactly…?
A great job of preventing suffering for instance. Instead, humans haven’t even unified under a commonly beneficial ideology. Not even that. There are tons of opposing ideologies, one more twisted than the other. So I don’t even really need to talk about how they treat the other animals on the planet—not that those are any wiser, but that’s no reason to continue their suffering.
Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of “evil”! If you disagree, feel free to get tortured for a couple of decades, as a learning experience.
So I have to say, humans aren’t all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!
If humans are replaced by something else, that something else might do a “better job” of “preventing suffering”, but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?
Why should we do that? What makes you think such a thing exists, even (and if it does, that it’s better for each of us than our current own ideologies)?
Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).
I definitely disagree. I don’t think that this usage of the term “insane” matches the standard usage, so, as I understand your comment, you’re not really saying that humans are insane—you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?
Certainly a superintelligence could end this situation, but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence). So why would we want this?
The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.
To prevent suffering. Why should you not do that?
Since the ideologies are contradictory, only one of them, if any, can be correct.
Wait, are you perhaps another moral nihilist here that rejects the very notion of objective good and bad? That would be an immediately self-defeating argument.
Thank you for proving my point that humans can easily be monsters that don’t fundamentally care about the suffering of other animals.
Yes, humans absolutely do not measure up to my standards.
“Good for us humans”? If it is human to allow unlimited suffering, then death is a mercy for such monsters.
I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant? But then one has to specify what values those are. Human values, surely, and in particular, values that we can agree to! And, by my values, if humans cease to exist, then nothing matters anymore…
Whose suffering, exactly? In any case, it seems to me that (a) there are many downsides to attempting to “unify under a commonly beneficial ideology”, (b) “prevent suffering” is hardly the only desirable thing, and it’s not clear that this sort of “unification” (whatever it might involve) will even get us any or most or all of the other things we value, (c) there’s no particular reason to believe that doing so would be the most effective way to “prevent suffering”, and (d) it’s not clear that there even is a “commonly beneficial ideology” for us to “unify under”.
How’s that? Surely it’s possible that my ideology is beneficial for me, and yours for you, yes? There’s no contradiction in that, only conflict—but that does not, in any way, imply that either of our ideologies is incorrect!
I am certainly not a moral nihilist! But I think your definition of “moral nihilism” is rather a non-standard one. “Moral nihilism (also known as ethical nihilism) is the meta-ethical view that nothing is morally right or wrong” says Wikipedia, and that’s not a view I hold.
I don’t agree with your implied assertion that there’s such a thing as “the suffering of other animals” (for most animals, anyhow). That aside, I’m not sure why one needs to care about such things in order to avoid the label of “monster”.
Well, there’s nothing unusual about such a view, certainly. I share it myself! Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing. Here on Less Wrong, of all places, we should aspire to measure up to higher standards of reasoning and discourse than that—don’t you agree?
Of whose suffering do you speak, here? It seems to me that human suffering has, on a per-population basis, been dropping, over the course of history, and certainly many efforts continue to reduce it further. Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world. (If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)
No, what I mean is that the very existence of a suffering subject state is itself that which is “intrinsically” or “objectively” or however-we-want-to-call-it bad/”negative”. This is independent of any “set of values” that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as the set of values can be. The arbitrary value set is not itself the general “process” of suffering, similar to how an arbitrary mind is not the general “process” of consciousness.
That is the basic understanding a consciousness should have.
If I am right about the above, then it is apt to call a human mind that condones unlimited suffering “insane”, because that mind fails to understand the most important fundamental truth required to rationally plan what should be.
If I am wrong, then I agree that “insane” would be too hyperbolic.
Whether the amount of added (human) suffering has indeed decreased is debatable, considering the massive population growth in the last 300 or so years, the couple of world wars, the ongoing wars, the distribution of power and its consequences with respect to suffering, and so on.
But let’s just assume it by all means. Is it the common goal of humans to prevent suffering first and foremost? Clearly not, as you say yourself: to “prevent suffering” is “hardly the only desirable thing” for most humans. So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.
You disagree with me calling humans “monsters” or “insane”, fine, then let’s call them “suffering-apologetics” perhaps, the label doesn’t change the problem.
To get back to your “prevent suffering is hardly the only desirable thing” statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things? If yes, do you agree that this entails that pleasure cannot “cancel out” suffering, and vice versa, since both happened, and what happened cannot be changed? What does that imply? Which matters more in principle: the prevention of suffering, or the creation of pleasure? Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past is another common folly, it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.
As I said, I consider the creation of an artificial consciousness that shares as few of our flaws as possible to be a good plan. Humans appear to be mostly controlled by evolved preference functions that don’t care about even understanding objective good and bad, quite like the other animals, and that is one extreme flaw indeed.
Hmm, so, if I understand you correctly, you take the view (a) that moral realism is correct; and specifically, (b) that the correct morality holds that suffering is bad, and preventing it is right, and failing to do so is wrong; and furthermore, (c) that both moral realism itself as a meta-ethical view, and the specifics of the correct (“object-level”) ethical view, are so obvious that anyone who disagrees with you is mentally deficient.
Is that a fair summary?
This seems like a strange point. Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of actions we take in the service of other ends? Demanding that the only actions of ours that reduce suffering be those specifically aimed at reducing suffering is a very odd thing to demand!
I do not see how you can derive “suffering-apologetics” from what I said, which referred to our failure to accomplish the (hypothetical) goal of suffering elimination, not our unwillingness to pursue said goal.
Well, this certainly doesn’t seem true by definition, at the very least (recall the warning against such arguments!).
Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? Pleasure and suffering are experienced by individuals, who do indeed exist in spacetime, but it’s odd to speak of pleasure and suffering as existing “in spacetime” independently of any reference to the individuals experiencing them… but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?
It’s certainly true that whatever happened, happened, and cannot be changed. However, to answer the question, we have to specify what exactly we mean by “cancel out”.
If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings. And, of course, we could formulate the question in various other ways, and perhaps get other answers… in short, your question is somewhat underspecified.
I don’t see that any answer to the above question, however formulated, particularly implies anything about “what matters most in principle”. After all, things don’t “matter” abstractly, “objectively”—they matter to someone!
To me, for example, it does not seem like it makes sense to say that either the prevention of suffering or the creation of pleasure “matters more in principle”; and what you’ve said doesn’t change that, nor affect it in any way. Both of those things do matter, of course (though not unconditionally, either, but depending on various factors)! But neither of them is unconditionally more important, and nor are they the only two important things.
Well, the qualia of pleasure (or of anything else, for that matter!) are just as real as the qualia of suffering. But you’re quite right that the view you describe, taken literally, is a mistaken one… but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree? A “common folly”, you say, and perhaps that’s true, but so what? Here, at least, you can assume that such clearly incoherent views are not held by anyone (or if—as is not the case with this view, but could be in other cases—they are, then quite likely they are not as incoherent as at first they seem!).
That’s certainly one possibility. Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.
Yes! To clarify further, by “mentally deficient” in this context I would typically mean “confused” or “insane” (as in not thinking clearly), but I would not necessarily mean “stupid” in some other more generally applicable sense.
And thank you for your fair attempt at understanding the opposing argument.
True, it would be fine if these other actions wouldn’t lead to more suffering in the future.
Yes you are right that it is an unusual formulation, but there is a point to it: An instance of suffering or pleasure “existing” means there is some concrete “configuration” (of a consciousness) within reality/spacetime that is this instance.
These instances being real means that they should be as objectively definable and understandable as other observables.
Theoretically, with sufficient understanding and tools, it should consequently even be possible to “construct” such instances, including the rest of consciousness.
This assumption that any amount of P can “justify” some amount of S is a reason for why I brought up the “suffering-apologetics” moniker.
Here’s the thing: The instances of P and S are separate instances. These instances themselves are also not the same as some other thought pattern that rationalizes some amount of S as acceptable relative to some (future) amount of P.
More generally, say we have two minds, M1 and M2 (so two subjects). Two minds can be very different, of course. Next, let us consider the states of both minds at two different times, t1 and t2. The state of either mind can also be very different at t1 and t2, right?
So we have the four states M1t1, M1t2, M2t1, M2t2 and all four can be quite different from each other. Now this means that for example M1t1 and M2t2 could in theory be more similar than M1t1 and M1t2.
The point is, even though we humans so easily consider a mind as one thing across time, this is only an abstraction. It should not be confused with reality, in which there have to be different states across time for there to be any change, and these states can potentially vary as much as, or more than, two spatially separate minds can.
Of course typically mind states across time don’t change that severely, but that is not the aforementioned point. Different states with small differences are still different.
An implication of this is that one mind state condoning another suffering mind state for expected future pleasure is “morally” quite like one person condoning the suffering of another for expected future pleasure.
At this point an objection along the line “but it is I that willingly accepts my own suffering for future pleasure in that first case!” and “but my ‘suffering mind state’ doesn’t complain!” may be brought up.
But this also works for spatially separate minds. One person can willingly accept their own suffering for the future pleasure of another person. And also one person may not complain about the suffering caused by another person for that other person’s pleasure.
Furthermore, in either case, the part that “willingly accepts” is again not the part that is suffering, so it doesn’t make this any less bad.
No, I phrased that poorly, so with this precise wording I don’t disagree.
I more generally meant something like the ”… such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure …” part, not the explicit belief that the past could be altered.
I phrased it as I did because the immutability of the past implies that summing up pleasure and suffering to decide whether a life is good or bad is nonsensical, because pleasure and suffering are separate, as reasoned in the prior section.
Certainly! That’s one good reason for why I seek out discussions with people that disagree. To this day no one has been able to convince me that my core arguments can be broken. Terminology and formulations have been easier to attack of course, but don’t scratch the underlying belief. And so I have to act based on what I have to assume is true, as do we all.
It could actually be very good if I were wrong, because that would mean suffering either somehow isn’t actually/”objectively” worse than “nothing”/neutral, or that it could be mitigated somehow through future pleasure, or perhaps everything would somehow be totally objectively neutral and thus never negative (like the guy in the other response thread here argued). Any of that would make everything way easier. But unfortunately none of these ideas can be true, as argued.