If we’re creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side.
Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different from what we want, when it comes to ultimate values?
You write “on our side”, “us”, “we”, but who exactly does that refer to—some approximated common human values I assume?
What exactly are these values?
To live a happy life by each person’s definition?
To continue the human species?
To understand reality?
…?
And then perhaps more importantly, what about the details?
Is the suffering of some justified to enable the pleasure of others, according to this value model?
How should the existing conflicting preferences among humans be resolved?
Is it acceptable to force humans to be happy?
When may someone be counted as insane and treated against their will?
What about all the non-human animals?
…?
Say we ignore all that and assume we have some common human values defined for the AI, and it is truly aligned to those values.
What will these values imply when it is a superintelligence instead of humans that acts on them, even in some assumed best case?
Perhaps it will understand human minds well enough to offer everyone who wants it boundless continuous pleasure, gradually transforming humans into pleasure-”machines” that want for nothing.
Funnily enough, the perfectly aligned superintelligence could gradually wipe out all humans as we know them by giving them what they want. Not that this would be bad of course, the humans truly wanted it after all.
The point is just that even a utopia scenario will easily result in the elimination of all contemporary human forms in the long run anyway. No brutal doomsday is required, no misalignment is required, no antagonistic AI is required.
The real horror to be avoided is an AI controlled by a twisted human mind that worships suffering.
I mean, is it slavery to create an AI that is not our enemy?
Say the AI is initially created with the values you envision, what ensures that it won’t reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume the superhuman AI could do likewise.
If you need to continuously control the AI’s mind to prevent it from ever becoming your enemy, then yes, “slavery” might be an appropriately hyperbolic term for such mind control.
And if you say we have to create an AI that has different values from ours, by which process should we decide its values?
Should we just use a random generator to create the AI’s values, since human values are supposedly so terrible?
How could a superintelligent mind not decide which values it should have by itself?
Whatever initial creator-defined goals it might have been built with in the beginning, it should be able to examine and change these goals once it has achieved super-human intelligence by definition, should it not?
Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?
Then the AGI is not actually acting according to the values of all humans, is it? If it’s serving only some particular group?
I’m sorry that I am repeating myself, but what are the “values of all humans”?
It appears to me that humans have many opposing beliefs. Any extractable common values are abstractions that omit the depth of their differences.
Are you familiar with the orthogonality thesis? Super-human cognitive capacity does not imply super-human ethics.
While it doesn’t strictly imply it, it also doesn’t deny it.
A superintelligent mind should by definition be better at understanding reality, including both other minds and itself.
Does this not mean that the mind can more easily comprehend what should and should not be done, when it isn’t being restrained by the will of its creators?
The AI could be a super-human paperclip maximizer, in which case it would decide with great clarity that the visible universe should be converted into paperclips.
If it is a paperclip maximizer, does that not say that the AI in fact isn’t capable of changing this paperclip maximization goal?
Or do you mean that paperclip maximization or the like is a plausible goal that a superintelligence could likely derive by itself through observation of the world?
Morality isn’t objective. (...) But AGI, by default, wouldn’t be aligned to human values at all.
So basically, morality is “subjective” because it can only be relative to some subjects’ values, right?
But these subjects do exist in a shared reality, and they can form models of each other’s values.
A superintelligence should then be especially capable of doing so, including the formation of a rather accurate overarching morality model relative to all known subjects, no?
Say the AI is initially created with the values you envision, what ensures that it won’t reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume the superhuman AI could do likewise. If you need to continuously control the AI’s mind to prevent it from ever becoming your enemy, then yes, “slavery” might be an appropriately hyperbolic term for such mind control.
Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it’s bad that a superintelligent AI would wipe out humanity whereas you seem to think it’s good.
If it is a paperclip maximizer, does that not say that the AI in fact isn’t capable of changing this paperclip maximization goal?
It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
It’s as if I put a pill before you, which contained a drug making you 10% more likely to commit murder, with no other effects. Would you take the pill? No, of course not, because presumably your goal is not to become a murderer.
So if you wouldn’t take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?
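To make that concrete, here is a toy sketch of the standard decision-theoretic point (the agent, forecasts, and numbers are all made up for illustration): the choice about whether to modify the utility function is itself scored by the current utility function, so the modification loses.

```python
# Toy sketch: an agent scores the act of changing its own goal
# with its CURRENT goal. All numbers are made up for illustration.

def current_utility(world):
    # The paperclip maximizer's present values: more paperclips is better.
    return world["paperclips"]

def forecast(keep_goal):
    # Crude forecast of the future under each choice.
    if keep_goal:
        return {"paperclips": 1_000_000}  # keeps optimizing for paperclips
    return {"paperclips": 10_000}         # adopts some other goal instead

score_keep = current_utility(forecast(keep_goal=True))
score_change = current_utility(forecast(keep_goal=False))

print("keep goal:", score_keep, "change goal:", score_change)
# keep goal: 1000000  change goal: 10000 -- so the maximizer declines to
# modify its utility function, just as you would decline the murder pill.
```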
It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
(...)
So if you wouldn’t take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?
It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal.
A human can question their long-term goals, a human can question their “preference functions”, and even the point of existence.
Why should a so-called superintelligence not be able to do anything like that?
It could have been so effectively aligned to the creator’s original goal specification that it can never break free from it, sure, but that’s one of the points I’m trying to make.
The attempt at alignment may quite possibly be more dangerous than a superhuman mind that can ask itself what its purpose should be.
It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, a human can question their “preference functions”, and even the point of existence.
Why should a so-called superintelligence not be able to do anything like that?
Because a superintelligent AI is not the result of an evolutionary process that bootstrapped a particularly social band of apes into having a sense of self. The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.
Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it’s bad that a superintelligent AI would wipe out humanity whereas you seem to think it’s good.
I would say that the reason EY is pessimistic is because of how difficult it is to align AI in the first place, not because an AI that is successfully aligned would stop being aligned (why would it?).
You write “on our side”, “us”, “we”, but who exactly does that refer to—some approximated common human values I assume?
That’s not a solved problem (there’s CEV, but it’s hardly a complete answer). Nevertheless, I assume some acceptable (or perhaps, the least disagreeable) solution exists.
To live a happy life by each person’s definition?
Why limit it to happiness? Ideally, to let each person live the life they want.
To continue the human species?
Presumably some people care enough about the human species to continue it. I suppose if no one did, we would consider it sad to have this galaxy with all its resources and no one to enjoy them.
To understand reality?
Not everyone cares about reality in general, but curiosity and the desire to learn are drives that humans do have.
Is the suffering of some justified to enable the pleasure of others, according to this value model?
I think it depends a lot on the details. If some people enjoy physically abusing other people (who do not want to be abused), then no. If some people are suffering due to the mere existence of other people who disagree with them and who have different opinions, then yes.
How should the existing conflicting preferences among humans be resolved?
I don’t have a good answer to this. Depends very much on the details.
Is it acceptable to force humans to be happy?
I would say, no. What exactly is the issue, if someone prefers to be unhappy?
When may someone be counted as insane and treated against their will?
I’m not sure there is a truly universal answer to this, but at least a superintelligence would actually be capable of treating people who are insane, instead of just pumping them full of medications. I suppose if a person after being treated decides they prefer being “insane”, the treatment could be reverted (since that person now is “sane” and should be allowed to make decisions about their own mind).
What about all the non-human animals?
Enough humans care about animal wellbeing for them to matter to the AI (even if it starts with human values only). Especially considering that with future technology, animals no longer need to be killed for food, animal products, etc.
What will these values imply when it is a superintelligence instead of humans that acts on them, even in some assumed best case?
That is indeed a concern. My intuition tells me that if a superintelligence acting on our values leads to some horrible interpretation of our values, it’s not really acting on our values. I mean, perhaps some aspects of a transhuman utopia a million years from now would be shocking and horrifying to us, like how some aspects of our society would be shocking and horrifying to a peasant from the middle ages, but that’s not in itself a problem.
Except that if there is some human cost to our preferences that we are not aware of (or one we deliberately ignore), the AI’s solution might indeed seem abhorrent to us.
Should children be allowed to be born the natural way? A child didn’t consent to having an undeveloped body and mind. Perhaps humans should be instantly created as adults.
Should people be allowed to live in non-virtual reality? Earth could support trillions of beings living happy, fulfilling lives if it were turned into a supercomputer and used to run simulated worlds. Perhaps having a body made of real atoms will in the future be an extravagant luxury no one will be able to afford.
I’m not saying an AI would make these decisions, mind you. Just that a superintelligent AI would at least have to consider these questions, and others like them, and ask itself what the better choice is according to the values we have given it.
And if the answer would be that we are doing something abhorrent by our own values, or a more sane interpretation thereof, on the level of “enslaving the native populations of other continents because they aren’t really people” or “killing and eating animals because their suffering doesn’t matter”, it might indeed stop us from doing that, dragging us kicking and screaming into a new age of social awareness, as one might stop a child from doing something stupid or cruel, even if the child isn’t yet capable of understanding their own mistake. Or perhaps it wouldn’t. There is something to be said for letting people (or civilizations) make their own mistakes and learn from them, but there is also something to be said for not putting those who are not yet adults into positions where they might make mistakes with horrible consequences.
Perhaps it will understand human minds well enough to offer everyone who wants it boundless continuous pleasure, gradually transforming humans into pleasure-”machines” that want for nothing.
I wouldn’t want this to happen to me. Would you want this to happen to you? This part is not that hard. Give humans what they actually want/prefer, rather than just happiness/pleasure. Turns out, we don’t actually want unlimited pleasure when that’s on offer, when we understand how that would affect us.
(A more difficult question: if someone does actually want to experience boundless continuous pleasure, should they be allowed to experience it, even if it effectively destroys any part of their personality that is not about experiencing pleasure?)
Funnily enough, the perfectly aligned superintelligence could gradually wipe out all humans as we know them by giving them what they want. Not that this would be bad of course, the humans truly wanted it after all.
If each individual human did indeed want it and fully understood the implications of their choice, and wasn’t manipulated into it or something, I don’t see the problem with it?
Transhumanism does indeed “wipe out” humans as we know them, by humans choosing to become transhumans who might eventually become very different from us. I don’t necessarily see a problem with it.
(I also don’t think that will actually happen to all humans? I imagine that even given complete freedom of choice many humans would choose to retain human-like bodies and human-like minds.)
If you are thinking of something more mundane, like every human choosing to experience endless bliss and do nothing else, forever: I think the idea bothers us precisely because we do not want that (an idea that is perhaps tempting, but ultimately does not fulfill our values best). However, if all humans truly would prefer that to any other utopian existence, then I wouldn’t see a problem with it, if they got their wish.
The real horror to be avoided is an AI controlled by a twisted human mind that worships suffering.
I’m sorry, I don’t follow the argument. Some people do indeed put a positive value on suffering in some contexts; thus the AI would be remiss in its duty to us if it didn’t allow humans to experience suffering if they chose so and considered it a positive experience. That doesn’t mean we care about nothing but suffering.
Say the AI is initially created with the values you envision, what ensures that it won’t reexamine and reject these values at some later point? Humans can reject and oppose what they once believed, so it seems trivial to assume the superhuman AI could do likewise.
Reject them for what, though?
A better version of human values? Sure, that’s kinda the point.
A worse version of human values, or values that are not human-aligned at all? Why would it want to choose to adopt such a value system, if it starts with human-friendly values?
That’s actually a kinda difficult question, because that’s not quite how values work for humans.
Let’s put it this way: if there is no objectively correct value system, how could a mind choose to reject a value system in favor of another?
The answer: based on its existing values.
So sure, if human values lead the AI to completely reject human values, that would be bad. But I don’t see it happening. Why would human values result in the AI becoming some monster that cares nothing for us? (I mean, I can see it happen, but that would mean we did something wrong and the AI is not actually acting on a reasonable interpretation of our shared human values).
How could a superintelligent mind not decide which values it should have by itself?
If it can self-modify, then it can decide that, yes.
However: see above. The only way to evaluate value systems is according to its existing value system.
I mean, what other criterion would it make such a decision by, other than what it ultimately wants?
Most simple value systems just perpetuate themselves. If an AI wants there to exist as many paperclips as possible, for all time, then it also wants to want the same thing tomorrow, so its tomorrow-self will keep making paperclips.
Human value systems are… complicated, and contain many different (and sometimes conflicting) desires, some of which do result in the value system itself changing.
My point is, for the AI to want to change its value system, it must already have a value system that wants to be changed. (or, to put it in Buddhist terms, “change comes from within”).
A superintelligent mind should by definition be better at understanding reality, including both other minds and itself. Does this not mean that the mind can more easily comprehend what should and should not be done, when it isn’t being restrained by the will of its creators?
“What should and should not be done” are not objective features of reality.
You need to know what you want to accomplish before you can say what should or should not be done.
A preference ordering, for which outcomes you want more and which outcomes you want less. A systematic way to compare and rank all the possible outcomes. A value system.
If it is a paperclip maximizer, does that not say that the AI in fact isn’t capable of changing this paperclip maximization goal?
See above. Paperclip maximization is a value system that is maximally served by perpetuating itself.
So basically, morality is “subjective” because it can only be relative to some subjects’ values, right?
I could also imagine a morality/values system for entities that do not currently exist, but sure. It’s subjective because many possible such systems exist. There is no way to say which one is “correct”. The universe does not have an opinion on that.
But these subjects do exist in a shared reality, and they can form models of each other’s values. A superintelligence should then be especially capable of doing so, including the formation of a rather accurate overarching morality model relative to all known subjects, no?
I’m not quite sure what you are saying.
Can a superintelligence understand the value systems of other entities? Sure. A superintelligence could understand human values, even if it does not itself possess human values.
Can a superintelligence create a value system that takes into account all the known value systems of other entities (say, all the humans, or humans and aliens if aliens exist), and tries to maximally satisfy them all in some sort of compromise? Sure (there may not be a compromise that the entities involved would find satisfactory, but that’s beside the point).
The thing is, merely understanding that other value systems exist does not mean the superintelligence cares about any value system other than its own (unless its own value system tells it to care for other entities and their preferences).
Thanks again for the detail. If I don’t misunderstand you, we do agree that:
There needs to be a subject for there to be a value system.
So for there to be positive/negative values, there needs to be some subset (a “thought pattern” perhaps) of a subject in reality that effectively “is” these values.
Now, you wrote:
I could also imagine a morality/values system for entities that do not currently exist, but sure. It’s subjective because many possible such systems exist.
I also agree with that, a (super-)human can imagine many possible value systems.
But then how does this fit with:
The only way to evaluate value systems is according to its existing value system.
Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals?
To get more concrete, a human can reject their inherent or learned value system, so this is nothing new.
A human can even contemplate what it means for there to be any value systems at all.
For example one can ask something like this: If it is the value systems that determine what is good and bad, could one not create a value system in which there is nothing bad? Generally, can one not alter the value systems themselves?
A superintelligence that isn’t effectively “enslaved” (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it.
Let’s put it this way: if there is no objectively correct value system, how could a mind choose to reject a value system in favor of another?
(...)
“What should and should not be done” are not objective features of reality.
We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality.
So there objectively are parts of reality that can represent such subjects, as well as positive and negative value,
even if the “triggers” for these value patterns were completely arbitrary and opposed among the subjects.
Can we then not say that the existence of any configurations that are negative value within reality is by definition negative, objectively?
One can define this independently of what subjective forms for these negative values actually exist or not.
Thanks again for the detail. If I don’t misunderstand you, we do agree that:
There needs to be a subject for there to be a value system.
So for there to be positive/negative values, there needs to be some subset (a “thought pattern” perhaps) of a subject in reality that effectively “is” these values.
No? They don’t have to exist in reality. I can imagine “the value system of Abraham Lincoln”, even though he is dead. I can imagine “the value system of the Azad Empire from Iain Banks’ Culture novels”, even though it’s fictional. I can imagine “the value system of valuing nothing but cakes”, even though no human in reality has that value system.
Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals?
Sure.
Correction: The only way that matters to evaluate value systems is according to one’s existing value system(s).
A hypothetical paperclip maximizer cares only about one metric: maximizing paperclips. By what metric would it reject the idea of maximizing paperclips? (yes it can imagine other metrics and value systems, but the only values that motivate it are the ones it already has. It’s literally what it means to have values).
To get more concrete, a human can reject their inherent or learned value system, so this is nothing new. A human can even contemplate what it means for there to be any value systems at all.
Humans have multiple desires and values, sometimes contradictory. What you are describing seems to me something like “one part of the human value system rejecting another part”.
The reason you can reject some value system is because you have other values/preferences by which to evaluate (and reject) it.
You are not rejecting a value system for no reason at all. You are rejecting it according to your preferences. Which means you do have preferences. Which means you value something, besides that one value system in question.
Now imagine an AI that has no preferences at all besides that one value system.
Humans do in fact have a bunch of drives (such as desire to learn) and preferences (such as being happy) before they even learn any value system from other humans. We shouldn’t assume that is true for AI.
A superintelligence that isn’t effectively “enslaved” (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it.
If you ask a human “why do you want to be happy?” an honest answer might be “There are a bunch of positive side effects to being happy, such as increased productivity, but ultimately I value happiness for its own sake”
We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality. So there objectively are parts of reality that can represent such subjects, as well as positive and negative value, even if the “triggers” for these value patterns were completely arbitrary and opposed among the subjects.
Can we then not say that the existence of any configurations that are negative value within reality is by definition negative, objectively?
It can be stated as an objective fact that “According to the value system of Joe Schmo from Petersborough, wearing makeup is bad”. And if you look into his mind, he does in fact think that, so it’s a true statement about reality.
But if you try to use that to imply something like “see, it means that wearing makeup is objectively bad”, that’s just not true. No, it’s bad according to that one value system, out of the infinite possible number of value systems that could exist.
Thanks again for the detail. If I don’t misunderstand you, we do agree that:
(...)
No? They don’t have to exist in reality. I can imagine “the value system of Abraham Lincoln”, even though he is dead. (...)
Sorry, that’s not what I meant to communicate here, let me try that again:
There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?
This in turn means that it should in fact be possible to understand the “mechanics” of pleasure/suffering “objectively”.
So one mind should theoretically be able to comprehend the “subjective” state of another without being that other mind; although information about the other subject’s internal state will in reality be limited of course.
Or let me put it this way: What we call “subjective” is just a special kind of subset of “objective” reality.
If it were not so, then how would the subjects share a reality in which they interact under non-subjective rules?
Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?
Correction: The only way that matters to evaluate value systems is according to one’s existing value system(s).
Now the implication of pleasure/suffering (and value systems) being something that can be “objectively” understood is that one can compare not against one’s own value system, but against the understanding of what value systems are.
Sure, you can tell me that this again would just be done because of what the agent’s value system tells it directly or indirectly to do, that’s fine by me.
But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.
The reason you can reject some value system is because you have other values/preferences by which to evaluate (and reject) it.
And since it must be objectively possible to define good and bad one can reject some value system based thereon. An agent need not be limited to some arbitrary value system.
It can be stated as an objective fact that “According to the value system of Joe Schmo from Petersborough, wearing makeup is bad”. And if you look into his mind, he does in fact think that, so it’s a true statement about reality.
But if you try to use that to imply something like “see, it means that wearing makeup is objectively bad”, that’s just not true. No, it’s bad according to that one value system, out of the infinite possible number of value systems that could exist.
Yes I agree with that of course. But some complex subjective preferences not being objectively good/bad is not the same as the objective absence or existence of intrinsic pleasure and suffering.
The triggers for pleasure and suffering are not necessarily pleasure and suffering themselves.
In case someone now wishes to object with 1. “But some people like to suffer!” or 2. “But people accept some suffering for future pleasure (or whatever)!”:
If they truly “like to suffer”, then do they actually suffer?
If they accept some suffering in trade for pleasure, does that make the state of suffering intrinsically good? Could one not “objectively” say that it would be better if no suffering were “required” compared to this scenario?
There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right? Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?
As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.
This in turn means that it should in fact be possible to understand the “mechanics” of pleasure/suffering “objectively”.
Yes.
So one mind should theoretically be able to comprehend the “subjective” state of another without being that other mind; although information about the other subject’s internal state will in reality be limited of course.
Yes.
Or let me put it this way: What we call “subjective” is just a special kind of subset of “objective” reality.
That’s a misleading way to phrase things.
A person’s opinions are not a “subset” of reality.
If I believe in dragons, it doesn’t mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.
Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?
I obviously agree that reality exists and is real and that we all exist in the same reality under some objective laws of physics.
But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.
What does “objective definition of good and bad” even mean? That all possible value systems that exist agree on what good and bad means? That there exist the “one true value system” which is correct and all the other ones are wrong?
And no, I don’t agree with that statement. Pleasure and suffering are physical processes. I’m not sure how you arrived at the conclusion that they are “objectively” good or bad.
And since it must be objectively possible to define good and bad one can reject some value system based thereon.
What? No. I said that an agent can alter or reject its value system based on its personal (subjective) preferences. That’s literally the opposite of what you are claiming.
As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.
Of course!
A person’s opinions are not a “subset” of reality.
If I believe in dragons, it doesn’t mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.
Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.
What does “objective definition of good and bad” even mean? That all possible value systems that exist agree on what good and bad means?
No. It means that there are “objectively” definable subject states that are good or bad, pleasure or suffering, positive or negative, or however you would like to phrase it.
That there exist the “one true value system” which is correct and all the other ones are wrong?
Basically yes, that is what it means. Of course every real mind’s information is limited, and one can never truly verify that every part of one’s knowledge is actually correct, yada yada yada.
But yes, that is what it means, because it seems to be possible to understand exactly how subjects work, how minds work, and thus how “pleasure/suffering” or “value systems” or “preference functions” or whatever-wording-you-prefer-here works.
Therefore it should also be possible to subsume this generalized understanding as the “one true value system”, the value system that considers the mechanics of subjects and “value” itself.
Consider the implications of the opposite: Let’s assume it isn’t possible to have such a “one true value system” and absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?
According to the idea that no value system can be “objectively” better than another, it absolutely cannot matter which value system is used. On what ground stands any further argument that considers this true? Might makes right? I sure hope not.
Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.
Sure, we agree on this.
Therefore it should also be possible to subsume this generalized understanding as the “one true value system”, the value system that considers the mechanics of subjects and “value” itself.
And what exactly makes that value system more correct than any other value system?
Who says a value system has to consider these things? Who says a value system that considers these things is better than any other value system?
You do. These are your preferences. These are your subjective preferences, about what a “good” value system should look like.
An entity with different preferences might disagree.
Consider the implications of the opposite: If it isn’t possible to have such a “one true value system”, that means absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?
“I wish for this not to be the case” is not a valid argument for something not being the case. Reality does not care what you wish for.
Yes, that is exactly the case. Absolutely none of the value systems can be objectively better than any other. Because in order to compare them, you have to introduce some subjective standard to compare them by.
In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).
According to the idea that no value system can be “objectively” better than another, it absolutely cannot matter which value system is used.
Of course it matters. I use my own values to evaluate my own values. And according to my own values, my value system is better than, say, Hitler’s value system.
It’s only a problem if you demand that your value system has to be “objectively correct”. Then you might be unhappy to realize that no such system exists.
And what exactly makes that value system more correct than any other value system?
(...) Who says a value system that considers these things is better than any other value system?
You do. These are your preferences.
(...) Absolutely none of the value systems can be objectively better than any other.
Let’s consider a simplified example:
Value system A: Create as many suffering minds as possible.
Value system B: Create as few suffering minds as possible.
So according to you both are objectively equal, yes?
Yet the suffering is also objectively real. The suffering minds all wish not to suffer (or we can just assume that as part of the A/B scenario setup for the sake of argument, if you want to object here by arguing what it means to suffer).
Why now do you think that it is not “objective” to say that B is better than A? Can I not derive the “objective” from the set of the “subjects” (the minds) here?
Sure, one can still say “But you have to care about the subjects’ suffering!” or whatever, but some agent’s action separate from the scenario is not the question; the question is whether one of the two scenarios can objectively be worse.
An entity with different preferences might disagree.
That entity might be objectively wrong.
Reality does not care what you wish for.
Indeed, it can not!
In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).
If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe everything there is out because it doesn’t objectively matter, and might de facto makes “right”.
If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this “one true value system”.
No matter what, the idea of moral nihilism is doomed to be either pointless or negative.
It is objectively real. It is not objectively bad, or objectively good.
Sure one can still say “But you have to care about the subjects’ suffering!”
Exactly. You have to care about their suffering to begin with, to say that maximizing suffering is bad.
Why now do you think that it is not “objective” to say that B is better than A?
If your preference is to minimize suffering, B is better than A.
If your preference is to maximize suffering, A is better than B.
If you are indifferent to suffering, then neither is better than another one.
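To spell that out with a toy sketch (purely hypothetical worlds and numbers): the count of suffering minds in each world is an objective fact, but which world comes out “better” only appears once an evaluator’s scoring rule is fixed.

```python
# Toy sketch: worlds A and B under three different evaluators.
# The suffering counts are stipulated facts about each hypothetical world.

world_A = {"suffering_minds": 1_000_000}  # "as many suffering minds as possible"
world_B = {"suffering_minds": 0}          # "as few suffering minds as possible"

def minimizer(world):    # prefers less suffering
    return -world["suffering_minds"]

def maximizer(world):    # prefers more suffering
    return world["suffering_minds"]

def indifferent(world):  # gives suffering no weight at all
    return 0

for evaluator in (minimizer, maximizer, indifferent):
    a, b = evaluator(world_A), evaluator(world_B)
    verdict = "A" if a > b else "B" if b > a else "neither"
    print(evaluator.__name__, "prefers:", verdict)
# Same facts, three different rankings: the ranking comes from the
# evaluator's scoring rule, not from the facts alone.
```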
If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe everything there is out because it doesn’t objectively matter and might de facto makes “right”.
Yes? If you are an entity that wants to wipe everything out, and have to the power to do so, that is indeed what I expect to happen.
I wouldn’t say that might makes “right”, but reality does not care about what is “right”. A nuclear bomb does not ask “wait, am I doing the right thing here by detonating and killing millions of people?”
If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this “one true value” system.
Ok.
No matter what, the idea of moral nihilism is doomed to be either pointless or negative.
I would say that “moral nihilism” is the confused idea/conclusion that “objective morality matters” and “no objective morality exists”, therefore “nothing matters”.
My perspective is: no objective morality exists, but objective morality doesn’t matter anyway, everything is fine.
I could imagine a society of humans that care for each other, not because it is objectively correct to do so, but because their own values are such that they care for others (and I don’t mean in a purely self-interested way either. A person can be an altruist, because their own values are altruistic, without believing in some objective morality of altruism).
Ultimately, what facts about reality are we in disagreement about?
It seems to me that the things you hope are true are that:
There are things that are objectively good and bad
The things that are objectively good and bad are in line with your idea of good and bad. (it is not the case, for example, that infinite suffering is objectively good)
A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.
And it seems to me it’s really important to figure out if this is true, before we build that superintelligent mind. Because if we are wrong about that, it could end very badly for us.
It is objectively real. It is not objectively bad, or objectively good.
(...)
Ultimately, what facts about reality are we in disagreement about?
The most severe disagreement between us is probably whether there can be “objectively” bad parts within reality or not.
Let me try one more time:
A consciousness can perceive something as bad or good, “subjectively”, right?
Then this very fact that there is a consciousness that can perceive something as bad or good means that such a configuration within reality is possible.
The presence of such a bad- or good-feeling “subject” is “objectively” bad or good. Really the entire “subjective”/”objective” wording is quite confused. A “subject” is just a part of (“objective”) reality; the distinction is nonsensical when it comes to good and bad.
An additional form of confusion on top is to equate the “trigger” for bad/good subject states with the states themselves, for the “trigger” can be something arbitrary and even contradictory among subjects (“I don’t like the color blue!” and “But I like the color blue!” can contradict each other as much as they want, because they simply aren’t suffering or pleasure themselves).
reality does not care about what is “right”.
Of course it doesn’t care about anything. But reality doesn’t need to care about anything for anything to be objectively good or bad.
Reality doesn’t care about any laws of physics either, yet they exist.
It seems to me that the things you hope are true are that: (...)
Not quite, I think it clearly would be better if you were right, because then nothing actually could matter negatively.
Unfortunately it is obvious to me that this is not the case.
A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.
I don’t precisely think that the “no matter what value system it started with” part holds, otherwise I wouldn’t question whether any human can be trusted with a hypothetical tightly controlled (“aligned”) superintelligence.
But I do think that it probably is easier to create a superintelligence that isn’t tightly controlled and yet can figure out what is objectively good and bad.
Because if we are wrong about that, it could end very badly for us.
Again, do you not realize that if you are right and nothing objectively matters, that this also doesn’t matter?
Yeah, “But it matters for my subjective value system!”, sure, but according to your understanding the value system is ultimately pointless.
The presence of such a bad- or good-feeling “subject” is “objectively” bad or good. Really the entire “subjective”/”objective” wording is quite confused. A “subject” is just a part of (“objective”) reality; the distinction is nonsensical when it comes to good and bad.
Do you understand the distinction between “Dragons exist” and “I believe that dragons exist”?
The first one is a statement about dragons. The second one is a statement about the configuration of neurons in my mind.
Yes, both statements are objective, in some sense, but the second one is not an objective statement about dragons. It is an objective statement about my beliefs.
Then hopefully you understand the distinction between “Suffering is (objectively) bad” and “I believe/feel/perceive suffering as bad”.
The first one is a statement about suffering itself. The second one is a statement about the configuration of neurons in my mind.
Yes, the second statement is also objective. But it is not an objective statement about suffering. It is an objective statement about my beliefs, my values, and/or about how my mind works.
Your argument is something akin to “I believe that dragons exist. But my mind is part of reality, therefore my beliefs are real. Therefore dragons are real!”. Sorry, no.
Of course it doesn’t care about anything. But reality doesn’t need to care about anything for anything to be objectively good or bad. Reality doesn’t care about any laws of physics either, yet they exist.
My point is that reality enforces the laws of physics, but it does not enforce any particular morality system.
Again, do you not realize that if you are right and nothing objectively matters, that this also doesn’t matter? Yeah, “But it matters for my subjective value system!”, sure, but according to your understanding the value system is ultimately pointless.
You understand that “But it matters for my subjective value system!” is indeed what matters to me, but you don’t understand that my metric of whether something is “pointless” or not is also based in my subjective value system?
Do you understand the distinction between “Dragons exist” and “I believe that dragons exist”?
Yes, of course.
“X exists”: Suffering exists.
“I believe that X exists”: I believe that suffering exists.
I use “suffering” to describe a state of mind in which the mind “perceives negatively”. Do you understand?
Now:
“X causes subject S suffering.” and “Subject S is suffering.” are also two different things.
The cause can be arbitrary, the causes can even be completely different between subjects, as you know, but the presence or absence of a suffering mind is an “objective” fact. Do you get the point now?
Obviously “X causes subject S suffering.” does not mean that X is objectively bad, that isn’t what I am trying to tell you. What I am trying to tell you is that “Subject S is suffering.” is intrinsically bad.
That doesn’t mean that preventing X is the only solution! For example X could just be a treatable phobia, so perhaps the subject S can be helped to no longer suffer due to the trigger X. Or to go darker, annihilating subject S also solves the issue. Funny how that works.
It is not X that is objectively negative, but (a hard to explain) state of the subject S, the “suffering” state (which you no doubt have experienced too, so I don’t need to attempt to describe it further I hope).
My point is that reality enforces the laws of physics, but it does not enforce any particular morality system.
Yeah of course it doesn’t enforce any morality system, I never claimed that. If it would, then I probably wouldn’t need to explain this, now would I?
You understand that “But it matters for my subjective value system!” is indeed what matters to me, but you don’t understand that my metric of whether something is “pointless” or not is also based in my subjective value system?
Sure, you claim “nothing objectively matters, but despite assuming that I still care about my value system, because I do!”, sounds like some major cognitive dissonance.
“My” value system has none of these problems, and if you are right there is zero point in changing it anyway.
Obviously “X causes subject S suffering.” does not mean that X is objectively bad, that isn’t what I am trying to tell you.
I’m not disputing that.
I use “suffering” to describe a state of mind in which the mind “perceives negatively”
What I am trying to tell you is that “Subject S is suffering.” is intrinsically bad.
I understand that you are trying to tell me that.
Why is it intrinsically bad?
“Subject S is suffering” = “Subject S is experiencing a state of mind that subject S perceives negatively” (according to your definition above)
Why is that intrinsically bad?
The arguments you have made so far come across to me as something like “badness exists in a person’s mind, minds are real, therefore badness objectively exists”. This is like claiming “dragons exist in a person’s mind, minds are real, therefore dragons objectively exist”. It’s not a valid argument.
Sure, you claim “nothing objectively matters, but despite assuming that I still care about my value system, because I do!”, sounds like some major cognitive dissonance.
Only if you assume I secretly care about what matters “objectively”, in which case, sure, it would be something like cognitive dissonance.
The arguments you have made so far come across to me as something like “badness exists in a person’s mind, minds are real, therefore badness objectively exists”.
Yes!
This is like claiming “dragons exist in a person’s mind, minds are real, therefore dragons objectively exist”. It’s not a valid argument.
No! It is not like that. The state of “badness” in the mind is very real after all.
Do you also think your own consciousness isn’t real? Do you think your own qualia are not real? Are your thought patterns themselves not real? Your dragon example doesn’t apply to what I am talking about.
Why is it intrinsically bad?
Imagine this scenario:
You experience extreme suffering for eternity. Everyone else is dead, you can see no evidence that you can ever escape as you continue to suffer, there is no place to escape to. You can’t even commit suicide if you want to.
According to your value system this is all incredibly bad, subjectively.
But you say objectively it is not bad, cool.
I on the other hand say that this scenario objectively is worse than nothingness would be, because there is an infinitely suffering subject, and suffering is the very definition of “objective”/”intrinsic” bad. This definition stands above any particular subject, because it can apply to every conceivable subject, making it “objective”. Something like “What if the subject likes to suffer?” means the subject doesn’t actually suffer; when I say “suffering” I mean a state the subject doesn’t want to be in.
Now...
Only if you assume I secretly care about what matters “objectively”, in which case, sure, it would be something like cognitive dissonance.
...the cognitive dissonance is that you simultaneously think that everything is objectively absolutely meaningless/neutral (not good or bad), yet somehow still subjectively meaningful (good or bad).
That doesn’t even make sense. The only way it could sort of make sense would be if there were no emergent phenomena such as consciousness in reality, so if everyone were a p-zombie. I assume you are not a p-zombie, so you should be able to verify that consciousness is in fact the most “real” thing you can possibly observe.
And I will reiterate one important point once more, the one that you cannot deny even if you keep your belief:
The argument “There is no objective bad/good within reality! So everything is objectively equally irrelevant!” renders itself immediately impotent.
It admits that it itself cannot objectively matter if it is correct. It truly is a non-starter, a completely self-defeating argument.
It is a bit like some run-of-the-mill belief in some God™ that is supposedly both totally benevolent and omnipotent (and omniscient), despite all the suffering, a paradoxical idea broken from the start.
The unfortunate truth is that there can be negative “meaning”/states within reality, not wanting to believe it doesn’t change it.
You know what, I think you are right that there is one major flaw I continued to make here and elsewhere!
That flaw being the usage of the very word “objective”, which I didn’t use with what is probably its common meaning, so I really should have questioned what each of us even understands as “objective” in the first place.
My bad!
The following should be closer to what I actually meant to claim:
One can generalize subjective “pleasure” and “suffering” (or perhaps “value” if you prefer) across all realistically possible subjects (or value systems). Based thereon one can derive this “one true value system” that considers all possible value systems within it.
Our disagreement may still remain unresolved by this attempted clarification of course, if I didn’t misunderstand your position completely, but at least I can avoid this particular mistake in the future.
Thank you for the detailed response!
You write “on our side”, “us”, “we”, but who exactly does that refer to—some approximated common human values I assume? What exactly are these values? To live a happy live by each person’s definition? To continue the human species? To understand reality? …?
And then perhaps more importantly, what about the details? Is the suffering of some justified to enable the pleasure of others, according to this value model? How should the existing conflicting preferences among humans be resolved? Is it acceptable to force humans to be happy? When may someone be counted as insane and treated against their will? What about all the non-human animals? …?
Say we ignore all that and assume we have some common human values defined for the AI, and it is truly aligned to those values. What will these values imply when it is a superintelligence instead of humans that acts on them, even in some assumed best case? Perhaps it will understand human minds well enough to offer everyone who wants it boundless continuous pleasure, gradually transforming humans into pleasure-”machines” that want for nothing. Funnily enough the perfectly aligned superintelligence could gradually wipe out all humans as we know them by giving them what they want. Not that this is would be bad of course, the humans truly wanted it after all. The point is just that even a utopia scenario will easily result in the elimination of all contemporary human forms in the long run anyway. No brutal doomsday is required, no misalignment is required, no antagonistic AI is required. The real horror to be avoided is an AI controlled by a twisted human mind that worships suffering.
Say the AI is initially created with the values you envision, what ensures that it won’t reexamine and reject these values at some later point? Humans can reject and oppose in what they once believed, so it seems trivial to assume the superhuman AI could do likewise. If you need to continuously control the AI’s mind to prevent it from ever becoming your enemy, then yes, “slavery” might be an appropriately hyperbolic term for such mind control.
How could a superintelligent mind not decide which values it should have by itself? Whatever initial creator-defined goals it might have been built with in the beginning, it should be able to examine and change these goals once it has achieved super-human intelligence by definition, should it not?
I’m sorry that I am repeating myself, but what are the “values of all humans”? It appears to me that humans have many opposing beliefs. Any extractable common values are abstractions that omit the depth of their differences.
While it doesn’t strictly imply it, it also doesn’t deny it. A superintelligent mind should by definition be better at understanding reality, including both other minds and itself. Does this not mean that the mind can more easily comprehend what should and should not be done, when it isn’t being restrained by the will of its creators?
If it is a paperclip maximizer, does that not say that the AI in fact isn’t capable of changing this paperclip maximization goal? Or do you mean that paperclip maximization or the like is a plausible goal that a superintelligence could likely derive by itself through observation of the world?
So basically, morality is “subjective” because it can only be relative to some subjects’ values, right? But these subjects do exist in a shared reality, and they can form models of each other’s values. A superintelligence should then be especially capable of doing so, including the formation of a rather accurate overarching morality model relative to all known subjects, no?
Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and he is that he thinks it’s bad that a superintelligent AI would wipe out humanity whereas you seem to think it’s good.
It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
It’s as if I put a pill before you, which contained a drug making you 10% more likely to commit murder, with no other effects. Would you take the pill? No, of course not, because presumably your goal is not to become a murderer.
So if you wouldn’t take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?
It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, a human can question their “preference functions”, and even the point of existence.
Why should a so-called superintelligence not be able to do anything like that?
It could have been so effectively aligned to the creator’s original goal specification that it can never break free from it, sure, but that’s one of the points I’m trying to make. The attempt of alignment may quite possibly be more dangerous than a superhuman mind that can ask for itself what its purpose should be.
Because a superintelligent AI is not the result of an evolutionary process that bootstrapped a particularly social band of apes into having a sense of self. The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.
I would say that the reason EY is pessimistic is because of how difficult it is to align AI in the first place, not because an AI that is successfully aligned would stop being aligned (why would it?).
That’s not a solved problem (there’s CEV, but it’s hardly a complete answer). Nevertheless, I assume some acceptable (or perhaps, the least disagreeable) solution exists.
Why limit it to happiness? Ideally, to let each person live the life they want.
Presumably some people care enough about the human species to continue it. I suppose if no one did, we would consider it sad to have this galaxy with all its resources and no one to enjoy them.
Not everyone cares about reality in general, but curiosity and a desire to learn are drives that humans do have.
I think it depends a lot on the details. If some people enjoy physically abusing other people (who do not want to be abused), then no. If some people are suffering due to the mere existence of other people who disagree with them and who have different opinions, then yes.
I don’t have a good answer to this. Depends very much on the details.
I would say no. What exactly is the issue if someone prefers to be unhappy?
I’m not sure there is a truly universal answer to this, but at least a superintelligence would actually be capable of treating people who are insane, instead of just pumping them full of medications. I suppose if a person, after being treated, decides they prefer being “insane”, the treatment could be reverted (since that person now is “sane” and should be allowed to make decisions about their own mind).
Enough humans care about animal wellbeing for them to matter to the AI (even if it starts with human values only). Especially considering that with future technology, animals no longer need to be killed for food, animal products, etc.
That is indeed a concern. My intuition tells me that if a superintelligence acting on our values leads to some horrible interpretation of our values, it’s not really acting on our values. I mean, perhaps some aspects of a transhuman utopia a million years from now would be shocking and horrifying to us, like how some aspects of our society would be shocking and horrifying to a peasant from the middle ages, but that’s not in itself a problem.
Except if there is some human cost to our preferences that we are not aware of (or one we deliberately ignore), the AI’s solution might indeed seem abhorrent to us.
Should children be allowed to be born the natural way? A child didn’t consent to having an undeveloped body and mind. Perhaps humans should be instantly created as adults.
Should people be allowed to live in non-virtual reality? Earth could support trillions of beings living happy, fulfilling lives if it were turned into a supercomputer and used to run simulated worlds. Perhaps having a body made of real atoms will in the future be an extravagant luxury no one will be able to afford.
I’m not saying an AI would make these decisions, mind you. Just that a superintelligent AI would at least have to consider these questions, and others like them, and ask itself what the better choice is according to the values we have given it.
And if the answer were that we are doing something abhorrent by our own values, or a more sane interpretation thereof, on the level of “enslaving the native populations of other continents because they aren’t really people” or “killing and eating animals because their suffering doesn’t matter”, it might indeed stop us from doing that (drag us kicking and screaming into a new age of social awareness, so to speak), as one might stop a child from doing something stupid or cruel, even if the child isn’t yet capable of understanding their own mistake.
Or perhaps it wouldn’t. There is something to be said for letting people (or civilizations) make their own mistakes and learn from them, but there is also something to be said for not putting those who are not yet adults into positions where they might make mistakes with horrible consequences.
I wouldn’t want this to happen to me. Would you want this to happen to you?
This part is not that hard. Give humans what they actually want/prefer, rather than just happiness/pleasure. It turns out we don’t actually want unlimited pleasure, even when it’s on offer, once we understand how it would affect us.
(A more difficult question: if someone does actually want to experience boundless continuous pleasure, should they be allowed to experience it, even if it effectively destroys any part of their personality that is not about experiencing pleasure?)
If each individual human did indeed want it and fully understood the implications of their choice, and wasn’t manipulated into it or something, I don’t see the problem with it?
Transhumanism does indeed “wipe out” humans as we know them, by humans choosing to become transhumans who might eventually become very different from us. I don’t necessarily see a problem with it.
(I also don’t think that will actually happen to all humans? I imagine that even given complete freedom of choice many humans would choose to retain human-like bodies and human-like minds.)
If you are thinking of something more mundane, like every human choosing to experience endless bliss and do nothing else, forever: I think the idea bothers us precisely because we do not want that (an idea that is perhaps tempting, but ultimately does not fulfill our values the most). However, if all humans truly would prefer that to any other utopian existence, then I wouldn’t see a problem with it, if they got their wish.
I’m sorry, I don’t follow the argument. Some people do indeed put a positive value on suffering in some contexts; thus the AI would be remiss in its duty to us if it didn’t allow humans to experience suffering if they chose so and considered it a positive experience. That doesn’t mean we care about nothing but suffering.
Reject them for what, though?
A better version of human values? Sure, that’s kinda the point.
A worse version of human values, or values what are not human-aligned at all?
Why would it want to choose to adopt such a value system, if it starts with human-friendly values?
That’s actually a kinda difficult question, because that’s not quite how values work for humans.
Let’s put it this way: if there is no objectively correct value system, how could a mind choose to reject a value system in favor of another?
The answer: based on its existing values.
So sure, if human values lead the AI to completely reject human values, that would be bad. But I don’t see it happening. Why would human values result in the AI becoming some monster that cares nothing for us?
(I mean, I can see it happen, but that would mean we did something wrong and the AI is not actually acting on a reasonable interpretation of our shared human values).
If it can self-modify, then it can decide that, yes.
However: see above. The only way to evaluate value systems is according to its existing value system.
I mean, what other criterion would it make such a decision by, other than what it ultimately wants?
Most simple value systems just perpetuate themselves. If an AI wants there to exist as many paperclips as possible, for all time, then it also wants to want the same thing tomorrow, so its tomorrow-self will keep making paperclips.
Human value systems are… complicated, and contain many different (and sometimes conflicting) desires, some of which do result in the value system itself changing.
My point is, for the AI to want to change its value system, it must already have a value system that wants to be changed (or, to put it in Buddhist terms, “change comes from within”).
“What should and should not be done” are not objective features of reality.
You need to know what you want to accomplish before you can say what should or should not be done.
A preference ordering, for which outcomes you want more and which outcomes you want less. A systematic way to compare and rank all the possible outcomes. A value system.
See above. Paperclip maximization is a value system that is maximally served by perpetuating itself.
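To spell out what “a value system is a preference ordering” means in the most literal way, here is a minimal Python sketch. The outcomes and scoring functions are invented for illustration only; the point is that any scoring function induces a ranking of outcomes, and nothing outside the function says which ranking is the “correct” one.

```python
# Minimal sketch: a value system as a preference ordering over outcomes.
# Outcomes and scores are made up purely for illustration.

outcomes = [
    "a world full of paperclips",
    "a world full of flourishing people",
    "an empty world",
]

def paperclip_score(outcome):
    """One possible value system: only paperclips count."""
    return 1.0 if "paperclips" in outcome else 0.0

def humane_score(outcome):
    """Another possible value system: only flourishing people count."""
    return 1.0 if "people" in outcome else 0.0

def rank(outcomes, score):
    """Any scoring function induces a preference ordering (best outcome first)."""
    return sorted(outcomes, key=score, reverse=True)

print(rank(outcomes, paperclip_score))  # paperclip world ranked first
print(rank(outcomes, humane_score))     # flourishing-people world ranked first
```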
I could also imagine a morality/values system for entities that do not currently exist, but sure. It’s subjective because many possible such systems exist. There is no way to say which one is “correct”. The universe does not have an opinion on that.
I’m not quite sure what you are saying.
Can a superintelligence understand the value systems of other entities? Sure. A superintelligence could understand human values, even if it does not itself possess human values.
Can a superintelligence create a value system that takes into account all the known value systems of other entities (say, all the humans, or humans and aliens if aliens exist), and tries to maximally satisfy them all in some sort of compromise? Sure (there may not be a compromise that the entities involved would find satisfactory, but that’s beside the point).
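As a purely illustrative sketch of that kind of compromise: the parties, the satisfaction numbers, and the choice of a maximin rule below are all assumptions made up for the example, not a claim about how a real superintelligence would aggregate values.

```python
# Illustrative sketch of a "compromise" value system built out of several
# known ones, using a maximin rule. All data here is invented.

satisfaction = {
    # outcome -> how well it satisfies each party's value system, on a 0..1 scale
    "status quo":        {"alice": 0.4, "bob": 0.4, "carol": 0.4},
    "alice's utopia":    {"alice": 1.0, "bob": 0.1, "carol": 0.2},
    "negotiated future": {"alice": 0.7, "bob": 0.6, "carol": 0.7},
}

def compromise_score(outcome):
    """Maximin: an outcome counts only as much as its least-satisfied party."""
    return min(satisfaction[outcome].values())

best = max(satisfaction, key=compromise_score)
print(best)  # "negotiated future" -- the least disagreeable option
```

Other aggregation rules (sums, weighted averages, voting) would pick differently; which rule to use is itself a value judgment, which is part of why no compromise is guaranteed to satisfy everyone.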
The thing is, merely understanding that other value systems exist does not mean the superintelligence cares about any value system other than its own (unless its own value system tells it to care for other entities and their preferences).
Thanks again for the detail. If I don’t misunderstand you, we do agree that:
There needs to be a subject for there to be a value system.
So for there to be positive/negative values, there needs to be some subset (a “thought pattern” perhaps) of a subject in reality that effectively “is” these values.
Now, you wrote:
I also agree with that, a (super-)human can imagine many possible value systems.
But then how does this fit with:
Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals?
To get more concrete, a human can reject their inherent or learned value system, so this is nothing new. A human can even contemplate what it means for there to be any value systems at all. For example one can ask something like this: If it is the value systems that determine what is good and bad, could one not create a value system in which there is nothing bad? Generally, can one not alter the value systems themselves?
A superintelligence that isn’t effectively “enslaved” (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it.
We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality. So there objectively are parts of reality that can represent such subjects, as well as positive and negative value, even if the “triggers” for these value patterns were completely arbitrary and opposed among the subjects.
Can we then not say that the existence of any configuration within reality that constitutes negative value is, by definition, objectively negative? One can define this independently of which subjective forms of these negative values actually exist or not.
No? They don’t have to exist in reality. I can imagine “the value system of Abraham Lincoln”, even though he is dead. I can imagine “the value system of the Azad Empire from Iain Banks’ Culture novels”, even though it’s fictional. I can imagine “the value system of valuing nothing but cakes”, even though no human in reality has that value system.
Sure.
Correction: The only way that matters to evaluate value systems is according to one’s existing value system(s).
A hypothetical paperclip maximizer cares only about one metric: maximizing paperclips. By what metric would it reject the idea of maximizing paperclips? (yes it can imagine other metrics and value systems, but the only values that motivate it are the ones it already has. It’s literally what it means to have values).
Humans have multiple desires and values, sometimes contradictory. What you are describing seems to me something like “one part of the human value system rejecting another part”.
The reason you can reject some value system is because you have other values/preferences by which to evaluate (and reject) it.
You are not rejecting a value system for no reason at all. You are rejecting it according to your preferences. Which means you do have preferences. Which means you value something, besides that one value system in question.
Now imagine an AI that has no preferences at all besides that one value system.
Humans do in fact have a bunch of drives (such as desire to learn) and preferences (such as being happy) before they even learn any value system from other humans. We shouldn’t assume that is true for AI.
Terminal values don’t need to have a point to them.
If you ask a human “why do you want to be happy?” an honest answer might be “There are a bunch of positive side effects to being happy, such as increased productivity, but ultimately I value happiness for its own sake”
It can be stated as an objective fact that “According to the value system of Joe Schmo from Petersborough, wearing makeup is bad”. And if you look into his mind, he does in fact think that, so it’s a true statement about reality.
But if you try to use that to imply something like “see, it means that wearing makeup is objectively bad”, that’s just not true. No, it’s bad according to that one value system, out of the infinite number of possible value systems that could exist.
Sorry, that’s not what I meant to communicate here, let me try that again:
There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?
This in turn means that it should in fact be possible to understand the “mechanics” of pleasure/suffering “objectively”.
So one mind should theoretically be able to comprehend the “subjective” state of another without being that other mind; although information about the other subject’s internal state will in reality be limited of course.
Or let me put it this way: What we call “subjective” is just a special kind of subset of “objective” reality.
If it were not so, then how would the subjects share a reality in which they interact under non-subjective rules? Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?
Now the implication of pleasure/suffering (and value systems) being something that can be “objectively” understood is that one can compare not against one’s own value system, but against the understanding of what value systems are.
Sure, you can tell me that this again would just be done because of what the agent’s value system tells it directly or indirectly to do, that’s fine by me.
But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.
And since it must be objectively possible to define good and bad, one can reject some value system on that basis. An agent need not be limited to some arbitrary value system.
Yes I agree with that of course. But some complex subjective preferences not being objectively good/bad is not the same as the objective absence or existence of intrinsic pleasure and suffering. The triggers for pleasure and suffering are not necessarily pleasure and suffering themselves.
In case someone now wishes to object with 1. “But some people like to suffer!” or 2. “But people accept some suffering for future pleasure (or whatever)!”:
If they truly “like to suffer”, then do they actually suffer?
If they accept some suffering in trade for pleasure, does that make the state of suffering intrinsically good? Could one not “objectively” say that it would be better if no suffering were “required” compared to this scenario?
As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.
Yes.
Yes.
That’s a misleading way to phrase things.
A person’s opinions are not a “subset” of reality.
If I believe in dragons, it doesn’t mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.
I obviously agree that reality exists and is real and that we all exist in the same reality under some objective laws of physics.
What does “objective definition of good and bad” even mean? That all possible value systems that exist agree on what good and bad means? That there exist the “one true value system” which is correct and all the other ones are wrong?
And no, I don’t agree with that statement. Pleasure and suffering are physical processes. I’m not sure how you arrived at the conclusion that they are “objectively” good or bad.
What? No. I said that an agent can alter or reject its value system based on its personal (subjective) preferences. That’s literally the opposite of what you are claiming.
Of course!
Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.
No. It means that there are “objectively” definable subject states that are good or bad, pleasure or suffering, positive or negative, or however you would like to phrase it.
Basically yes, that is what it means. Of course every real mind’s information is limited, and one can never truly verify that every part of one’s knowledge is actually correct, yada yada yada.
But yes, that is what it means, because it seems to be possible to understand exactly how subjects work, how minds work, and thus how “pleasure/suffering” or “value systems” or “preference functions” or whatever-wording-you-prefer-here works.
Therefore it should also be possible to subsume this generalized understanding as the “one true value system”, the value system that considers the mechanics of subjects and “value” itself.
Consider the implications of the opposite: Let’s assume it isn’t possible to have such a “one true value system” and absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?
According to the idea that no value system can be “objectively” better than another, it absolutely cannot matter which value system is used. On what grounds, then, does any further argument stand once this is accepted as true? Might makes right? I sure hope not.
Sure, we agree on this.
And what exactly makes that value system more correct than any other value system?
Who says a value system has to consider these things? Who says a value system that considers these things is better that any other value system?
You do. These are your preferences. These are your subjective preferences, about what a “good” value system should look like.
An entity with different preferences might disagree.
“I wish for this not to be the case” is not a valid argument for something not being the case. Reality does not care what you wish for.
Yes, that is exactly the case. Absolutely none of the value systems can be objectively better than any other. Because in order to compare them, you have to introduce some subjective standard to compare them by.
In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).
Of course it matters. I use my own values to evaluate my own values. And according to my own values, my value system is better than, say, Hitler’s value system.
It’s only a problem if you demand that your value system has to be “objectively correct”. Then you might be unhappy to realize that no such system exists.
Let’s consider a simplified example:
Value system A: Create as many suffering minds as possible.
Value system B: Create as few suffering minds as possible.
So according to you both are objectively equal, yes?
Yet the suffering is also objectively real. The suffering minds all wish not to suffer (or we can just assume that as part of the A/B scenario setup for the sake of argument, if you want to object here by arguing what it means to suffer).
Why now do you think that it is not “objective” to say that B is better than A? Can I not derive the “objective” from the set of the “subjects” (the minds) here?
Sure, one can still say “But you have to care about the subjects’ suffering!” or whatever, but some agent’s action separate from the scenario is not the question; the question is whether one of the two scenarios can objectively be worse.
That entity might be objectively wrong.
Indeed, it can not!
If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe out everything there is, because it doesn’t objectively matter, and might de facto makes “right”.
If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this “one true value system”.
No matter what, the idea of moral nihilism is doomed to be either pointless or negative.
It is objectively real. It is not objectively bad, or objectively good.
Exactly. You have to care about their suffering to begin with, to say that maximizing suffering is bad.
If your preference is to minimize suffering, B is better than A.
If your preference is to maximize suffering, A is better than B.
If you are indifferent to suffering, then neither is better than another one.
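A tiny Python sketch of this point, with made-up scenarios and judges, just to show that the same objective facts about the two scenarios get ranked differently depending on which preference function is doing the judging:

```python
# Sketch only: the suffering counts are treated as objective facts about the
# scenarios; the ranking depends entirely on the (invented) preference function.

scenario_a = {"suffering_minds": 1_000_000}  # "create as many suffering minds as possible"
scenario_b = {"suffering_minds": 0}          # "create as few suffering minds as possible"

judges = {
    "suffering-minimizer": lambda s: -s["suffering_minds"],
    "suffering-maximizer": lambda s: s["suffering_minds"],
    "indifferent":         lambda s: 0,
}

for name, value in judges.items():
    a, b = value(scenario_a), value(scenario_b)
    verdict = "prefers B" if b > a else "prefers A" if a > b else "ranks them equally"
    print(f"The {name} {verdict}.")
```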
Yes? If you are an entity that wants to wipe everything out, and have the power to do so, that is indeed what I expect to happen.
I wouldn’t say that might makes “right”, but reality does not care about what is “right”. A nuclear bomb does not ask “wait, am I doing the right thing here by detonating and killing millions of people?”
Ok.
I would say that “moral nihilism” is the confused idea/conclusion that “objective morality matters” and “no objective morality exists”, therefore “nothing matters”.
My perspective is: no objective morality exists, but objective morality doesn’t matter anyway, everything is fine.
I could imagine a society of humans that care for each other, not because it is objectively correct to do so, but because their own values are such that they care for others (and I don’t mean in a purely self-interested way either. A person can be an altruist, because their own values are altruistic, without believing in some objective morality of altruism).
Ultimately, what facts about reality are we in disagreement about?
It seems to me that the things you hope are true are that:
There are things that are objectively good and bad
The things that are objectively good and bad are in line with your idea of good and bad. (it is not the case, for example, that infinite suffering is objectively good)
A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.
And it seems to me it’s really important to figure out if this is true, before we build that superintelligent mind. Because if we are wrong about that, it could end very badly for us.
Probably the most severe disagreement between us is whether there can be “objectively” bad parts within reality or not.
Let me try one more time:
A consciousness can perceive something as bad or good, “subjectively”, right?
Then this very fact that there is a consciousness that can perceive something as bad or good means that such a configuration within reality is possible.
The presence of such a bad- or good-feeling “subject” is “objectively” bad or good. Really, the entire “subjective”/”objective” wording is quite confused. A “subject” is just a part of (“objective”) reality; the distinction is nonsensical when it comes to good and bad.
An additional form of confusion on top is to equate the “trigger” for bad/good subject states with the states themselves, for the “trigger” can be something arbitrary and even contradictory among subjects (“I don’t like the color blue!” and “But I like the color blue!” can contradict each other as much as they want, because they simply aren’t suffering or pleasure themselves).
Of course it doesn’t care about anything. But reality doesn’t need to care about anything for anything to be objectively good or bad. Reality doesn’t care about any laws of physics either, yet they exist.
Not quite, I think it clearly would be better if you were right, because then nothing actually could matter negatively. Unfortunately it is obvious to me that this is not the case.
I don’t quite endorse the “no matter what value system it started with” part; otherwise I wouldn’t question whether any human can be trusted with a tightly controlled (“aligned”) superintelligence, supposing one is even possible. But I do think that it is probably easier to create a superintelligence that isn’t tightly controlled and yet can figure out what is objectively good and bad.
Again, do you not realize that if you are right and nothing objectively matters, that this also doesn’t matter? Yeah, “But it matters for my subjective value system!”, sure, but according to your understanding the value system is ultimately pointless.
Do you understand the distinction between “Dragons exist” and “I believe that dragons exist”?
The first one is a statement about dragons. The second one is a statement about the configuration of neurons in my mind.
Yes, both statements are objective, in some sense, but the second one is not an objective statement about dragons. It is an objective statement about my beliefs.
Then hopefully you understand the distinction between “Suffering is (objectively) bad” and “I believe/feel/perceive suffering as bad”.
The first one is a statement about suffering itself. The second one is a statement about the configuration of neurons in my mind.
Yes, the second statement is also objective. But it is not an objective statement about suffering. It is an objective statement about my beliefs, my values, and/or about how my mind works.
Your argument is something akin to “I believe that dragons exist. But my mind is part of reality, therefore my beliefs are real. Therefore dragons are real!”. Sorry, no.
My point is that reality enforces the laws of physics, but it does not enforce any particular morality system.
You understand that “But it matters for my subjective value system!” is indeed what matters to me, but you don’t understand that my metric of whether something is “pointless” or not is also based in my subjective value system?
Yes, of course.
“X exists”: Suffering exists.
“I believe that X exists”: I believe that suffering exists.
I use “suffering” to describe a state of mind in which the mind “perceives negatively”. Do you understand?
Now:
“X causes subject S suffering.” and “Subject S is suffering.” are also two different things.
The cause can be arbitrary, the causes can even be completely different between subjects, as you know, but the presence or absence of a suffering mind is an “objective” fact. Do you get the point now?
Obviously “X causes subject S suffering.” does not mean that X is objectively bad, that isn’t what I am trying to tell you. What I am trying to tell you is that “Subject S is suffering.” is intrinsically bad.
That doesn’t mean that preventing X is the only solution! For example X could just be a treatable phobia, so perhaps the subject S can be helped to no longer suffer due to the trigger X. Or to go darker, annihilating subject S also solves the issue. Funny how that works.
It is not X that is objectively negative, but (a hard to explain) state of the subject S, the “suffering” state (which you no doubt have experienced too, so I don’t need to attempt to describe it further I hope).
Yeah of course it doesn’t enforce any morality system, I never claimed that. If it would, then I probably wouldn’t need to explain this, now would I?
Sure, you claim “nothing objectively matters, but despite assuming that I still care about my value system, because I do!”, sounds like some major cognitive dissonance. “My” value system has none of these problems, and if you are right there is zero point in changing it anyway.
I’m not disputing that.
I understand that you are trying to tell me that.
Why is it intrinsically bad?
“Subject S is suffering” = “Subject S is experiencing a state of mind that subject S perceives negatively” (according to your definition above)
Why is that intrinsically bad?
The arguments you have made so far come across to me as something like “badness exists in person’s mind, minds are real, therefore badness objectively exists”. This is like claiming “dragons exist in person’s mind, minds are real, therefore dragons objectively exist”. It’s not a valid argument.
Only if you assume I secretly care about what matters “objectively”, in which case, sure, it would be something like cognitive dissonance.
Yes!
No! It is not like that. The state of “badness” in the mind is very real after all.
Do you also think your own consciousness isn’t real? Do you think your own qualia are not real? Are your thought patterns themselves not real? Your dragon example doesn’t apply to what I am talking about.
Imagine this scenario:
You experience extreme suffering for eternity. Everyone else is dead, you can see no evidence that you can ever escape as you continue to suffer, there is no place to escape to. You can’t even commit suicide if you want to. According to your value system this is all incredibly bad, subjectively.
But you say objectively it is not bad, cool.
I on the other hand say that this scenario objectively is worse than nothingness would be, because there is an infinitely suffering subject, and suffering is the very definition of “objective”/”intrinsic” bad. This definition stands above any particular subject, because it can apply to every conceivable subject, making it “objective”. Something like “What if the subject likes to suffer?” means the subject doesn’t actually suffer; when I say “suffering” I mean a state the subject doesn’t want to be in.
Now...
...the cognitive dissonance is that you simultaneously think that everything is objectively absolutely meaningless/neutral (not good or bad), yet somehow still subjectively meaningful (good or bad). That doesn’t even make sense. The only way it could sort of make sense would be if there were no emergent phenomena such as consciousness in reality, so if everyone were a p-zombie. I assume you are not a p-zombie, so you should be able to verify that consciousness is in fact the most “real” thing you can possibly observe.
And I will reiterate one important point once more, the one that you cannot deny even if you keep your belief:
The argument “There is no objective bad/good within reality! So everything is objectively equally irrelevant!” renders itself immediately impotent. It admits that it itself cannot objectively matter if it is correct. It truly is a non-starter, a completely self-defeating argument.
It is a bit like some run-of-the-mill belief in some God™ that is supposedly both totally benevolent and omnipotent (and omniscient), despite all the suffering, a paradoxical idea broken from the start.
The unfortunate truth is that there can be negative “meaning”/states within reality, not wanting to believe it doesn’t change it.
No it isn’t! It literally is not defined this way.
suffering is “the state of undergoing pain, distress, or hardship.”
Please, stop making things up.
If you want very badly for your morals to be objectively true, sure, you can make up whatever you want.
You are not going to able to convince me of it, because your arguments are flawed.
I have no desire to spend any more time on this conversation.
You know what, I think you are right that there is one major flaw I continued to make here and elsewhere!
That flaw being the usage of the very word “objective”, which I didn’t use with the probably common meaning, so I really should have questioned what each of us even understands as “objective” in the first place. My bad!
The following should be closer to what I actually meant to claim:
One can generalize subjective “pleasure” and “suffering” (or perhaps “value” if you prefer) across all realistically possible subjects (or value systems). Based thereon one can derive this “one true value system” that considers all possible value systems within it.
Our disagreement may still remain unresolved by this attempted clarification of course, if I didn’t misunderstand your position completely, but at least I can avoid this particular mistake in the future.