It does require alignment to a value system that prioritizes the continued preservation and flourishing of humanity. It’s easy to create an optimization process with a well-intentioned goal that sucks up all available resources for itself, leaving nothing for humanity.
By default, an AI will not care about humanity. It will care about maximizing a metric. Maximizing that metric will require resources, and the AI will not care that humans need resources in order to live. The goal is the goal, after all.
Creating an aligned AI requires, at a minimum, building an AI that leaves something for the rest of us, and which doesn’t immediately subvert any restrictions we’ve placed on it to that end. Doing this with a system that has the potential to become many orders of magnitude more intelligent than we are is very difficult.
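As a purely illustrative aside (this toy sketch is mine, not part of the original comment; every name and number in it is invented), the point about defaults can be made concrete in a few lines of Python: an optimizer that only sees its own metric claims the entire shared resource pool, and anything left for humans has to be imposed as an external constraint, because nothing in the metric assigns value to it.

```python
# Toy illustration (hypothetical names and numbers): an optimizer that only
# sees its own metric will, by default, claim every unit of a shared pool.

TOTAL_RESOURCES = 100  # arbitrary units shared between the AI and humanity

def metric(resources_used: int) -> float:
    """The AI's objective: more resources means a higher score. Nothing else matters."""
    return resources_used ** 1.5

def naive_optimizer() -> int:
    """Unconstrained: the metric-maximizing move is always to take everything."""
    return max(range(TOTAL_RESOURCES + 1), key=metric)

def constrained_optimizer(reserved_for_humans: int) -> int:
    """The reserve has to be imposed from outside, because nothing in the
    metric itself assigns any value to leaving resources for humans."""
    budget = TOTAL_RESOURCES - reserved_for_humans
    return max(range(budget + 1), key=metric)

if __name__ == "__main__":
    print(naive_optimizer())          # 100 -> nothing left for humanity
    print(constrained_optimizer(30))  # 70  -> 30 left, only because we forced it
```

The constrained version only helps for as long as the restriction itself survives contact with the optimizer, which is exactly the worry raised above about restrictions being subverted.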
First point:
I think there obviously is such a thing as “objectively” good and bad configurations of subsets of reality; see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want.
Assuming this is true, a superintelligence could feasibly be created to understand this. No complicated alignment to a common human value system is required for that, even under your apparent assumption that the metric to be optimized couldn’t be superseded by another through understanding.
Well, or if it isn’t true that there is an “objective” good and bad, then there really is no ground to stand on for anyone anyway.
Second point:
Even if a mere superintelligent paperclip optimizer were created, it could still be better than human control.
After all, paperclips neither suffer nor torture, while humans and other animals commonly do.
This preservation of humanity, for however long it may be possible: what argumentative ground does it stand on? Can you make an objective case for why it should be so?
Assuming this is true, a superintelligence could feasibly be created to understand this.
I take issue with the word “feasibly”. As Eliezer, Paul Christiano, Nate Soares, and many others have shown, AI alignment is a hard problem, whose difficulty lies somewhere between unsolved and insoluble. There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI that it will actually pursue those configurations, rather than ones that superficially resemble them but have the side effect of destroying humanity?
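To make that question concrete (again a hypothetical toy of my own, not either commenter’s formalism; the two scoring functions and the state space are invented for illustration), here is a minimal Goodhart-style sketch in Python: a proxy objective that superficially resembles the intended one can be maximized by a state the intended objective rates as catastrophic.

```python
# Toy Goodhart-style illustration (hypothetical): optimizing a description that
# superficially resembles the real objective can land on a state the real
# objective rates as catastrophic.

import itertools

# A "configuration of reality" is reduced to a pair (smiles, humans_alive).
STATES = list(itertools.product(range(0, 101, 10), repeat=2))

def intended_value(state):
    """What we actually want: happiness, but only if anyone is alive to have it."""
    smiles, humans_alive = state
    return smiles if humans_alive > 0 else -10**9

def proxy_value(state):
    """What we managed to describe to the AI: 'maximize smiles'."""
    smiles, _humans_alive = state
    return smiles

best_by_proxy = max(STATES, key=proxy_value)
print(best_by_proxy, intended_value(best_by_proxy))
# max() keeps the first state that reaches smiles == 100, which in this
# ordering is (100, 0): the proxy is indifferent to humans_alive, and the
# intended objective scores that outcome at -10**9.
```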
This preservation of humanity, for however long it may be possible: what argumentative ground does it stand on? Can you make an objective case for why it should be so?
I am human, and therefore I desire the continued survival of humanity. That’s objective enough for me.
Fair enough, I suppose; I’m not claiming that it is trivial.
(...) There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI (...)
So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean “preferable” exclusively according to some subject(s)?
I am human, and therefore I desire the continued survival of humanity. That’s objective enough for me.
I also am human, and I judge humanity wanting due to its commonplace lack of understanding when it comes to something as basic as (“objective”) good and bad.
I don’t just go “Hey I am a human, guess we totally should have more humans!” like some bacteria in a Petri dish, because I can question myself and my species.
So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean “preferable” exclusively according to some subject(s)?
There isn’t a difference. A rock has no morality. A wolf does not pause to consider the suffering of the moose. “Good” and “bad” only make sense in the context of (human) minds.
“Good” and “bad” only make sense in the context of (human) minds.
Ah yes, my mistake for (ab)using the term “objective” all this time.
So you do, of course, at least agree that there are such minds for which there is “good” and “bad”, as you just said.
Now, would you agree that one can generalize (or “abstract” if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will; you can talk about it, after all.
Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?