/Edit 1: I want to preface this by saying I am just a noob who has never posted on Less Wrong before.
/Edit 2:
I feel I should clarify my main questions (which are controversial): Is there a reason why turning all of reality into maximized conscious happiness is not objectively the best outcome for all of reality, regardless of human survival and human values?
Should this in any way affect our strategy to align the first AGI, and why?
/Original comment:
If we zoom out and look at the biggest picture philosophically possible, then aren’t the only things that ultimately matter in the end these two: the level of consciousness and the overall “happiness” of said consciousness(es) throughout all of time and space (counting all realities that have existed, exist now, and will exist)?
To clarify: isn’t the best possible outcome for all of reality one where every particle is utilized to experience a maximally conscious and maximally “happy” state for eternity? (I put happiness in quotes because how do you measure the “goodness” of a state, or consciousness itself for that matter?)
After many years of reading countless alignment discussions (of which I have understood maybe 20%), I have never seen this mentioned. So I wonder: if we are dealing with a super optimizer, shouldn’t we be focusing on the super big picture?
I realize this might seem controversial, but I see no rational reason why it wouldn’t be true, although my knowledge of rationality is very limited.
What would it mean for an outcome to be objectively best for all of reality?
It might be your subjective opinion that maximized conscious happiness would be the objectively best reality. Another human’s subjective opinion might be that a reality that maximized the fulfillment of fundamentalist Christian values was the objectively best reality. A third human might hold that there’s no such thing as the objectively best, and all we have are subjective opinions.
Given that different people disagree, one could argue that we shouldn’t privilege any single person’s opinion, but try to take everyone’s opinions into account—that is, build an AI that cared about the fulfillment of something like “human values”.
Of course, that would be just their subjective opinion. But it’s the kind of subjective opinion that the people involved in AI alignment discussions tend to have.
Suppose everyone agreed that the proposed outcome is what we wanted. Would this scenario then be difficult to achieve?
The fact that the statement is controversial is, I think, the reason. What makes a world-state or possible future valuable is a matter of human judgment, and not every human believes this.
EY’s short story Three Worlds Collide explores what can happen when beings with different conceptions of what is valuable have to interact. Even when they understand each other’s reasoning, it doesn’t change what they themselves value. Might be a useful read, and hopefully a fun one.
I’ll ask the same follow-up question to similar answers: Suppose everyone agreed that the proposed outcome above is what we wanted. Would this scenario then be difficult to achieve?
I mean, yes, because the proposal is about optimizing our entire future light cone for an outcome we don’t know how to formally specify.
Could you have a machine hooked up to a person’s nervous system, change the settings slightly to alter their consciousness, and let the person judge whether the changes are good or bad? Run this many times.
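Purely as an illustration (not something claimed in the thread), here is a minimal Python sketch of the loop being proposed, assuming we model the machine as a vector of numeric settings. The hidden wellbeing() function, the noise model, and all the numbers are invented stand-ins; the only point is the structure: perturb the settings slightly, keep the change if the person reports it as an improvement, repeat.

```python
# Illustrative sketch only. A hidden wellbeing() stands in for whatever the
# person's reports are tracking; person_prefers() stands in for asking the
# hooked-up subject which of two experiences felt better.
import random

def wellbeing(settings):
    # Hypothetical hidden quantity: closeness of the settings to an
    # arbitrary optimum at 1.0 in every dimension.
    return -sum((s - 1.0) ** 2 for s in settings)

def person_prefers(old, new, noise=0.05):
    # The subject compares the two immediate experiences and reports which
    # felt better, with some noise in the report.
    return wellbeing(new) + random.gauss(0, noise) > wellbeing(old) + random.gauss(0, noise)

def tune(settings, steps=10_000, step_size=0.01):
    # Greedy hill-climbing driven entirely by the person's pairwise reports.
    for _ in range(steps):
        candidate = list(settings)
        i = random.randrange(len(candidate))
        candidate[i] += random.uniform(-step_size, step_size)
        if person_prefers(settings, candidate):
            settings = candidate
    return settings

if __name__ == "__main__":
    print(tune([0.0, 0.0, 0.0]))  # drifts toward the hidden optimum
```

Note that the loop only ever consults the immediate, local report of the already-changed subject, which is exactly what the reply below takes issue with.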
I don’t think this works. One, it only measures short-term impacts, but any such change might have many medium- and long-term effects, second- and third-order effects, and effects on other people with whom I interact. Two, it measures based on the values of the already-changed me, not the current me, and it is not obvious that current-me cares what changed-me will think, or why I should care if I don’t currently. Three, I have limited understanding of my own wants, needs, and goals, and so would not trust any human’s judgement of such changes far enough to extrapolate to situations they didn’t experience, let alone to other people, the far future, or unusual/extreme circumstances.
For a more involved discussion than Kaj’s answer, you might check out the “Mere Goodness” section of Rationality: A-Z.