I think that humans are sorta “unaligned”, in the sense of being vulnerable to Goodhart’s Law.
A lot of moral philosophy is something like:
Gather our odd grab bag of heterogeneous, inconsistent moral intuitions
Try to find a coherent “theory” that encapsulates and generalizes these moral intuitions
Work through the consequences of the theory and modify it until you are willing to bite all the implied bullets.
The resulting ethical system often ends up having some super bizarre implications and usually requires specifying “free variables” that are (arguably) independent of our original moral intuitions.
In fact, I imagine that optimizing the universe according to my moral framework looks quite Goodhartian to many people.
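To make the Goodhart analogy a bit more concrete, here's a toy numerical sketch (purely illustrative; true_value and proxy_value are made-up functions, not a model of anything real): a tidy "theory" fit to everyday cases matches the underlying values well in that range, but optimizing the theory as hard as possible lands far outside it, where the terms the theory never captured dominate.

```python
# Toy Goodhart sketch: a proxy "theory" that fits everyday cases well,
# optimized hard, ends up somewhere the real value function collapses.

def true_value(x, y):
    """What we actually care about (not directly visible to the optimizer)."""
    return x + y - 0.1 * x * x

def proxy_value(x, y):
    """A tidy linear 'theory' fit to everyday cases (x, y roughly in [0, 1])."""
    return x + y

# Within the everyday range, the theory and the real values agree closely...
for x, y in [(0.2, 0.3), (0.8, 0.5)]:
    print((x, y), round(true_value(x, y), 2), round(proxy_value(x, y), 2))

# ...but optimizing the theory hard pushes us far outside that range,
# where the term the theory never captured dominates.
candidates = [(x / 10, 0.5) for x in range(200)]
best = max(candidates, key=lambda p: proxy_value(*p))
print("theory-optimal point:", best, "true value there:", round(true_value(*best), 2))
```

The analogy: the everyday range is our moral intuitions; the quadratic correction stands in for whatever those intuitions were silently tracking that the tidy theory left out.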
Some examples of implications of my current moral framework:
I think that (a) personhood is preserved when a person is moved into a simulation, and (b) it’s easier to control what’s happening in a simulation, and consequently easier to fulfill a person’s preferences. Therefore, it’d be ideal to upload as many people as possible. In fact, I’m not sure whether this should even be optional, given how horrendously inefficient the ratio of organic human atoms to “utilons” is.
I value future lives, so I think we have an ethical responsibility to create as many happy beings as we can, even at some cost to current beings.
I think that some beings are fundamentally capable of being happier than other beings. So, all else equal, we should prefer to create happier people. I think that parents should be forced to adhere to this when having kids.
I think that we should modify all animals so we can guarantee that they have zero consciousness, or otherwise guarantee that they don’t suffer (how do we deal with lions’ natural tendency to brutally kill gazelles?)
I think that people ought to do some limited amount of wire-heading (broadly, increasing happiness independent of reality).
Complete self-determination/subjective “free will” is both impossible and undesirable. A superintelligent AI (SAI) will be able to subtly, but meaningfully, guide humans down chosen paths because it can robustly predict the differential impact of seemingly minor conversational and environmental variations.
I’m sure there are many other examples.
I don’t think that my conclusions are wrong per se, but… my ethical system has some alien and potentially degenerate implications when optimized hard.
It’s also worth noting that although I stated those examples confidently (for rhetorical purposes), my stances on many of them depend on very specific details of my philosophy and have toggled back and forth many times.
No real call to action here, just some observations. Existing human ethical systems might look as exotic to the average person as some conclusions drawn by a kinda-aligned SAI.