I meant that I see most humans as aligned with human values such as happiness and the avoidance of suffering. The point I'm trying to make is that human minds are able to represent these concepts internally and act on them in a robust way, and therefore it seems possible in principle that AIs could too.
I'm not sure whether humans are aligned with evolution. Many humans do want children, but I don't think many are fitness maximizers who want as many children as possible.
Two points.
Firstly, humans are unable to self-modify to the degree that an AGI will be able to. It is not clear to me that a human given the chance to self-modify wouldn't immediately wirehead. An AGI may therefore require a higher degree of alignment than individual humans demonstrate.
Second, it is surely worth noting that humans aren't particularly aligned with their own happiness or avoidance of suffering when the consequences of their actions are obscured by time and place.
In the developed world, humans make dietary decisions that lead to horrific treatment of animals, despite most humans not being willing to torture an animal themselves.
It also appears quite easy for the environment to trick individual humans into making decisions that increase their suffering in the long term in exchange for apparent short-term pleasure. A drug addict is the obvious example, but who among us can say they haven't wasted hours of their lives browsing the internet?