I think AI alignment is solvable for the same reason AGI is solvable: humans are an existence-proof for both alignment and general intelligence.
I agree we are an existence proof for general intelligence. For alignment, what is the less intelligent thing whose goals humanity has remained robustly aligned to?
I meant that I see most humans as aligned with human values such as happiness and avoiding suffering. The point I'm trying to make is that human minds are able to represent these concepts internally and act on them in a robust way, and therefore it seems possible in principle that AIs could too.
I'm not sure whether humans are aligned with evolution. Many humans do want children, but I don't think many are fitness maximizers who want as many as possible.
Two points.
First, humans are unable to self-modify to the degree that an AGI will be able to. It is not clear to me that a human given the chance to self-modify wouldn't immediately wirehead. An AGI may require a higher degree of alignment than individual humans demonstrate.
Second, it is surely worth noting that humans aren't particularly aligned to their own happiness or avoidance of suffering when the consequences of their actions are obscured by time and place.
In the developed world, humans make dietary decisions that lead to horrific treatment of animals, despite most humans being unwilling to torture an animal themselves.
It also appears quite easy for the environment to trick individual humans into making decisions that increase their suffering in the long term in exchange for apparent short-term pleasure. A drug addict is the obvious example, but who among us can say they haven't wasted hours of their lives browsing the internet?
To what extent are humans by themselves evidence of GI alignment, though? A human can acquire values that disagree with those of the humans who taught them, just by having new experiences or gaining new knowledge, to the point of desiring things completely opposite to their peers (like human progress vs. human extinction). Doesn't that mean that humans are not robustly aligned?