Do you propose using evolutionary simulations to discover other-agent-aligned agents? I doubt we have the same luxury of (simulated) time that evolution had in creating humans. It didn’t have to compete against an intelligent designer; alignment researchers do (i.e., the broader AI community).
I agree that humans are highly successful (though far from optimal) at both inclusive genetic fitness and alignment with fellow sapients. However, the challenge for us now is to parse the system that resulted from this messy evolutionary process, to pull out the human value system from human neurophysiology. Either that, or figure out general alignment from first principles.
Do you propose using evolutionary simulations to discover other-agent-aligned agents?
Nah. The Wright brothers didn't need to run evo sims to reverse engineer flight. They just observed how birds bank to turn, noticed that it relied on wing warping, and said: cool, we can do that too! Deep learning didn't succeed through brute-force evo sims either (Karl Sims's evo-sim work is pretty cool, but it turns out that loose reverse engineering is just enormously faster).
However, the challenge for us now is … to pull out the human value system from human neurophysiology. Either that, or figure out general alignment from first principles.
Sounds about right. Fortunately we may not need to model human values at all in order to build generally altruistic agents: it probably suffices that the AI optimizes for human empowerment (our ability to fulfill any long-term future goals, rather than any specific values), which is a much simpler and more robust target, and thus probably more stable over the long term.
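
(For concreteness, "empowerment" here presumably means the standard information-theoretic quantity from Klyubin, Polani & Nehaniv: the channel capacity between an agent's actions and its future states, max over action distributions of I(actions; resulting state). Here's a minimal sketch in a toy deterministic gridworld, where that capacity reduces exactly to the log-count of distinct reachable states; the world and parameters are purely illustrative, not anyone's proposed implementation:)

```python
from itertools import product
from math import log2

def step(state, action, size=5):
    """Deterministic 1-D gridworld: move left/stay/right, clipped at walls."""
    return min(max(state + action, 0), size - 1)

def empowerment(state, horizon, size=5):
    """n-step empowerment in bits. For deterministic dynamics the channel
    capacity max_p(a) I(actions; final state) is just log2 of the number
    of distinct states reachable within `horizon` steps."""
    reachable = set()
    for actions in product((-1, 0, +1), repeat=horizon):
        s = state
        for a in actions:
            s = step(s, a, size)
        reachable.add(s)
    return log2(len(reachable))

# States in the middle of the grid are more "empowered" than states pinned
# against a wall: maximizing empowerment keeps options open without
# committing to any specific goal or value function.
for s in range(5):
    print(f"state {s}: {empowerment(s, horizon=2):.2f} bits")
```

The printout shows central states scoring higher than corner states, which is the intuition behind the claim: an AI maximizing *our* empowerment preserves our option space rather than having to get any particular value right.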