Shouldn’t we expect that ultimately the only thing selected for is mostly caring about long run power?
I was attempting to address that in my first footnote, though maybe it’s too important a consideration to be relegated to a footnote.
To say it differently, I think we’ll see selection for evolutionary fitness, which can take two forms:
Selection on AIs’ values, for values that are more fit, given the environment.
Selection on AIs’ rationality and time preference, for long-term strategic VNM rationality.
These are “substitutes” for each other. An agent can have adaptive values, an adaptive strategic orientation, or some combination of the two. But agents that fall below the Pareto frontier described by those two axes[1] will be outcompeted.
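As an illustrative toy model (my own construction, not something from the comment above): treat each agent as a pair of scalars, one for how fit its values are and one for its strategic rationality, and say an agent is outcompeted exactly when some other agent Pareto-dominates it on both axes.

```python
def dominates(a, b):
    """True if agent a Pareto-dominates agent b on (values fitness, rationality):
    at least as good on both axes and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def outcompeted(population):
    """Agents strictly inside the Pareto frontier: dominated by some other agent."""
    return [b for j, b in enumerate(population)
            if any(i != j and dominates(a, b) for i, a in enumerate(population))]

# Four hypothetical agents as (values_fitness, strategic_rationality) pairs.
pop = [(0.9, 0.2), (0.2, 0.9), (0.6, 0.6), (0.3, 0.3)]
print(outcompeted(pop))  # only (0.3, 0.3) is dominated (by (0.6, 0.6))
```

The point of the sketch is just the substitution: (0.9, 0.2) and (0.2, 0.9) both survive by trading one axis off against the other, while (0.3, 0.3) is weakly worse on both and gets selected out.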
Early in the singularity, I expect to see more selection on values, and later in the singularity (and beyond), I expect to see more selection on strategic rationality, because I (non-confidently) expect the earliest systems to be myopic and incoherent in roughly similar ways to humans (though the distribution of AIs will probably vary more on those traits than humans do).
The fewer generations there are before strong VNM agents with patient values / long time preferences, the less I expect small amounts of caring for humans in AI systems to be eroded.
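A crude way to see the generations point (again a toy model of my own, with made-up parameters): if each generation of selection erodes some fixed fraction of a small initial "caring for humans" weight, the surviving caring falls geometrically with the number of generations before values lock in.

```python
def surviving_caring(initial=0.01, erosion_per_gen=0.5, generations=5):
    """Caring weight left after repeated selection, assuming a constant
    fractional erosion per generation (a deliberately crude assumption)."""
    return initial * (1 - erosion_per_gen) ** generations

print(surviving_caring(generations=2))   # few generations before lock-in
print(surviving_caring(generations=10))  # many generations: almost nothing left
```

Nothing hangs on the specific numbers; the sketch just shows why the number of pre-lock-in generations, not only the erosion rate, controls how much caring survives.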
Actually, “axes” is a bit misleading, since the space of possible values is vast and high-dimensional. But we can project it onto the scalar of “how fit are these values (given some other assumptions)?”