We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests, as judged from our current perspective, when carried over to AGI, but neither is it guaranteed to spontaneously destroy the world.
> We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
Your analogy with evolution is spot on: if the values are going to drift at all, we want them to drift towards some target point, by selecting against sub-AIs whose values are further from that point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This would apply even to a human-brain-based AGI; in general, people are quite apt to become corrupt when given only a tiny bit of extra power. A whole load of extra power, such as superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
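To make the ‘selecting against sub-AIs’ idea concrete, here is a minimal toy sketch in Python. It assumes, purely for illustration, that an agent’s values can be summarised as a short numeric vector and that ‘drift’ is just distance from a hypothetical target vector; the names (TARGET, drift, mutate, select_generation) are invented for the example, and nothing here is meant as an actual alignment mechanism.

```python
import random

# Toy illustration only: "values" are modelled as short lists of floats,
# and "drift" from a hypothetical target is measured as squared distance.
TARGET = [1.0, 0.0, 0.5]  # hypothetical target point in value-space


def drift(values):
    """Squared distance between an agent's values and the target point."""
    return sum((v - t) ** 2 for v, t in zip(values, TARGET))


def mutate(values, noise=0.05):
    """A successor ('sub-AI') inherits its parent's values plus random drift."""
    return [v + random.gauss(0.0, noise) for v in values]


def select_generation(parents, offspring_per_parent=4, survivors=8):
    """Spawn successors, then keep only those whose values drifted least,
    i.e. select against sub-AIs whose values are further from the point."""
    candidates = [mutate(p) for p in parents for _ in range(offspring_per_parent)]
    return sorted(candidates, key=drift)[:survivors]


if __name__ == "__main__":
    population = [[0.0, 0.0, 0.0]] * 8  # start well away from the target
    for _ in range(50):
        population = select_generation(population)
    print("smallest remaining drift after 50 generations:",
          round(drift(population[0]), 4))
```

Run as written, the population should converge towards TARGET within a few dozen generations; the selection step is doing all the work. The disagreement that follows is about whether any such target can be specified in the first place.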
> Your analogy with evolution is spot on: if the values are going to drift at all, we want them to drift towards some target point, by selecting against sub-AIs whose values are further from that point.
Yes.
> However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all?
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
> And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all!
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
> This would apply even to a human-brain-based AGI; in general, people are quite apt to become corrupt when given only a tiny bit of extra power.
Humans today have a range of values, but an overriding, universal value is not dying. To this end, it is crucially important that we reverse-engineer the human mind.
Ultimately, if what we really value is conscious human minds, and computers will soon outcompete human brains, then clearly we need to transfer human minds over to computers.