You might be very knowledgeable about contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for creating AGIs based loosely on improving the human brain, and this approach appears to be mainstream now, or becoming mainstream (Kurzweil, Hawkins, DARPA’s neuromorphic initiative, etc.).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been discussing favourably the creation of AGIs that are programmed to create AGIs with different values to their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests from our current perspective when carried over to AGI, but nor is it guaranteed to spontaneously destroy the world.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
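The contrast being argued over, selecting against successors whose values stray from a target versus letting values drift freely, can be sketched as a toy simulation. This is entirely hypothetical: values are reduced to a single number, the mutation size and population are arbitrary, and nothing here models real AGI; it only illustrates why undirected drift wanders while selection toward a target stays put.

```python
import random

def simulate(generations=200, target=0.0, select=False, seed=0):
    """Toy value-drift model: each 'generation' an agent creates
    successors whose values are noisy copies of its own.
    With select=True, the successor whose value is closest to the
    target is chosen (the selection scheme discussed above);
    otherwise one successor is picked at random, giving an
    unconstrained random walk. (Fixing the target directly in the
    utility function would correspond to no drift at all.)"""
    rng = random.Random(seed)
    value = 0.0  # start exactly at the target
    for _ in range(generations):
        children = [value + rng.gauss(0, 0.1) for _ in range(5)]
        if select:
            value = min(children, key=lambda v: abs(v - target))
        else:
            value = rng.choice(children)
    return value

# Average distance from the target after 200 generations, over 50 runs.
drift_free = sum(abs(simulate(select=False, seed=s)) for s in range(50)) / 50
drift_sel = sum(abs(simulate(select=True, seed=s)) for s in range(50)) / 50
print(drift_free, drift_sel)
```

Unselected drift accumulates like a random walk (distance grows with the square root of the number of generations), while the selected lineage hovers near the target, which is the intuition behind both positions above.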
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
Yes.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all?
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all!
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power.
Humans today have a range of values, but an overriding, universal value is not dying. To this end, it is crucially important that we reverse-engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and increase in intelligence is very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we make now is a step in the wrong direction, which holds even if you believe it’s a small step.
One simple point is that there is no reason to expect AGIs to stop at exactly human level.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible. Superhuman level seems overwhelmingly likely to be possible too, from considerations like imagining a human with more working memory and running faster, but we don’t technically know that.
We have a working example of a human level intelligence.
It’s human-level intelligences doing the work. Martians’ work on AI might asymptotically slow down when approaching Martian-level intelligence without that level being inherently significant for anyone else, and the same goes for humans, or for any AGI of any level working on its own successor, for that matter. (Not that I have any strong belief that this is the case; it’s just an argument for why human level wouldn’t be a completely arbitrary slow-down point.)
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally I’d assign a probability of under 0.2 to the best AGIs being at a level roughly comparable to human level (let’s say able to solve any problem, except human relationship problems, that every IQ 80+ human can solve, but not better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; this probably means inferior at an equal clock rate but making up for it with speed for most of that time). That’s still a lot more than I would assign to any other place on the intelligence scale, of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles so “someone thinks something about posting this is wrong” provides almost no useful information.
Thanks for the value risk link—that discussion is what I’m interested in.
I guess I’ll reply to it there. The initial quotes from Ben G. and Hanson are similar to my current view.