Let’s start with something simple, like a lobster or a dog… oh wait, what if it transcends and isn’t human-friendly?
Lobsters and dogs aren’t general intelligences. A million years of dog-thoughts can’t do the job of a few minutes of human-thoughts. Although a self-improving dog could be pretty friendly. Cats, on the other hand… well, that would be bad news. :)
what if our model of neural function is wrong and we create a sociopathic copy that isn’t human-friendly?
I find that very unlikely. If you look at diseases or compounds that affect every neuron in the brain, they usually affect all cognitive abilities. Keeping intelligence while eliminating empathy would be pretty hard to do by accident, and if it did happen it would be easy to detect. Humans have experience detecting sociopathic tendencies in other humans. Unlike an AI, an upload can’t easily understand its own code, so self-improving is going to be that much more difficult. It’s not going to be some super-amazing thing that can immediately hack a human mind over a text terminal.
oh wait, what if one of our partial brain models transcends and isn’t human-friendly?
That still seems unlikely. If you look at brains with certain parts missing or injured, you see that they are disabled in very specific ways. Take away just a tiny part of a brain and you’ll end up with things like face blindness, Capgras delusion, or Anton-Babinski syndrome. By only simulating individual parts of the brain, it becomes less likely that the upload will transcend.
So they won’t transcend if we do nothing but run them in copies of their ancestral environments. But how likely is that? They will instead become tools in our software toolbox (see below).
Unlike an AI, an upload can’t easily understand its own code, so self-improving is going to be that much more difficult.
The argument for uploads first is not that by uploading humans, we have solved the problem of Friendliness. The uploads still have to solve that problem. The argument is that the odds are better if the first human-level, faster-than-human intelligences are copies of humans rather than nonhuman AIs.
But guaranteeing fidelity in your copy is itself a problem comparable to the problem of Friendliness. It would be incredibly easy for us to miss that (e.g.) a particular neuronal chemical response is of cognitive and not just physiological significance, leave it out of the uploading protocol, and thereby create “copies” which systematically deviate from human cognition in some way, whether subtle or blatant.
By only simulating individual parts of the brain, it becomes less likely that the upload will transcend.
The classic recipe for unsafe self-enhancing AI is that you assemble a collection of software tools, and use them to build better tools, and eventually you delegate even that tool-improving function. The significance of partial uploads is that they can give a big boost to this process.