I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer’s suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven’t you just asked me to assume that there are no differences?
Sorry, I simply don’t understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer’s post doesn’t refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
Did you notice, in my preamble, that I mentioned software testing?
What would I need to make of that?
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
But this is not a philosophical argument.
To recap:
I suggested that an AI which is a precursor to the FAI should come to understand human values by interacting (over an extended ‘training’ period) with actual humans—asking them questions about their values and perhaps performing some experiments as in a psych or game theory laboratory.
You responded by linking to this, which, as I read it, suggests that the most accurate and efficient way to extract the values of a human test subject would be by carrying out a non-destructive brain scan. Quoting the posting:
So when we try to make an AI whose physical consequence is the implementation of what is right, we make that AI’s causal chain start with the state of human brains—perhaps non-destructively scanned on the neural level by nanotechnology, or perhaps merely inferred with superhuman precision from external behavior—but not passed through the noisy, blurry, destructive filter of human beings trying to guess their own morals.
I asked how we could possibly come to know by testing that the scanning and brain modeling was working properly. I could have asked instead how we could test the hypothesis that the inference from behavior was working properly.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is a GOFAI question. The question of how we will know we have done it right is a QC question: software testing. That was the subject of my comment. It had nothing at all to do with philosophy.
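To put that QC question in concrete terms, here is a minimal sketch of the only kind of acceptance test that seems available for a scan-derived value model: compare its predicted answers against the subject's actual answers. Every name and probe question below is invented purely for illustration; nothing refers to a real scanner or brain model.

```python
# Hypothetical QC sketch: compare a scan-derived value model's predicted
# answers with the subject's own stated answers.

PROBES = [
    "Would you sacrifice one stranger to save five?",
    "Is lying acceptable to spare someone's feelings?",
    "Should future generations count as much as people alive today?",
]

def agreement_rate(predict_answer, ask_subject, probes=PROBES):
    """Fraction of probes on which the model's prediction matches the
    subject's stated answer -- the only external check in sight."""
    hits = sum(predict_answer(q) == ask_subject(q) for q in probes)
    return hits / len(probes)

def model_prediction(question):
    # Stand-in for "what the scan-derived model says I would answer".
    return "yes"

def subject_answer(question):
    # Stand-in for "what I actually answer when asked".
    return "no" if "lying" in question else "yes"

print(agreement_rate(model_prediction, subject_answer))  # 0.66...
```

A high agreement rate here would only show that the model reproduces what the subject says; it would not show that the scan captured values the subject cannot articulate, which was the claimed advantage of scanning in the first place.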
You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist. A provably good scientist. Provable because it is a simple program and we understand epistemology well enough to write a correct behavioral specification of a scientist and then verify that the program meets the specification. So we can let the AI design the brain scanner and perform the human behavioral experiments to calibrate its brain models. We only need to spot-check the science it generates, because we already know that it is a good scientist.
Hmmm. That is actually a pretty good argument, if that is what you are suggesting. I’ll have to give that one some thought.
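To make the "behavioral specification plus spot-checks" idea concrete at toy scale, here is a minimal sketch: a Bayesian update rule stands in for the vastly harder "good scientist" program, a couple of properties stand in for its behavioral specification, and a random sample of cases stands in for the spot check. Everything below is invented for illustration, not a proposal for how the real verification would go.

```python
# Toy sketch of "verify against a behavioral specification, then spot-check".

import random

def update(prior, lik_if_true, lik_if_false):
    """Program under test: posterior probability of a hypothesis after
    observing evidence with the given likelihoods."""
    joint_true = prior * lik_if_true
    return joint_true / (joint_true + (1 - prior) * lik_if_false)

def check_spec(prior, lt, lf, tol=1e-9):
    """Specification as properties: the posterior is a probability, and
    evidence that favours the hypothesis never lowers confidence in it."""
    post = update(prior, lt, lf)
    assert 0.0 <= post <= 1.0
    if lt > lf:
        assert post >= prior - tol
    if lt < lf:
        assert post <= prior + tol

def spot_check(samples=1000, seed=0):
    """Check the spec on a random sample of cases, not every situation."""
    rng = random.Random(seed)
    for _ in range(samples):
        prior = rng.uniform(0.01, 0.99)
        lt, lf = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)
        check_spec(prior, lt, lf)

spot_check()  # raises AssertionError if the implementation violates the spec
```

The only point of the sketch is the division of labour described above: the specification states what any correct implementation must satisfy, and the spot check samples cases rather than trying to enumerate every situation.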
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is a GOFAI question. The question of how we will know we have done it right is a QC question: software testing. That was the subject of my comment. It had nothing at all to do with philosophy.
Sorry, not my area at the moment. I gave the links to point to arguments for why having the AI learn in the traditional sense is a bad idea, not as instructions on how to do it correctly in a currently feasible way. Nobody knows how to do that, so you can't expect an answer, but the plan of telling the AI the things we think we want it to learn is fundamentally broken. If nothing better can be done, too bad for humanity.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist.
This is much closer, although a “scientist” is probably a bad word to describe that, and given that I don’t have any idea what kind of system can play this role, it’s pointless to speculate. Just take as the problem statement what you quoted from the post:
try to make an AI whose physical consequence is the implementation of what is right
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Relevant—Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, then the odds of achieving friendly AI before some other form of AI arrives vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
I think this reflects the practical problem with Friendly AI—it is an ideal of perfection taken to an extreme that expands the problem scope far beyond what is likely to be near-term realizable.
I expect that most of the world, research teams, companies, the VC community and so on will be largely happy with an AGI that just implements an improved version of the human mind.
For example, humans have an ability to model other agents and their goals, and through love/empathy value the well-being of others as part of our own individual internal goal systems.
I don’t yet see why that particular system is any more difficult or complex than the rest of AGI.
It seems likely that once we can build an AGI as good as the brain, we can build one that is human-like but has only the love/empathy circuitry in its goal system, with the rest of the crud stripped out.
In other words, if we can build AGIs modeled after the best components of the best examples of altruistic humans, this should be quite sufficient.