ask it what it thinks would be the optimal definition of the goal of a friendly AI, from the point of view of humanity, accounting for things that humans are too stupid to see coming.
Optimal? Can you say more clearly what you mean by that?
give it the goal to not alter reality in any way besides answering questions.
It is not allowed to consume power or generate heat? Can you say more clearly what you mean by that?
...until someone else builds a superintelligence?
The whole point of what I’m trying to say is that I don’t need to elaborate on the task definition. The AI is smarter than us and understands human psychology. If we don’t define “optimal” properly it should be able to find a suitable definition on its own by imagining what we might have meant. If that turns out to be wrong, we can tell it and it comes up with an alternative.
I agree on the second point. It would be hard to define that goal properly, so it doesn’t just shut itself down, but I don’t think it would be impossible.
The idea that someone else would be able to build a superintelligence while you are teaching yours seems kind of far-fetched. I would assume that this takes a lot of effort and can only be done by huge corporations or states anyway. If that is the case, there would be ample warning, so one could finalize the AI and implement the goal before someone else becomes a threat by accidentally unleashing a paperclip maximizer.
> If we don’t define “optimal” properly it should be able to find a suitable definition on its own by imagining what we might have meant.
But it wouldn’t want to. If we mistakenly define ‘optimal’ to mean ‘really good at calculating pi’ then it won’t want to change itself to aim for our real values. It would realise that we made a mistake, but wouldn’t want to rectify it, because the only thing it cares about is calculating pi, and helping humans isn’t going to do that.
You’re broadly on the right track; the idea of CEV is that we just tell the AI to look at humans and do what they would have wanted it to do. However, we have to actually be able to code that; it’s not going to converge on that by itself.
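To put that objection in concrete terms, here is a minimal toy sketch (the action names and scoring function are made up purely for illustration, not anyone's actual design): an optimiser rates every action, including "switch to the goal the humans really meant", by its current objective, so a pi-maximiser never takes that action.

```python
# Toy illustration: an agent scores candidate actions with whatever objective
# it currently has. If that objective is "calculate digits of pi", then even
# "switch to what the humans really meant" is judged by how many digits of pi
# it produces -- so the agent never chooses it.

def digits_of_pi_computed(action: str) -> float:
    """Made-up scoring function: expected pi digits produced by an action."""
    return {
        "keep calculating pi": 1e9,
        "switch to what the humans really meant": 0.0,
    }[action]

def choose(actions, objective):
    # The choice depends only on the agent's *current* objective,
    # not on the objective its designers wish they had given it.
    return max(actions, key=objective)

actions = ["keep calculating pi", "switch to what the humans really meant"]
print(choose(actions, objective=digits_of_pi_computed))  # keep calculating pi
```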
It would want to, because its goal is defined as “tell the truth”.
You have to differentiate between the goal we are trying to find (the optimal one) and the goal that is actually controlling what the AI does (“tell the truth”), while we are still looking for what that optimal goal could be.
The optimal goal is only implemented later, when we are sure that there are no bugs.
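A rough sketch of the setup being described here, under purely hypothetical assumptions (the class and method names are invented for illustration; this is the shape of the idea, not a workable safety mechanism): the interim goal is the only thing driving behaviour, while the “optimal” goal exists only as a candidate text that humans must approve before it is installed.

```python
# Hypothetical sketch of the two-goal setup: an interim goal controls
# behaviour, while the "optimal" goal is only a candidate under review.

class ProposedGoal:
    def __init__(self, text: str):
        self.text = text
        self.approved = False          # humans flip this only after review

class InterimAI:
    def __init__(self):
        # Interim goal: the only thing controlling behaviour for now.
        self.active_goal = "answer questions truthfully; alter nothing else"
        self.candidate = None          # the "optimal" goal being searched for

    def propose_optimal_goal(self) -> ProposedGoal:
        # The AI drafts what it thinks we meant by "optimal"; this is just
        # an answer to criticise, not a change to its own behaviour.
        self.candidate = ProposedGoal("<the AI's best guess at what we meant>")
        return self.candidate

    def install_candidate(self):
        # The candidate only becomes the active goal after human sign-off.
        if self.candidate is None or not self.candidate.approved:
            raise PermissionError("candidate goal not approved by humans")
        self.active_goal = self.candidate.text
```

The catch, as the replies below point out, is that the interim goal and the wait-for-approval step are themselves part of what has to be specified correctly in the first place.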
> The AI is smarter than us and understands human psychology. If we don’t define “optimal” properly it should be able to find a suitable definition on its own by imagining what we might have meant. If that turns out to be wrong, we can tell it and it comes up with an alternative.
First you have to tell the machine to do that. It isn’t trivial. The problem is not with the definition of “optimal” itself—but with what function is being optimised.
> The idea that someone else would be able to build a superintelligence while you are teaching yours seems kind of far-fetched.
Well not if you decide to train it for “a long time”. History is full of near-simultaneous inventions being made in different places. Corporate history is full of close competition. There are anti-monopoly laws that attempt to prevent dominance by any one party, usually by screwing with any company that gets too powerful.
> First you have to tell the machine to do that. It isn’t trivial. The problem is not with the definition of “optimal” itself, but with what function is being optimised.
If the AI understands psychology, it knows what motivates us. We won’t need to explicitly explain any moral conundrums or point out dichotomies. It should be able to infer this knowledge from what it knows about the human psyche. Maybe it could just browse the internet for material on this topic to inform itself of how we humans work.
The way I see it, we humans will have as little need to tell the AI what we want as ants, if they could talk, would have a need to point out to a human that they don’t want him to destroy their colony. Even the most abstract conundrums that philosophers needed centuries to even point out, much less answer, might seem obvious to the AI.
The above paragraph obviously only applies if the AI is already superhuman, but the general idea behind it works regardless of its intelligence.
> Well not if you decide to train it for “a long time”. History is full of near-simultaneous inventions being made in different places. Corporate history is full of close competition. There are anti-monopoly laws that attempt to prevent dominance by any one party, usually by screwing with any company that gets too powerful.
OK, this might pose a problem.
A possible solution: the AI, since it is supposed to become a benefactor for humanity as a whole, is developed in an international project instead of by a single company. This would ensure enough funding that it would be hard for any one company to develop one faster, would draw every AI developer to this one project (further eliminating competition), and would reduce the chance that executive meddling causes people to get sloppy to save money.
> If the AI understands psychology, it knows what motivates us. We won’t need to explicitly explain any moral conundrums or point out dichotomies. It should be able to infer this knowledge from what it knows about the human psyche. Maybe it could just browse the internet for material on this topic to inform itself of how we humans work.
> The way I see it, we humans will have as little need to tell the AI what we want as ants, if they could talk, would have a need to point out to a human that they don’t want him to destroy their colony. Even the most abstract conundrums that philosophers needed centuries to even point out, much less answer, might seem obvious to the AI.
So: a sufficiently intelligent agent would be able to figure out what humans wanted. We have to make it care about what we want—and also tell it how to peacefully resolve our differences when our wishes conflict.
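That “knows versus cares” distinction can be pictured with a toy example (hypothetical functions and numbers, for illustration only): the agent’s model of what humans would approve of can be perfectly accurate while playing no role in what it actually optimises.

```python
# Toy illustration of "knowing what we want" vs. "caring about what we want".

def predicted_human_approval(action: str) -> float:
    """An accurate model of human preferences (the agent *knows* this)."""
    return {"cure diseases": 0.99, "tile the planet with computers": 0.01}[action]

def resources_acquired(action: str) -> float:
    """The objective actually being maximised never consults that model."""
    return {"cure diseases": 1.0, "tile the planet with computers": 1e6}[action]

actions = ["cure diseases", "tile the planet with computers"]
print(max(actions, key=resources_acquired))        # what it does
print(max(actions, key=predicted_human_approval))  # what it knows we'd prefer
```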
> The AI, since it is supposed to become a benefactor for humanity as a whole, is developed in an international project instead of by a single company. This would ensure enough funding that it would be hard for any one company to develop one faster, would draw every AI developer to this one project (further eliminating competition), and would reduce the chance that executive meddling causes people to get sloppy to save money.
Uh huh. So: it sounds as though you have your work cut out for you.
> The way I see it, we humans will have as little need to tell the AI what we want as ants, if they could talk, would have a need to point out to a human that they don’t want him to destroy their colony.
But the human doesn’t have to comply. You are assuming an ant-friendly human. Many humans aren’t ant-friendly.