I don’t see how you can build a human-level intelligence without making it at least somewhat consequentialist. If it doesn’t decide actions based on something like expected utility maximization, how does it decide actions?
What I was referring to is the difference between:
A) An AI that accepts an instruction from the user, thinks about how to carry out the instruction, comes up with a plan, checks that the user agrees that this is a good plan, carries it out, then goes back to an idle loop.
B) An AI that has a fully realized goal system that has some variant of ‘do what I’m told’ implemented as a top-level goal, and spends its time sitting around waiting for someone to give it a command so it can get a reward signal.
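To make the type A control flow concrete, here is a minimal Python sketch. Every name in it (`plan_for`, `run_type_a`, the `approve` and `execute` callbacks) is hypothetical and invented for illustration; it is not a description of any real system. The point is only that there is no standing objective: the system plans, asks, acts, and returns.

```python
from typing import Callable, List


def plan_for(instruction: str) -> List[str]:
    """Hypothetical planner: a stand-in that turns an instruction into steps."""
    return [f"step {i + 1} of '{instruction}'" for i in range(2)]


def run_type_a(
    instruction: str,
    approve: Callable[[List[str]], bool],
    execute: Callable[[str], None],
) -> bool:
    """Handle one instruction the type A way, then return to idle.

    Returns True if the plan was approved and carried out, False otherwise.
    """
    plan = plan_for(instruction)   # think about how to carry out the instruction
    if not approve(plan):          # check that the user agrees with the plan
        return False               # no approval: do nothing, go back to idle
    for step in plan:              # carry out the approved plan
        execute(step)
    return True                    # done; nothing persists between instructions
```

A type B system, by contrast, would wrap something like this in a loop driven by its own persistent goal, which is where the extra failure modes come in.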
Either AI will kill you (or worse) in some unexpected way if it’s a full-blown superintelligence. But option B has all sorts of failure modes that don’t exist in option A, because of that extra complexity (and flexibility) in the goal system. I wouldn’t trust a type B system with the IQ of a monkey, because it’s too likely to find some hilariously undesirable way of getting its goal fulfilled. But a type A system could probably be a bit smarter than its user without causing any disasters, as long as it doesn’t unexpectedly go FOOOM.
Of course, there’s a sense in which you could say that the type A system doesn’t have human-level intelligence, no matter how impressive its problem-solving abilities are. But if all you’re looking for is an automated problem-solving tool, that’s not really an issue.