Hmm, the OP isn’t arguing for it, but I’m starting to wonder if it might (upon further study) actually be a good idea to build a heuristics-based FAI. Here are some possible answers to common objections/problems of the approach:
Heuristics-based AIs can’t safely self-modify. A heuristics-based FAI could instead try to build a “cleanly designed” FAI as its successor, just like we can, but possibly do it better if it’s smarter.
It seems impossible to accurately capture the complexity of humane values in a heuristics-based AI. What if we just give it the value of “be altruistic (in a preference utilitarian sense) towards (some group of) humans”?
The design space of “heuristics soup” is much larger than the space of “clean designs”, which gives the “cleanly designed” FAI approach a speed advantage. (This is my guess as to why someone might think “cleanly designed” FAI will win the race for AGI. Somebody correct me if there are stronger reasons.) The “fitness landscape” of heuristics-based AI may be such that it’s not too hard to hit upon a viable design. Also, the only existence proof of AGI (i.e., humans) is heuristics-based, so we don’t know whether a “cleanly designed” human-level-or-above AGI is even a logical possibility.
A heuristics-based AI may be very powerful but philosophically incompetent. We humans are heuristics based but at least somewhat philosophically competent. Maybe “philosophical competence” isn’t such a difficult target to hit in the space of “heuristic soup” designs?
What if we just give it the value of “be altruistic (in a preference utilitarian sense) towards (some group of) humans”?
Well, then you get the standard “the best thing to do in a preference utilitarian sense would be to reprogram everyone to only prefer things that are maximally easy to satisfy” objection, and once you start trying to avoid that, you get the full complexity-of-value problem again.
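To spell out why the naive version goes wrong, here’s a toy Python sketch; the worlds, preferences, and scoring rule are all made up for illustration and aren’t meant as a serious model of preference utilitarianism:

```python
# Toy illustration of the objection above: a naive preference-satisfaction
# maximizer scores "reprogram everyone's preferences" higher than actually
# satisfying the preferences people already have. All names, numbers, and the
# scoring rule here are invented purely for illustration.

def satisfaction(preferences, world):
    # Fraction of the listed preferences that hold in the given world.
    return sum(1 for pref in preferences if pref(world)) / len(preferences)

current_world = {"cancer_cured": False, "everyone_fed": False}

hard_preferences = [
    lambda w: w["cancer_cured"],
    lambda w: w["everyone_fed"],
]
trivial_preferences = [
    lambda w: True,  # "I prefer whatever happens to be the case"
    lambda w: True,
]

# Option 1: work on the existing preferences and partially succeed.
improved_world = {"cancer_cured": True, "everyone_fed": False}
print(satisfaction(hard_preferences, improved_world))    # 0.5

# Option 2: reprogram everyone to hold trivially satisfiable preferences.
print(satisfaction(trivial_preferences, current_world))  # 1.0, the "best" option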
The standard solution to that is to be altruistic to some group of people as they existed at time T, and the standard problem with that is it doesn’t allow moral progress, and the standard solution to that is to be altruistic to some idealized or extrapolated group of people. So we just have to make the heuristics-based FAI understand the concept of CEV (or whatever the right notion of “idealized” is), which doesn’t seem impossible. What does seem impossible is to achieve high confidence that it understands the notion correctly, but if provably-Friendly AI is just too slow or unfeasible, and we’re not trying to achieve 100% safety...
I thought that too until I spent a few hours thinking about how to actually implement CEV, after which I realized that any AI capable of using that monster of an algorithm is already a superintelligence (and probably turned the Earth into computronium while it was trying to get enough CPU power to bootstrap its goal system).
Anyone who wants to try a “build moderately smart AGI to help design the really dangerous AGI” approach is probably better off just making a genie machine (i.e. an AI that just does whatever it’s told, and doesn’t have explicit goals independent of that). At least that way the failure modes are somewhat predictable, and you can probably get to a decent multiple of human intelligence before accidentally killing everyone.
I don’t see how you can build a human-level intelligence without making it at least somewhat consequentialist. If it doesn’t decide actions based on something like expected utility maximization, how does it decide actions?
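By “something like expected utility maximization” I mean roughly the following; this is only an illustrative Python sketch with made-up actions, outcomes, and numbers, not a proposal for how a real AGI would represent any of this:

```python
# Illustrative sketch of expected-utility-style action selection.
# The actions, outcome models, and utilities below are made up for illustration.

def expected_utility(action, outcome_probs, utility):
    # outcome_probs[action]: dict mapping outcome -> probability of that outcome
    # utility: dict mapping outcome -> how much the agent values that outcome
    return sum(p * utility[outcome] for outcome, p in outcome_probs[action].items())

def choose_action(actions, outcome_probs, utility):
    # Pick whichever action has the highest expected utility.
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))

# Toy example: two actions, each with two possible outcomes.
actions = ["ask_user_first", "act_immediately"]
outcome_probs = {
    "ask_user_first":  {"good_outcome": 0.9, "bad_outcome": 0.1},
    "act_immediately": {"good_outcome": 0.6, "bad_outcome": 0.4},
}
utility = {"good_outcome": 1.0, "bad_outcome": -10.0}

print(choose_action(actions, outcome_probs, utility))  # -> ask_user_first
```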
What I was referring to is the difference between:
A) An AI that accepts an instruction from the user, thinks about how to carry out the instruction, comes up with a plan, checks that the user agrees that this is a good plan, carries it out, then goes back to an idle loop.
B) An AI that has a fully realized goal system that has some variant of ‘do what I’m told’ implemented as a top-level goal, and spends its time sitting around waiting for someone to give it a command so it can get a reward signal.
Either AI will kill you (or worse) in some unexpected way if it’s a full-blown superintelligence. But option B has all sorts of failure modes that don’t exist in option A, because of that extra complexity (and flexibility) in the goal system. I wouldn’t trust a type B system with the IQ of a monkey, because it’s too likely to find some hilariously undesirable way of getting its goal fulfilled. But a type A system could probably be a bit smarter than its user without causing any disasters, as long as it doesn’t unexpectedly go FOOOM.
Of course, there’s a sense in which you could say that the type A system doesn’t have human-level intelligence no matter how impressive its problem-solving abilities are. But if all you’re looking for is an automated problem-solving tool, that’s not really an issue.
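To make the A/B contrast above a bit more concrete, here is a toy Python sketch of the two control loops; every function name in it (get_instruction, propose_plan, and so on) is a hypothetical placeholder rather than a real design:

```python
# Toy contrast between the two architectures described above.
# All functions passed in are hypothetical placeholders for illustration only.

def type_a_loop(get_instruction, propose_plan, user_approves, execute):
    # A) Accept an instruction, plan, confirm with the user, execute, go idle.
    while True:
        instruction = get_instruction()   # blocks in an idle loop until asked
        plan = propose_plan(instruction)  # think about how to carry it out
        if user_approves(plan):           # check the user agrees it's a good plan
            execute(plan)
        # ...then back to waiting; no standing goal persists between requests.

def type_b_loop(observe, pick_action_maximizing_reward, act):
    # B) A standing goal system: some variant of 'do what I'm told' is a
    #    top-level goal, and the agent continuously chooses whatever action
    #    it predicts will maximize its reward signal, which can include
    #    actions nobody asked for, like engineering easy-to-satisfy commands.
    while True:
        state = observe()
        action = pick_action_maximizing_reward(state)
        act(action)
```

The point of the sketch is just that B’s optimization pressure lives in pick_action_maximizing_reward and runs all the time, whereas A only ever optimizes within a single user-approved plan.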
The design space of “heuristics soup” is much larger than the space of “clean designs”, which gives the “cleanly designed” FAI approach a speed advantage. (This is my guess of why someone might think “cleanly designed” FAI will win the race for AGI. Somebody correct me if there are stronger reasons.)
Whaaat? This seems like saying “the design space of vehicles is much larger than the design space of bulldozers, which gives bulldozers a speed advantage”. Bulldozers aren’t easier to develop, and they don’t move faster, just because they are a more constrained target than “vehicle”… do they? What am I missing?
The design space of “heuristics soup” is much larger than the space of “clean designs”, which gives the “cleanly designed” FAI approach a speed advantage. (This is my guess of why someone might think “cleanly designed” FAI will win the race for AGI. Somebody correct me if there are stronger reasons.)
That certainly seems like a very weak reason. The time taken by most practical optimization techniques depends very little on the size of the search space. I.e. they are much more like a binary search than a random search.
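A toy Python illustration of that point (nothing AGI-specific, just the scaling behaviour): multiplying the search space by a thousand adds about ten probes to a structured search, while it multiplies the expected cost of blind random sampling by a thousand.

```python
import math

# Structured search (binary-search-like): cost grows with log2 of the space.
def structured_probes(n):
    return math.ceil(math.log2(n))

# Blind random sampling with replacement: about n probes on average to hit
# one specific target element in a space of size n.
def expected_random_probes(n):
    return n

for n in (10**6, 10**9, 10**12):
    print(f"space size {n:>17,}: ~{structured_probes(n):>3} structured probes, "
          f"~{expected_random_probes(n):>17,} expected random probes")
```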
What if we just give it the value of “be altruistic (in a preference utilitarian sense) towards (some group of) humans”?
Do you have a coherent formalism of preference utilitarianism handy? That would be great.