Yeah, I see now that the story doesn’t work very well. It’s unrealistic that an ad hoc AI designed for answering human questions would manage a coherent takeoff on the first try, without failing miserably due to some flaws in architecture or self-modeling. In all likelihood, making an AI take off without tripping over itself is a hard engineering problem that you can’t solve by accident. That seems like a new argument against this particular kind of doomsday scenario. I need to think about it.
That’s the friendly AI problem. If you have a piece of planning software that seems to work fine, and you give it more and more options and resources, how do you know that it will keep generating non-extreme plans?
If it terminates as soon as it hits a plan that achieves the goal, and the possible actions are ordered in terms of how extreme they are, then increasing the available resources can’t cause trouble, but increasing the available options can (because your ordering might go from correct to incorrect).
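A toy sketch of the kind of planner I mean, where `extremeness` and `achieves_goal` are made-up stand-ins for illustration:

```
# Satisficing planner: consider candidate plans from least to most extreme
# and stop at the first one that achieves the goal. More resources just make
# individual plans more likely to pan out; more *options* reshuffle which
# plan is the "least extreme acceptable one", which is where the ordering
# can go from correct to incorrect.

def first_acceptable_plan(plans, extremeness, achieves_goal):
    for plan in sorted(plans, key=extremeness):
        if achieves_goal(plan):
            return plan   # terminate immediately; never examine wilder plans
    return None           # give up rather than escalate
```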
In general optimization terms, this is the difference between locally optimal solutions and globally optimal solutions. If you start from a reasonable point and use gradient descent, you only need the local region of the solution space to be reasonable in order to end up somewhere reasonable, because the total distance you’ll travel is likely to be short (relative to the size of the solution space, and depending on its topology, of course). If you insist on the globally optimal solution, you need the entire solution space to be reasonable.
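Roughly like this, swapping the gradient-descent picture for a discrete hill climb to keep the sketch short:

```
# Local search: start from a reasonable plan and take small improving steps,
# so you only travel a short distance and only the nearby region of the
# solution space has to be well-behaved.
def local_search(start, neighbors, score, max_steps=100):
    current = start
    for _ in range(max_steps):
        candidates = list(neighbors(current))
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= score(current):
            break             # local optimum: we stop near where we began
        current = best
    return current

# Global search: rank everything, so the *entire* space has to be well-behaved.
def global_search(all_plans, score):
    return max(all_plans, key=score)
```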
I’ve since edited the previous comment to agree with you in principle, but I think this particular objection doesn’t really work.
Let’s say Lawrence asks the AI to get him a cheeseburger with probability at least 90%. The AI can’t use its usual plan because the local burger place is closed. It picks the next simplest plan, which involves using a couple more computers for additional planning and doesn’t specify any further details. These computers receive the subgoal “maximize the probability no matter what”, because it’s slightly simpler mathematically than capping it at 90%, and doesn’t have any downside from the POV of the original goal.
If you want the AI to avoid such plans, it needs to have a concept of “non-extreme” that agrees with our intuitions more reliably. As far as I understand, that’s pretty much the friendly AI problem.
As far as I understand, that’s pretty much the friendly AI problem.
I think it’s simpler, but not by much. Instead of knowing both the value and the cost of everything, you just need to know the cost of everything. (The ‘actual’ cost, that is, not the full economic cost, which drags the value problem back in by including opportunity cost.) You could probably get away with an approximation of the cost, though a guarantee like “at least as high as the actual cost” is probably helpful.
So if Lawrence says “I’ll pay up to $10 for a hamburger,” either it can find a plan that provides Lawrence with a hamburger for $10 or less (gross cost, not net cost), or it says “sorry, can’t find anything in that price range.”
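As a toy sketch, with `estimated_cost` assumed to exist and to overestimate per the guarantee above:

```
# Budget-capped lookup: either return a goal-achieving plan whose estimated
# gross cost fits under the stated budget, or refuse. All the difficulty is
# hiding inside estimated_cost, which here is just assumed.

def plan_within_budget(plans, estimated_cost, achieves_goal, budget=10.00):
    affordable = [p for p in plans
                  if achieves_goal(p) and estimated_cost(p) <= budget]
    if not affordable:
        return None   # "sorry, can't find anything in that price range"
    return min(affordable, key=estimated_cost)
```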
I think there’s a huge amount of work to get there—you have to have an idea of ‘gross cost’ that matches up well enough with our intuitions, which is an intuition-encoding problem and thus hard. (If it tweets at the local burger company to get a coupon for a free burger, what’s the cost?)
I’ve since edited my comment to agree with you. That said...
and the possible actions are ordered in terms of how extreme they are
That’s the friendly AI problem. Maybe it can be solved by defining a metric on the solution space and making the AI stay close to a safe point, but I don’t know how to define such a metric. Clicking a link seems like a non-extreme action. It might have extreme consequences, but that’s true for all actions. Hitler’s genetic code was affected by a butterfly flapping its wings on the other side of the world.
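Concretely, the idea would be something like this sketch, where `distance` is exactly the metric I don’t know how to define:

```
# Restrict the planner to plans within some radius of a known-safe plan.
# Whether this buys any safety depends entirely on whether `distance`
# actually tracks "extremeness", which is the open problem.

def nearby_plans(plans, safe_plan, distance, radius):
    return [p for p in plans if distance(p, safe_plan) <= radius]
```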