My summary (now with endorsement by Eliezer!):
SI can be a valuable organization even if Tool AI turns out to be the right approach:
Skills/organizational capabilities for safe Tool AI are similar to those for Friendly AI.
EY seems to imply that much of SI’s existing body of work can be reused.
Offhand remark that seemed important: Superintelligent Tool AI would be more difficult, since it would have to be developed in such a way that it does not recursively self-improve.
Tool AI is nontrivial:
The number of possible plans is way too large for an AI to realistically evaluate all of them. Heuristics will have to be used to find suboptimal but promising plans (a toy sketch of such heuristic plan search appears after this group of points).
The reasoning behind the plan the AI chooses might be way beyond the comprehension of the user. It’s not clear how best to deal with this, given that the AI is only approximating the user’s wishes and can’t really be trusted to choose plans without supervision.
Constructing a halfway decent approximation of the user’s utility function and having a model good enough to make plans with are also far from solved problems.
Potential Tool AI gotcha: the AI might give you a self-fulfilling negative prophecy that it didn’t realize would harm you (a toy illustration of this also appears below).
These are just examples. Point is, saying “but the AI will just do this!” is far removed from specifying the AI in a rigorous formal way and proving it will do that.
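To make the plan-search and approximate-utility points above concrete, here is a minimal sketch in Python. The toy action set and the hand-written approx_utility stand-in are invented purely for illustration; this is just the shape of “score a heuristic sample of plans instead of enumerating them all”, not anyone’s actual proposal.

```python
# Toy sketch: the plan space is far too large to enumerate, so we score a
# heuristically generated sample of plans with an *approximate* utility function
# and return the best candidate found (possibly suboptimal, hopefully promising).
import random

ACTIONS = ["research", "build", "test", "ship", "wait"]
PLAN_LENGTH = 12  # 5**12 is roughly 244 million possible plans, too many to enumerate

def approx_utility(plan):
    """Hand-written stand-in for a learned approximation of the user's preferences."""
    score = plan.count("test") + 2.0 * plan.count("ship")
    score -= 0.5 * sum(a == b for a, b in zip(plan, plan[1:]))  # penalize repeating an action
    return score

def random_plan():
    return tuple(random.choice(ACTIONS) for _ in range(PLAN_LENGTH))

# Heuristic search: evaluate 10,000 sampled plans instead of all ~244 million.
best_plan = max((random_plan() for _ in range(10_000)), key=approx_utility)
print(best_plan, approx_utility(best_plan))
```

A real system would use far better heuristics than random sampling (beam search, learned proposal distributions, and so on), but the gap between “best plan found” and “best plan that exists” is the point.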
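Here is a similarly minimal sketch of the self-fulfilling-prophecy gotcha. All the probabilities are made up; the point is only that a purely accuracy-seeking predictor whose announcements influence the outcome can prefer the harmful announcement, because it is the one most likely to come true.

```python
# Toy model: the announced prediction changes the probability of the bad outcome.
# All probabilities below are invented for illustration.
P_BAD_GIVEN_ANNOUNCEMENT = {
    "crisis ahead": 0.9,  # the warning itself triggers the panic it predicts
    "all clear": 0.4,
}

def expected_accuracy(announcement):
    """Probability that the announcement ends up counting as correct."""
    p_bad = P_BAD_GIVEN_ANNOUNCEMENT[announcement]
    return p_bad if announcement == "crisis ahead" else 1.0 - p_bad

best = max(P_BAD_GIVEN_ANNOUNCEMENT, key=expected_accuracy)
print(best, expected_accuracy(best))  # "crisis ahead" wins: 0.9 accuracy versus 0.6
```

The tool never “realizes” anything went wrong; it simply maximized the metric it was given.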
Tool AI is not obviously the way AGI should or will be developed:
Many leading AGI thinkers have their own pet ideas about what AGI should do. Few to none endorse Tool AI. If it were obvious, all the leading AGI thinkers would endorse it.
Actually, most modern AI applications don’t involve human input, so it’s not obvious that AGI will develop along Tool AI lines.
Full-time Friendliness researchers are worth having:
If nothing else, they’re useful for evaluating proposals like Holden’s Tool AI one to figure out if they are really sound.
It would be difficult to program an AI to do Friendliness philosophy. Even if we thought we had a program that could do it, how would we know that its answers were correct? So we probably need humans.
Friendliness researchers need to have a broader domain of expertise than Holden gives them credit for. They need to have expertise in whatever happens to be necessary to ensure safe AI.
The problems of Friendliness are tricky, so laypeople should beware of jumping to conclusions about Friendliness.
Holden’s estimate of a 90% chance of doom, even given a 100-person FAI team approving the design, is overly pessimistic:
EY is aware it’s extremely difficult to know which properties of a prospective FAI need to be formally proved, and plans to put a lot of effort into figuring this out.
The difficulty of Friendliness is finite. The difficulties are big and subtle, but not unending.
Where did 90% come from? Lots of uncertainty here...
Holden made other good points not addressed here.
These points seem missing:
You can’t get a 20-move solution out of a human brain, using the native human planning algorithm. Humanity can do it, but only by exploiting the ability of humans to explicitly comprehend the deep structure of the domain (not just rely on intuition) and then inventing an artifact, a new design, running code which uses a different and superior cognitive algorithm, to solve that Rubik’s Cube in 20 moves. We do all that without being self-modifying, but it’s still a capability to respect.
A system that undertakes extended processes of research and thinking, generating new ideas and writing new programs for internal experiments, seems both much more effective and much more potentially risky than something like a chess program with a simple fixed algorithm to search using a fixed, narrow representation of the world (as a chess board).
Looks pretty good, actually. Nice.
So you wrote 10x too much then?
How do we know that the problem is finite? When it comes to proving a computer program safe from being hacked, the problem is considered NP-hard. Google Chrome was recently hacked by chaining 14 different bugs together. A working AGI is probably at least as complex as Google Chrome, so proving it safe will likely also be NP-hard (a toy illustration of how quickly this kind of checking blows up follows below).
Google Chrome doesn’t even self-modify.
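A toy illustration of the blow-up behind the chained-bugs comment above (an invented component model, not a claim about Chrome or any real verifier): each component is individually almost fine, but a brute-force checker still has to walk every joint state to find the rare combinations where several small flaws line up, and the number of joint states grows exponentially with the number of components.

```python
# Brute-force safety check over the joint states of several interacting components.
# Each component has 4 states; state 3 stands in for "has a minor flaw".
from itertools import product

COMPONENT_STATES = range(4)
NUM_COMPONENTS = 7  # already 4**7 = 16,384 joint states to examine

def unsafe(joint_state):
    # The "exploit" only exists when at least 3 flaws are active at once,
    # loosely analogous to chaining several individually minor bugs.
    return sum(s == 3 for s in joint_state) >= 3

joint_states = list(product(COMPONENT_STATES, repeat=NUM_COMPONENTS))
bad = [js for js in joint_states if unsafe(js)]
print(len(joint_states), "joint states checked,", len(bad), "unsafe combinations")
```

Each added component multiplies the joint-state count by four here, which is why exhaustive checking stops being an option long before anything the size of a browser, let alone an AGI.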
I’m not really sure what’s meant by the claim that most modern AI applications don’t involve human input.
For example, in computer vision, you can input an image and get a classification as output. The input is supplied by a human. The computation doesn’t involve the human. The output is well defined. The same could be true of a tool AI that makes predictions.
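For concreteness, here is what that “tool mode” pattern looks like in code, using scikit-learn’s bundled digits dataset as a stand-in for a real vision system (this assumes scikit-learn is installed, and the choice of model is arbitrary): a human supplies the input, the computation itself involves no human, and the output is a well-defined classification.

```python
# Minimal "tool mode" classifier: input in, label out, no human in the loop
# during the computation itself.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(clf.predict(X_test[:1]))   # predicted digit class for one human-supplied image
print(clf.score(X_test, y_test)) # held-out accuracy
```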
Both Andrew Ng and Jeff Hawkins think that tool AI is the most likely approach.
I would consider 3 to be a few.
That is about how I read it.