I don’t understand; isn’t Holden’s point precisely that a tool AI is not properly described as an optimization process? Google Maps isn’t optimizing anything in a non-trivial sense, any more than a shovel is.
My understanding of Holden’s argument was that powerful optimization processes can be run in either tool-mode or agent-mode.
For example, Google Maps optimizes routes, but in “tool mode” it returns the result to the user, along with alternatives and options for editing.
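To make that concrete, here is a minimal sketch, with purely hypothetical names (RouteOption, plan_routes, execute), of how the same optimizer can sit behind either interface:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RouteOption:
    description: str  # e.g. "via the highway"
    minutes: float    # predicted driving time; lower is better


def plan_routes(origin: str, destination: str) -> List[RouteOption]:
    # Stand-in for the underlying optimization process (route search).
    # A real system would search a road graph; here we return fixed options.
    return [
        RouteOption("via the highway", 42.0),
        RouteOption("via surface streets", 55.0),
    ]


def run_as_tool(origin: str, destination: str) -> List[RouteOption]:
    # Tool mode: hand the ranked alternatives back to a human, who can pick
    # one, edit the query, or ignore the output. Nothing is acted on.
    return sorted(plan_routes(origin, destination), key=lambda o: o.minutes)


def run_as_agent(origin: str, destination: str,
                 execute: Callable[[RouteOption], None]) -> None:
    # Agent mode: the same optimizer, but the top-ranked option is acted on
    # directly, with no human between the output and the world.
    best = min(plan_routes(origin, destination), key=lambda o: o.minutes)
    execute(best)
```

The optimization step is identical in both; the difference is only whether a human sits between the output and the world.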
Holden wants to build Tool-AIs that output summaries of their calculations along with suggested actions. For Google Maps, I guess this would be the distance and driving times, but how does a Tool-AI summarize more general calculations that it might do?
It could give you the expected utility of each option, but it’s hard to see how that helps if we’re concerned that its utility function or EU calculations might be wrong. Alternatively, it could give a human-readable description of the predicted consequences of each option. But the process that produces such descriptions from the raw calculations would seem to require a great deal of intelligence in its own right (it might, for example, have to describe posthuman worlds in terms we can understand), and that process wouldn’t itself be a “safe” Tool-AI, since its summaries would presumably not come with further alternative summaries and meta-summaries of how they were calculated.
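To make the worry concrete, here is a hypothetical sketch of what “suggested actions plus summaries” might look like as output; the field names are mine, not anything Holden has specified:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SuggestedAction:
    action: str              # a proposed plan, in whatever form the AI uses
    expected_utility: float  # the AI's own EU estimate, only as trustworthy
                             # as its utility function and world model
    summary: str             # human-readable account of predicted consequences


def report(suggestions: List[SuggestedAction]) -> str:
    # Format the Tool-AI's output for a human reader. Note the regress worried
    # about above: nothing here says how `summary` was produced, and the
    # summarizer gets no alternative summaries or meta-summaries of its own work.
    ranked = sorted(suggestions, key=lambda s: s.expected_utility, reverse=True)
    return "\n".join(
        f"{s.action}: EU={s.expected_utility:.2f} -- {s.summary}" for s in ranked
    )
```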
(My question might be tangential to your own comment. I just wanted your thoughts on it, and this seems to be the best place to ask.)
The point is that we don’t want it to be a black box—we want to be able to get inside its head, so to speak.
(Of course, we can’t do that with humans, and that hasn’t stopped us, but it’s still a nice goal.)
Honestly, this whole tool/agent distinction seems tangential to me.
Consider two systems, S1 and S2.
S1 comprises the following elements:
a) a tool T, which when used by a person to achieve some goal G, can efficiently achieve G
b) a person P, who uses T to efficiently achieve G.
S2 comprises a non-person agent A which achieves G efficiently.
I agree that A is an agent and T is not, and that T is a tool; whether A is a tool seems like a question not worth asking. But I don’t quite see why I should prefer S1 to S2.
Surely the important question is whether I endorse G?
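For concreteness, here is a rough sketch of the two systems as code structure; the function names are mine and carry no claim about how T or A would actually work:

```python
from typing import Callable, List

Plan = str  # stand-in: a plan is just a description here


def s1_tool_plus_human(tool_propose: Callable[[], List[Plan]],
                       person_choose: Callable[[List[Plan]], Plan],
                       act: Callable[[Plan], None]) -> None:
    # S1: the tool T proposes ways of achieving G; the person P chooses one
    # and carries it out. P's judgment sits between optimization and action.
    act(person_choose(tool_propose()))


def s2_agent(agent_decide: Callable[[], Plan],
             act: Callable[[Plan], None]) -> None:
    # S2: the agent A selects and executes on its own; no person in the loop.
    act(agent_decide())
```

The only structural difference is who does the choosing.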
A tool+human differs from a pure AI agent in two important ways:
The human (probably) already has naturally-evolved morality, sparing us the very hard problem of formalizing that.
We can arrange for (almost) everyone to have access to the tool, allowing tooled humans to counterbalance each other.
Well, I certainly agree that both of those things are true.
And it might be that human-level evolved moral behavior is the best we can do… I don’t know. It would surprise me, but it might be true.
That said… given how unreliable such behavior is, if human-level evolved moral behavior even approximates the best we can do, it seems likely that I would do best to work toward ensuring that neither T nor A ever achieves the level of optimizing power we’re talking about here.
Humanity isn’t that bad. Remember that the world we live in is pretty much the way humans made it, mostly deliberately.
But my main point was that existing humanity bypasses the very hard did-you-code-what-you-meant-to problem.
I agree with that point.