I don’t think anyone will be able to. Here is my attempt at a more precise definition than what we have on the table:
An agent models the world and selects actions in a way that depends on what its modeling says will happen if it selects a given action.
A tool may model the world, and may select actions depending on its modeling, but may not select actions in a way that depends on what its modeling says will happen if it selects a given action.
A consequence of this definition is that some very simple AIs that can be thought of as “doing something,” such as a basic checkers program or a program that waters your plants if and only if its model says it didn’t rain, would count as tools rather than agents. I think that is a helpful way of carving things up.
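To make the distinction concrete, here is a minimal sketch in Python using the plant-watering example above. Everything in it (the toy world model, the actions, the scoring) is an invented stand-in for illustration, not any real system; the point is only where the model gets consulted in each selection rule.

```python
# Purely illustrative sketch of the tool/agent distinction.
# The world model, actions, and scoring are toy stand-ins.

class ToyWorldModel:
    def __init__(self, it_rained: bool):
        self.it_rained = it_rained

    def says_it_rained(self) -> bool:
        # Tool-style query: a fact about the world, independent of any action we might take.
        return self.it_rained

    def predict_outcome_if(self, action: str) -> float:
        # Agent-style query: predicted plant health *if this action is selected*.
        soil_moisture = 1.0 if self.it_rained else 0.0
        if action == "water":
            soil_moisture += 1.0
        return 1.0 - abs(soil_moisture - 1.0)  # best at exactly one "unit" of water


def tool_policy(model: ToyWorldModel) -> str:
    # Tool: selects an action based on what the model says about the world,
    # never on what the model says would happen if it selected that action.
    return "water" if not model.says_it_rained() else "do nothing"


def agent_policy(model: ToyWorldModel) -> str:
    # Agent: selects the action whose *predicted consequence* scores best.
    candidates = ["water", "do nothing"]
    return max(candidates, key=model.predict_outcome_if)


if __name__ == "__main__":
    model = ToyWorldModel(it_rained=False)
    print(tool_policy(model))   # "water"
    print(agent_policy(model))  # "water" -- same behavior, different selection rule
```

Note that the two policies can produce identical behavior here; the definition distinguishes them by how the action was selected, not by what the action is.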
A tool may model the world, and may select actions depending on its modeling, but may not select actions in a way that depends on what its modeling says will happen if it selects a given action.
So if the question concerns the future (such as “will it rain tomorrow?”), does this essentially mean that a tool would model a counterfactual alternative future, namely the one that would happen if the tool did not provide any answer?
This would be OK in situations where the AI’s answer does not make a big difference (such as “will it rain tomorrow?”).
It would be less OK in situations where the mere knowledge of “what the AI said” would influence the outcome, such as asking the AI about important social or political topics, where the answer is likely to be published. (In those situations, the question being asked would get mixed up with specific events of the counterfactual world, such as a worldwide panic of “our superhuman AI seems to be broken, we are all doomed!”)
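A toy way to see the worry: the sketch below (Python, with an invented “reaction” function and made-up numbers, not any actual forecasting setup) contrasts a tool-style forecast, which is about the counterfactual world where no answer gets published, with a self-consistent forecast that would have to account for the world’s reaction to the published answer itself.

```python
# Toy illustration only: the numbers and the reaction function are invented
# stand-ins for how a published forecast (panic, preparation, etc.) can shift
# the very outcome being forecast.
from typing import Optional

def true_probability(published_forecast: Optional[float]) -> float:
    """Probability of the event, given what (if anything) was published."""
    base = 0.3                                 # probability if the AI stays silent
    if published_forecast is None:
        return base
    return base + 0.4 * published_forecast     # publication feeds back on the event

def tool_forecast() -> float:
    # Tool-style answer: about the counterfactual world in which no answer is published.
    return true_probability(published_forecast=None)

def self_consistent_forecast(iterations: int = 50) -> float:
    # The harder target: a value p that remains correct once p itself is published,
    # i.e. a fixed point of p = true_probability(p).
    p = 0.5
    for _ in range(iterations):
        p = true_probability(p)
    return p

print(tool_forecast())             # 0.3
print(self_consistent_forecast())  # 0.5 (fixed point of p = 0.3 + 0.4 * p)
```

When the feedback term is negligible (the “will it rain tomorrow?” case), the two answers coincide; when it is not, the tool’s answer is about a world that stops existing the moment the answer is published.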
I think that you’re describing a real hurdle, though it seems like a hurdle that could be overcome.