Great summary, thanks for writing this up! A few questions / quibbles / clarifications:
System 1: The Eiffel Tower itself. The Eiffel Tower itself contains the information that ‘The Eiffel Tower is in Paris’ (the properties of the physical object are what determine the truth of the statement), but the Eiffel Tower ‘obviously’ has no beliefs.
I’m not clear on exactly how this example is supposed to work. I think Miles was making the point that bare information is merely a measure of correlation, so even the states of simple physical objects can carry information about something else without thereby having beliefs (consider, e.g., a thermometer). But in this Eiffel Tower example, I’m not sure what is correlating with what—could you explain what you mean?
System 2: An LLM. The information ‘The Eiffel Tower is in Paris’ is encoded in a (capable enough) foundation model, but we did not agree on whether this was a belief. Max said the LLM cannot act on this information so it cannot have beliefs, whereas I think that the ability to correctly complete the sentence “The Eiffel Tower is in the city ___” corresponds to the LLM having some kind of beliefs.
I’m not sure I want to say that the inability of a bare LLM to act on the information that the Eiffel Tower is in Paris means that it can’t have that as a belief. I think it might suffice for the LLM to be disposed to act on this information if it could act at all. (In the same way, someone with full body paralysis might not be able to act on many of their beliefs, but these are still perfectly good beliefs.) However, I think the basic ability of an LLM to correctly complete the sentence “the Eiffel Tower is in the city of…” is not very strong evidence of having the relevant kinds of dispositions. Better evidence would show that the LLM somehow relies on this fact in its reasoning (to the extent that you can make it do reasoning).
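For concreteness, here is a rough sketch of the kind of probe I have in mind; `llm_complete` is a hypothetical stand-in for whatever completion call your model exposes, not a real API, and the specific probes are just illustrative.

```python
# Sketch only: llm_complete is a hypothetical placeholder, not a real API.
def llm_complete(prompt: str) -> str:
    # Swap in an actual model call here.
    return "<model output>"

# Weak evidence: bare sentence completion (mere recall).
recall_probe = "The Eiffel Tower is in the city of"

# Stronger evidence: tasks where the correct answer depends on *using*
# the Paris fact as a premise, without it being stated in the prompt.
reasoning_probes = [
    "I am standing directly under the Eiffel Tower. What currency do the nearby cafes accept?",
    "Roughly how long would it take to walk from the Eiffel Tower to the Louvre?",
]

for probe in [recall_probe] + reasoning_probes:
    print(probe, "->", llm_complete(probe))
```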
Regarding the CGP Grey quote, I think this could be a red herring. Speaking in a very loose sense, perhaps a smallpox virus “wants to spread”. But speaking in a very loose sense, so too does a fire (it also has that tendency). Yet the dangers posed by a fire are of a relevantly different kind than those posed by an intelligent adversary, as are the strategies that are appropriate for each. So I think the question about whether current AI systems have real goals and beliefs does indeed matter, since it tells us whether we are dealing with a hazard like a fire or a hazard like an adversarial human being. (And all this has nothing really to do with consciousness.)
Lastly, just a small quibble:
I think this perspective puts more weight on Definition 2 of beliefs (dispositionalism) than the other two definitions.
I actually think the CGP Grey perspective puts more weight on Definition 3, which is a kind of instrumentalism, than it does on dispositionalism. Daniel Dennett, who adopts something like Definition 3, argues that even thermostats have beliefs in this sense, and I suppose he might say the same about a smallpox virus. But although it might be predictively efficient to think of smallpox viruses as having beliefs and desires (Definition 3), viruses seem to lack many of the dispositions that usually characterise believers (Definition 2).
But in this Eiffel Tower example, I’m not sure what is correlating with what
The physical object Eiffel Tower is correlated with itself.
However, I think the basic ability of an LLM to correctly complete the sentence “the Eiffel Tower is in the city of…” is not very strong evidence of having the relevant kinds of dispositions.
It is highly predictive of the LLM’s ability to book flights to Paris when I create an LLM-agent out of it and ask it to book a trip to see the Eiffel Tower.
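To gesture at what I mean, here is a minimal sketch of such a scaffold; `llm_complete` and `book_flight` are hypothetical stand-ins. The point is only that nothing in the scaffold mentions Paris, so whether the right flight gets booked depends entirely on the fact stored in the model.

```python
# Sketch only: llm_complete and book_flight are hypothetical placeholders.
def llm_complete(prompt: str) -> str:
    # Swap in an actual model call here.
    return "Paris"

def book_flight(destination: str) -> str:
    # Swap in an actual booking tool here.
    return f"Booked a flight to {destination}."

user_request = "Book me a trip to see the Eiffel Tower."
destination = llm_complete(
    f"Request: {user_request}\n"
    "Which city should the flight go to? Answer with the city name only:"
).strip()
print(book_flight(destination))
```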
I think the question about whether current AI systems have real goals and beliefs does indeed matter
I don’t think we disagree here. To clarify, my view is that some threat models (and solutions) are not affected by whether the AI has ‘real’ beliefs, while for other threats and solutions it does matter.
I think the CGP Grey perspective puts more weight on Definition 3.
I actually do not understand the distinction between Definition 2 and Definition 3. We don’t need to resolve it here. I’ve edited the post to include my uncertainty on this.