But in this Eiffel Tower example, I’m not sure what is correlating with what.
The LLM’s internal state is correlated with the physical object, the Eiffel Tower itself.
However, I think the basic ability of an LLM to correctly complete the sentence “the Eiffel Tower is in the city of…” is not very strong evidence of having the relevant kinds of dispositions.
That basic completion ability is highly predictive of whether the LLM can book flights to Paris when I build an LLM-agent out of it and ask it to book a trip to see the Eiffel Tower.
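To make that concrete, here is a toy sketch of the dependency I have in mind. Everything in it is a hypothetical stand-in: `complete` is a stub for whatever completion API the agent is built on, and `book_flight` is a stub for whatever booking tool it is given.

```python
def complete(prompt: str) -> str:
    """Stand-in for an LLM completion call (stubbed for illustration)."""
    if "Eiffel Tower is in the city of" in prompt:
        return "Paris"
    raise NotImplementedError("stub only handles the Eiffel Tower prompt")

def book_flight(destination: str) -> str:
    """Stand-in for a flight-booking tool the agent can call."""
    return f"Booked a flight to {destination}."

def agent(task: str) -> str:
    # The agent reduces "book a trip to see the Eiffel Tower" to a factual
    # question, answers it with the same sentence-completion ability quoted
    # above, and feeds the answer to its booking tool.
    city = complete("The Eiffel Tower is in the city of")
    return book_flight(city)

print(agent("Book a trip to see the Eiffel Tower"))
# -> Booked a flight to Paris.
```

A real agent scaffold would route this through the model’s own tool-use outputs rather than a hard-coded pipeline, but the dependency is the same: if the model can’t complete the sentence, the agent can’t pick the destination.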
I think the question about whether current AI systems have real goals and beliefs does indeed matter.
I don’t think we disagree here. To clarify, my belief is that there are threat models and solutions that are not affected by whether the AI has ‘real’ beliefs, and other threats and solutions where it does matter.
I think CGP Grey’s perspective puts more weight on Definition 3.
I actually do not understand the distinction between Definition 2 and Definition 3. We don’t need to resolve it here. I’ve edited the post to include my uncertainty on this.