(I’m trying to repeat things in many different ways so as to increase the chance that I’m understood; apologies if the repetition is needless.)
Is the objection before or after this point?
Before, but again my objection is sort of orthogonal to the way you’ve set up the scenario. When you say you can write down TDT “agents” I don’t believe you. I believe you can write down specifications of syntax-manipulating algorithms that will solve tic-tac-toe or other narrow problems just fine, and I of course believe that it’s physically possible to call such algorithms “agents” if such a fancy appeals to you, but I don’t confidently believe that they are or could ever be real agents in the way that word is commonly interpreted. (“Intelligence, to be useful, must be used for something other than defeating itself.”) You can interpret such a syntax manipulator as an agent to the extent that you can interpret the planet Saturn as an agent, but this is qualitatively different from talking about real agentic things like humans or gods, and I’m worried about pivoting on this word “agent” as if conclusions drawn in one domain can routinely be expected to hold in the other. There is some math about an abstract thing called expected utility, and we can use roughly that conceptual scheme to conveniently label certain syntax-manipulating algorithms or to roughly carve up the world as we see it, but this doesn’t mean that things like “beliefs” or “preferences” actually exist out there in the world in any reliable metaphysical sense such that we can be confident in applying them beyond their intended purview. So when you say:
Do you think this agent definition implodes, or that the resulting agents just don’t act as self-interested as they look like they would?
I don’t know how to interpret this question in a way that I’m confident makes sense. I certainly want to know how to interpret it, but I would have to think about it a lot longer. Perhaps if I were more familiar with the relevant arguments from both the formal epistemology literature and the philosophy of mind literature, I would be able to interpret it confidently.
This does help with clarity.
So I can write down these formal symbol-manipulating algorithms that look, to a naive onlooker, like they will do things like keep to themselves and prove the Goldbach conjecture. We can talk about the question of fact: if we run such an algorithm on a Turing machine (made of math), would it in fact output a proof of the Goldbach conjecture? And then we can talk about the other question of fact, which seems to be equivalent unless you dispute some very fundamental claims: if we simulate that computation on a real computer, will it in fact output a proof of the Goldbach conjecture?
If you accept the reasoning so far, it seems like one could try to cut it at three points: either it breaks down when the goals get complicated, it breaks down when the reasoning gets hard, or it breaks down when the algorithm’s embedding in the environment is too complicated.
If you accept that these algorithms systematically do things that lead to their apparent “goals” being satisfied (so that we can predict outcomes using this sort of reasoning), then I don’t know what exactly you are arguing.