I see a circularity problem in how folk talk about “agents”. I doubt I’m the first to notice this problem. So I wonder what the standard reductionist materialist answer is.
The puzzle is in the title.
To add a few more words:
I have no problem with things like “rock” even though there are no rocks in the absolute universe. We’re using many layers of abstraction. We can in principle break down (even literally!) what a rock is and go as far down toward the root of quantum-math-whatever as we like.
(It’s unclear to me what the floor is that we reduce things to in reductionist materialism. But I’m okay handwaving that for now by saying “something something math-and-physics something something”.)
Something like “The Odyssey” is trickier, but not too much so. It just requires that we add in some stuff about how brains work. Nothing too mysterious there. Unknown, sure, but not fundamentally mysterious.
But this seems to get very weird once you start talking about agents.
At first blush it looks the same as “The Odyssey”. An agent is just an abstraction, right? Implemented by some unknown but fundamentally non-mysterious process in a brain.
But this is circular. An abstraction for whom? What even is an abstraction, when you’re in the process of defining an agent? Is there some agent-free definition of an abstraction implicitly being invoked here?
I keep seeing people bump into this circularity when talking about AGI as an agent, and what alignment is. It’s like folks’ native intuitions assume that both quarks and agents are ontologically fundamental, but when pushed on this point they insist that really only the quarks are real… without anything that looks even vaguely to me like a justification for how you’d construct an agent out of quarks, or what that would mean.
In most spaces I’d assume this is because people just didn’t finish thinking this through.
I’m guessing that someone somewhere has noticed this and has given this careful thought. And maybe it’s even part of the core LW philosophy and I somehow missed it.
So I’m asking:
What is an agent in reductionist materialism?
My impression is that a ton of work at MIRI (and some related research lines in other places) went into answering this question, and indeed, no one knows the answer very crisply right now and yup that’s alarming.
See John Wentworth’s post on Why Agent Foundations? An Overly Abstract Explanation, which discusses the need to find the True Name of agents.
(Also, while I agree agents are “more mysterious than rocks or The Odyssey”, I’m actually confused why the circularity is particularly the problem here. Why doesn’t the Odyssey also run into the Abstraction for Whom problem?)
Oh, I think it does, actually. It’s just less immediate or central. Like, it’s easy for me to imagine putting a copy of The Odyssey on a computer. It’s damn near impossible for me to describe what putting “an agent” on my computer is, as opposed to some other kind of program. I was just trying to point at the center of the problem is all, and set aside the usual layers-of-abstraction “explanation” I’m used to hearing for this.
For low-bar definitions of “agent”, you’re probably running some already.
“In computer science, a software agent is a computer program that acts for a user or other program in a relationship of agency, which derives from the Latin agere (to do): an agreement to act on one’s behalf. Such “action on behalf of” implies the authority to decide which, if any, action is appropriate. Agents are colloquially known as bots, from robot.”
So, if you find agents weird and mysterious, you are probably using a different definition from Wikipedia’s article on software agents.
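To make that low-bar definition concrete, here is a toy sketch (illustrative code of my own, not the Wikipedia article’s; all the names are made up): a program delegated the authority to decide which action, if any, is appropriate on the user’s behalf.

```python
# A minimal sketch of the low-bar, Wikipedia-style "software agent":
# a program that acts on a user's behalf and decides for itself which
# action, if any, is appropriate. Names here are illustrative only.

from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str


class InboxAgent:
    """Acts on the user's behalf: decides, per message, what to do."""

    def __init__(self, vip_senders):
        self.vip_senders = set(vip_senders)

    def decide(self, email: Email) -> str:
        # The "agency" in this low-bar sense is just delegated decision-making.
        if email.sender in self.vip_senders:
            return "notify_user"
        if "unsubscribe" in email.subject.lower():
            return "archive"
        return "do_nothing"


agent = InboxAgent(vip_senders=["boss@example.com"])
print(agent.decide(Email("boss@example.com", "Quarterly numbers")))    # notify_user
print(agent.decide(Email("spam@example.com", "Click to unsubscribe")))  # archive
```

Nothing here is weird or mysterious, which is the point: if this already counts as an agent for you, the puzzle dissolves at this level.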
I usually think of this in terms of Dennett’s concept of the intentional stance, according to which there is no fact of the matter of whether something is an agent or not. But there is a fact of the matter of whether we can usefully predict its behavior by modeling it as if it was an agent with some set of beliefs and goals.
For example, even though the calculations of a chess-playing computer have practically nothing in common with human thought, its moves can still be effectively predicted by assuming that it “wants” to win at chess and “knows” the rules of chess. This gives rise to the prediction that it will always choose, from the list of viable moves, one which best furthers the goal of winning the game. Even though the best move may not be obvious, adopting the intentional stance still allows the human observer to improve on their predictions of what the computer would do, by eliminating obvious bad moves.
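Here is a toy version of that prediction scheme (an illustrative sketch, not anything from the comment; tic-tac-toe stands in for chess to keep it small). The observer never inspects the black box, only assumes it “wants” to win and “knows” the rules, and uses that to rule out obviously bad moves.

```python
# The intentional stance as a prediction strategy, in miniature.
# The "black box" happens to be a minimax tic-tac-toe player, but the
# observer's prediction only uses the assumed goal plus the rules.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(b):
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None


def moves(b):
    return [i for i, c in enumerate(b) if c == " "]


def play(b, i, p):
    return b[:i] + p + b[i + 1:]


# --- the black box: a brute-force minimax player (internals irrelevant) ---
def minimax(b, me, turn):
    w = winner(b)
    if w:
        return 1 if w == me else -1
    if not moves(b):
        return 0
    nxt = "O" if turn == "X" else "X"
    scores = [minimax(play(b, m, turn), me, nxt) for m in moves(b)]
    return max(scores) if turn == me else min(scores)


def black_box_move(b, me):
    opp = "O" if me == "X" else "X"
    return max(moves(b), key=lambda m: minimax(play(b, m, me), me, opp))


# --- the intentional stance: predict from assumed goals, not from code ---
def stance_prediction(b, me):
    """Predicted move set: take a win if one exists, else block the opponent."""
    opp = "O" if me == "X" else "X"
    wins = [m for m in moves(b) if winner(play(b, m, me)) == me]
    if wins:
        return set(wins)
    blocks = [m for m in moves(b) if winner(play(b, m, opp)) == opp]
    if blocks:
        return set(blocks)
    return set(moves(b))  # no sharp prediction, only "some legal move"


board = "XX OO    "  # X threatens 0-1-2, O threatens 3-4-5, X to move
assert black_box_move(board, "X") in stance_prediction(board, "X")
print("prediction held:", black_box_move(board, "X"), stance_prediction(board, "X"))
```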
I find it difficult to believe that there can be no objective criteria for recognising agency when there are objective criteria for building agents.
If you are willing to countenance counterfactuals, it’s possible to get more rigorous about “seems like an agent”. A system is goal-driven if it would have behaved differently under different circumstances in order to achieve the same goal, i.e. it avoids obstacles. A system has a utility function if there is a part of the system you can change to make it achieve different goals, in the preceding sense.
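A toy version of that counterfactual test (an illustrative sketch with made-up names, not the commenter’s formalism): vary the circumstances and check whether the behaviour changes so as to reach the same goal; then swap the goal-determining part and check that the same machinery reaches a different goal.

```python
# Goal-driven: across different circumstances (obstacle layouts), the
# behaviour changes but the endpoint stays the same. Utility function:
# there is a part you can swap to make it pursue a different goal.

def run(policy, start, obstacles, steps=20):
    """Roll a policy forward on a 1-D line with blocked cells."""
    pos = start
    for _ in range(steps):
        nxt = pos + policy(pos, obstacles)
        if nxt not in obstacles:
            pos = nxt
    return pos


def make_goal_seeker(goal):
    # Greedy: step toward the goal; hop over a blocked cell if needed.
    def policy(pos, obstacles):
        step = 1 if goal > pos else -1 if goal < pos else 0
        if pos + step in obstacles and pos + 2 * step not in obstacles:
            return 2 * step  # route around the obstacle
        return step
    return policy


def rock(pos, obstacles):
    return 1  # always drifts right, whatever the circumstances


goal = 7
seeker = make_goal_seeker(goal)
circumstances = [set(), {3}, {3, 5}, {2, 6}]

# Different obstacle layouts, different trajectories, same endpoint.
print([run(seeker, 0, obs) for obs in circumstances])  # [7, 7, 7, 7]
print([run(rock, 0, obs) for obs in circumstances])    # varies, no common goal

# Swap the goal parameter and the same machinery reaches a different endpoint.
print(run(make_goal_seeker(2), 9, {4}))  # 2
```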
That sounds an awful lot like asserting agency to be a mind-projection fallacy.
That seems maybe true. What’s the problem you see with that?
Before I got to the point in my education where I learned what a CPU actually consists of, it seemed that programming languages formed a ladder from more abstract to more concrete languages, and that it was just an issue of translating one language into another. It seemed mysterious how the primitive “takes orders” capacity could ever appear or be explained anywhere in that hierarchy. The beauty of learning what a primitive computer is like is that none of the parts “take orders”; the lowest layer of “software” is done entirely in hardware.
But processors are externally driven. For agents I suspect the core property is autopoiesis, i.e. being run from signals emerging from within. Circuits will do some computation when excited but then “sleep” if the environment is not actively pushing on them. Computers can keep up the excitation, but will run essentially the same pattern unless disturbed from outside. Agents are the things that keep on changing their pattern even if the environment leaves them alone (or whose evolution is driven by the echo they make into the environment).
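A toy contrast for that intuition (purely illustrative dynamics of my own, not the commenter’s): one system only changes state when the environment pushes on it; the other keeps generating new states from its own internal feedback even with zero input.

```python
# Externally driven vs. "driven from within", in miniature.

def reactive_circuit(state, inp):
    # Changes only in response to external excitation, then goes quiet.
    return state ^ inp  # XOR: with inp == 0, the state never changes


def self_driven(state, inp):
    # Internal feedback keeps the trajectory moving even when inp == 0.
    return (3 * state + 1 + inp) % 17


def trajectory(step, state, inputs):
    out = [state]
    for inp in inputs:
        state = step(state, inp)
        out.append(state)
    return out


silence = [0] * 8  # the environment leaves both systems alone
print(trajectory(reactive_circuit, 5, silence))  # [5, 5, 5, ...] — frozen
print(trajectory(self_driven, 5, silence))       # keeps changing its pattern
```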
There is a sense in which agency is a fundamental concept. Before we can talk about physics, we need to talk about metaphysics (what is a “theory of physics”? how do we know which theories are true and which are false?). My best-guess theory of metaphysics is infra-Bayesian physicalism (IBP), where agency is a central pillar: we need to talk about the hypotheses of the agent, and the counterfactual policies of the agent. It also looks like epistemic rationality is inseparable from instrumental rationality: it’s impossible to do metaphysics without also doing decision theory.
Does this refute reductionist materialism? Well, it depends how you define “reductionist materialism”. There is a sense in which IBP is very harmonious with reductionist materialism, because each hypothesis talks about the universe from a “bird’s eye view”, without referring to the relationship of the agent with the universe (this relationship turns out to be possible to infer using the agent’s knowledge of its own source code), or even assuming any agent exists inside the universe described by the hypothesis. But the agent is still implicit in the question of whose hypothesis it is.
Once we accept the “viewpoint agent” (i.e. the agent who hypothesizes/infers/decides) as fundamental, we can still ask, what about other agents? The answer is: other agents are programs with high value of g (see Definition 1.6 in the IBP article) which the universe is “running” (this is a well-defined thing in IBP). In this sense, other agents are sort of like rocks: emergent from the fundamental reductionist description of the universe. However, there’s a nuance: this reductionist description of the universe is a belief of the viewpoint agent. The fact it is a belief (formalized as a homogeneous ultradistribution) is crucial in the definition. So, once again, we cannot eliminate agency from the picture.
The silver lining is that, even though the concept of which programs are running is defined using beliefs, i.e. requires a subjective ontology, it seems likely different agents inhabiting the same universe can agree on it (see subsection “are manifest facts objective” in the IBP article), so there is a sense in which it is objective after all. Decide for yourself whether to call this “reductionist materialism”.
Since you switched the moderation to “easy-going”...
I have hinted at a definition in an old post https://www.lesswrong.com/posts/NptifNqFw4wT4MuY8/agency-is-bugs-and-uncertainty. Basically we use agency as a black-box description of something.
Of course, as generally agreed, agency is a convenient intentional stance model. There is no agency in a physical gears-level description of a system.
To build it up from first principles, we must start, at a minimum, with a compressible (not fully random) universe, because “embedded agents”, whatever they might turn out to be, are defined by having a somewhat accurate (i.e. lossily compressed) internal model of the world, so some degree of compressibility is required. (Though maybe useful lossy compression of a random stream is a thing, I don’t know.)
Next, one would identify some persistent features of the world that look like they convert free energy into entropy (note that a lot of “natural” systems behave like that, say, stars).
Finally, merging the two: a feature of the world that contains what appears to be a miniature model of the (relevant part of the) world, and which also converts free energy into entropy to persist both the model and “itself”, would be sort of close to an “agent”.
There are plenty of holes in this outline, but at least there is no circularity, as far as I can tell.
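To make the first and third steps of that outline slightly more concrete, here is a toy sketch that uses compressed length (via zlib) as a crude stand-in for description length and shared structure; all the specifics are illustrative assumptions of mine, not part of the outline.

```python
import random
import zlib

random.seed(0)

# Step one: a compressible (not fully random) "universe".
pattern = bytes(random.randrange(256) for _ in range(64))
world_history = pattern * 64  # a highly regular world-stream
noise_history = bytes(random.randrange(256) for _ in range(len(world_history)))


def c(x):
    """Compressed length as a crude proxy for description length."""
    return len(zlib.compress(x, 9))


print(c(world_history), "<", c(noise_history))  # the world is compressible

# Step three: a candidate "agent" is a subsystem whose internal state is a
# (lossy) model of the world, i.e. it shares much of the world's structure.
agent_state = pattern[:48]  # a partial copy of the pattern: a crude "model"
rock_state = bytes(random.randrange(256) for _ in range(48))


def shared_structure(world, part):
    # How much does knowing `part` help compress the world? A rough
    # stand-in for mutual information, measured with a real compressor.
    return c(world) + c(part) - c(part + world)


print(shared_structure(world_history, agent_state))  # clearly positive
print(shared_structure(world_history, rock_state))   # small (compressor overhead)
```

The free-energy/entropy part of the outline is not captured here; this only illustrates “contains a compressed model of the world” without invoking an observer, which is the anti-circularity point.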
One possible definition is to look for things which are more optimized than any simple mechanism you can imagine for performing a task. So, e.g., Kasparov is great at playing chess; as an amateur you can verify this by noting that any plan you can come up with will tend to do worse than Kasparov’s plans (with high probability). In some sense this is an observer-relative definition, but it can be made more objective by considering the minimally-complex program that can match a given level of performance on a task, parameterized by e.g. Levin complexity. See this comment.
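For reference, one way to cash that out: the definition of Levin complexity below is standard, while the benchmark notation C_T(v) is only an illustrative sketch of the proposal, not taken from the comment.

```latex
% Levin complexity of a string x on a universal machine U:
% program length plus the log of the running time.
\[
  Kt(x) \;=\; \min_{p}\,\bigl\{\, |p| + \log t_U(p) \;:\; U(p) = x \,\bigr\}
\]

% Illustrative benchmark: the least Levin-style complexity of any program
% whose performance on task T reaches level v.
\[
  C_T(v) \;=\; \min_{p}\,\bigl\{\, |p| + \log t(p) \;:\; \operatorname{perf}_T(p) \ge v \,\bigr\}
\]

% A system achieving performance v on T then looks "agenty" to the extent
% that C_T(v) is large, i.e. no simple, fast mechanism matches it.
```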
“optimised agent” appears not to be a tautology.