There is a specific part of this problem that I’m very interested in: the boundaries of potential sub-agents. Part of the goal here seems to be filtering out potential “daemons” or inner optimisers, so it seems important to think about ways to do this.
I can see how this project would be valuable even without it, but do you have any thoughts on how to differentiate between the parts of a system that’s acting like an agent, so that you can isolate the agentic part?
I otherwise find it a very interesting research direction.
Hm… so anything that measures degree of agent structure should register a policy with a sub-agent as having some agent structure. But yeah, I haven’t thought much about scenarios where there are multiple agents inside the policy. The agent structure problem is trying to use performance to establish a minimum degree of agent structure. So if there were an agent hiding in there that didn’t affect performance during the measured time interval, it wouldn’t be detected (although it would be detected “in the limit”).
That said, we’re not actually talking about how to measure degree of agent structure yet. It seems plausible to me that whatever method one uses to do that could be adapted to find multiple agents.