Suppose you look back at some past events. Can you tell if a decision-theoretic agent was at work? If so, what are the signs of optimization having been applied?
Well, since an agent is embedded within its environment, the result of its optimizing must still conform to that environment’s rules. You can’t look for “impossible” results, since the agent cannot achieve any, being as much a slave to the system as any non-agent process. So we’d need some measure of what counts as a probable, “natural” result, to screen for outcomes that are improbable to arise naturally? However, that risks circularity: defining what’s “improbable to occur without agent intervention” versus “probable to occur without agent intervention” already presupposes a way to detect agenty behavior.
The best I can come up with ad hoc is a small subcomponent of the environment affecting a large part of the environment, shaping what the negentropy gets spent on. Such a “determining nexus” would be a prime candidate for an active decision-theoretic agent.
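As a toy illustration of that heuristic, one could score every small patch of a recorded environment history by how much its early state tells you about the later coarse-grained state of the whole, and flag extreme outliers. The patch size, lag, binning, and plug-in mutual-information estimator below are all illustrative assumptions, not a worked-out method:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate (in nats) of the MI between two discrete sequences."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def nexus_scores(history, patch=4, lag=10, bins=8):
    """history: array of shape (T, H, W), a recorded scalar field over time.

    Returns a dict mapping each patch's top-left corner to how informative
    that "small part" at time t is about the global mean state at t + lag.
    """
    T, H, W = history.shape
    edges = np.linspace(history.min(), history.max(), bins)
    # Coarse-grained "large part": the discretized global mean, later in time.
    global_later = np.digitize(history[lag:].mean(axis=(1, 2)), edges)
    scores = {}
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            # "Small part": the mean state of one patch, earlier in time.
            local_early = np.digitize(
                history[:-lag, i:i + patch, j:j + patch].mean(axis=(1, 2)), edges)
            scores[(i, j)] = mutual_information(local_early, global_later)
    return scores
```

A patch whose score sits many standard deviations above the rest is the “determining nexus” candidate; everything else looks like ordinary physics.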
However, this approach presupposes two conditions: 1) that the agent is powerful enough for its optimizations to affect its environment, and 2) that the agent’s goals demand a change in said environment. An agent which either isn’t powerful enough, or is content enough with the status quo to stay passive (whose utility function is already aligned with the environment’s detected state), cannot be detected in principle (or so I’d surmise).
So what I propose is more like a low-sensitivity screening test for agents than a fully general solution. I’m not sure what could be done, even in principle, to encompass agents not meeting criteria 1 and/or 2. Probably just define those as not-agents and be done with it ;-).
In the case of 2, you could simply manipulate the environment to see what elicits “a reaction” (in the “small part affecting large part” sense, especially if that small part works to revert the large part to its previous state, as far as the negentropy remaining after the manipulation allows). For each manipulation which does not elicit such a reaction, a certain class of agent (those whose utility function would have necessitated a reaction) could be ruled out.
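A sketch of that perturbation screen, assuming a hypothetical simulator interface (`snapshot`, `restore`, `apply`, `step`, `global_state`) standing in for whatever access to the environment you actually have:

```python
import numpy as np

def reaction_screen(env, perturbations, horizon=100, tolerance=0.1):
    """Return the perturbations that elicited no corrective reaction.

    Each such perturbation rules out the class of agents whose utility
    function would have forced them to push the environment back.
    """
    ruled_out = []
    for p in perturbations:
        saved = env.snapshot()                 # start every trial from the same state
        baseline = env.global_state()          # the "large part" before meddling
        env.apply(p)                           # manipulate the environment
        displaced = np.linalg.norm(env.global_state() - baseline)
        reacted = False
        for _ in range(horizon):
            env.step()
            drift = np.linalg.norm(env.global_state() - baseline)
            if drift < tolerance * displaced:  # mostly reverted: something pushed back
                reacted = True
                break
        env.restore(saved)
        if not reacted:
            ruled_out.append(p)                # no agent cared enough to respond
    return ruled_out
```

A real version would also need to confirm that the perturbation actually displaced the global state, and that any reversion is faster than passive relaxation; otherwise ordinary dissipation gets mistaken for an agent.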
Really smart agents could try to evade such detection algorithms by embedding themselves more broadly. One could imagine an agent doing so until it is so ubiquitous that the gullible detectors think of it as a natural phenomenon and call it the Higgs field, or some such. :-)
I’m onto you.
I asked the question originally because:
- It should be easier to analyze a now-static configuration laid bare before you, like a video you can wind back and forth as desired, than to predict something that hasn’t occurred yet.
- There should be a way to detect agency in retrospect; otherwise it’s no agency at all. For simplicity (an extremely underrated virtue on this forum), let’s take an agent which does not care about evading detection.
Re your conditions 1 and 2, feel free to presuppose anything which makes the problem simpler, without cheating. By cheating I mean relying on known human artifacts, such as spears or cars, as signs of agency.
A proper solution to this problem would be an optimal decision theory. Consider the decision itself as a random variable, then take some epistemic model of the world and infer, from a third-person point of view, what an optimal agent with certain knowledge and preferences should have done. Then output that decision.
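A minimal sketch of that third-person inference, with a toy state space, action set, and utility function standing in for the agent’s assumed knowledge and preferences (all of them illustrative assumptions, not part of any established method):

```python
def optimal_decision(belief, actions, utility):
    """belief: dict mapping world state -> probability (the epistemic model).
    utility(action, state) -> float encodes the assumed preferences."""
    def expected_utility(action):
        return sum(p * utility(action, s) for s, p in belief.items())
    return max(actions, key=expected_utility)

# Toy usage: what should an agent that dislikes getting wet have done?
belief = {"rain": 0.7, "sun": 0.3}
actions = ["take_umbrella", "leave_umbrella"]
payoff = {("take_umbrella", "rain"): 1.0, ("take_umbrella", "sun"): 0.6,
          ("leave_umbrella", "rain"): 0.0, ("leave_umbrella", "sun"): 1.0}
print(optimal_decision(belief, actions, lambda a, s: payoff[(a, s)]))
# -> "take_umbrella"
```

Comparing the decision this outputs with what actually happened (and with what a preference-free baseline would predict) is one way to turn the proposal into a retrospective test.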
I am not talking about optimality at all. Just being able to forensically detect any sign of agency rather than… what?