When I’m faced with problems where the principal-agent problem is present, my take is usually that one should: 1) Pick three or more different metrics that correlate with what you want to measure. Using the classic Soviet example of a needle factory, these could be a) the number of needles produced, b) the weight of needles produced, and c) the average similarity between the actual needle design and the ideal needle.
Then 2) At every time step where you test, start by choosing one of the correlates at random, and use that one to measure production.
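A minimal sketch of that scheme in Python, assuming three hypothetical metric functions and a `batch` of needle records (the function names and data fields are my own illustration, not part of the proposal):

```python
import random

# Hypothetical metrics for the needle factory. A `batch` is assumed to be
# a list of dicts like {"weight_g": 0.4, "similarity": 0.92}.

def count_needles(batch):
    return len(batch)

def total_weight(batch):
    return sum(needle["weight_g"] for needle in batch)

def design_similarity(batch):
    # 1.0 = identical to the ideal design, 0.0 = nothing alike
    return sum(needle["similarity"] for needle in batch) / len(batch)

METRICS = [count_needles, total_weight, design_similarity]

def evaluate(batch):
    """At each evaluation step, score production on ONE correlate,
    chosen uniformly at random and unknown to the factory in advance."""
    metric = random.choice(METRICS)
    return metric.__name__, metric(batch)
```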
This seems simple enough for Soviet companies. You are still stuck with them trying to optimize those three metrics at once without actually optimizing production, but the more dimensions you find, and the more orthogonal they are to one another, the more confident you can be that the system will be hard to cheat.
This is almost equivalent to having a linear function where you add the three metrics together. (It’s worse, because it adds noise instead of averaging noise out.) Do you think adding the three together makes for a good metric, or might optimizing that function fail because it makes crazy tradeoffs along some unconsidered dimension?
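A quick toy simulation of that point (the scores are made up): drawing one metric at random has the same expectation as the linear combination, rescaled by 1/3, but adds measurement noise on top:

```python
import random
import statistics

# Arbitrary made-up scores on the three metrics for one factory.
random.seed(0)
metric_scores = [0.9, 0.5, 0.7]

# Randomized scheme: one metric drawn uniformly per evaluation.
random_draws = [random.choice(metric_scores) for _ in range(100_000)]
print(statistics.mean(random_draws))   # ~0.7, same as the plain average
print(statistics.stdev(random_draws))  # ~0.16: extra noise from randomizing

# Linear scheme: the (rescaled) sum of the three, with no sampling noise.
print(sum(metric_scores) / 3)          # exactly 0.7
```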
This is a surprisingly brilliant idea, which should definitely have a name. For humans, part of the benefit is that it appeals to risk aversion: people wouldn’t want to completely write off one of the scenarios. It also makes the system so complex to analyze that many people would simply fall back to “doing the right thing” naturally. I could definitely foresee benefits from, for example, randomizing whether members of a team will be judged on individual performance or on team performance.
I’m not totally sure it would work as well for AIs, which would naturally try to optimize in the gaps much more than a human would, and would potentially be less risk-averse than a human.
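To make the risk-aversion contrast concrete, here is a hypothetical payoff sketch (all numbers invented): an agent who games a single metric scores zero whenever a different metric happens to be drawn, so a risk-neutral optimizer can still prefer gaming, while a risk-averse one falls back to honest work:

```python
import math

# Scores each strategy would earn on the three possible metrics.
strategies = {
    "game_one_metric": [3.3, 0.0, 0.0],     # all effort into one correlate
    "do_the_right_thing": [1.0, 1.0, 1.0],  # moderate score on all three
}

def expected_utility(scores, utility):
    # Each metric is drawn with probability 1/3 at evaluation time.
    return sum(utility(s) for s in scores) / len(scores)

risk_neutral = lambda x: x
risk_averse = math.sqrt  # any concave utility makes the same point

for name, scores in strategies.items():
    print(name,
          round(expected_utility(scores, risk_neutral), 3),
          round(expected_utility(scores, risk_averse), 3))
# The risk-neutral agent slightly prefers gaming (1.1 vs 1.0), while the
# risk-averse agent prefers honest work (~0.606 vs 1.0) -- the human/AI
# asymmetry the comment gestures at.
```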
In what way do you think this wouldn’t work for AI?
As for the crazy-tradeoffs worry: such a tradeoff may simply be too costly to detect (in proportion to the cost of just arbitrarily deciding how to weigh one metric against another).
Let’s name it:
Hidden Agency Solution
Caleiro Agency Conjecture
Principal agent blind spot
Cal-agency
Stochastic principal agency solution
(Wow, this naming thing is hard and awfully awkward, whether I’m optimizing for mnemonics or for fame. Is there anything else to optimize for here?)
Fuzzy metrics?
That one doesn’t refer to the principal-agent problem, though.