When I’m faced with problems where the principal-agent problem is present, my take is usually that one should: 1) Pick three or more different metrics that correlate with what you want to measure. Using the classic Soviet example of a needle factory, these could be a) the number of needles produced, b) the weight of needles produced, and c) the average similarity between the actual needle design and the ideal needle.
Then 2) At every time step where you test, start by choosing one of the correlates at random, and use that one to measure production.
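A minimal sketch of that scheme in Python, assuming three hypothetical metric functions and a `batch` of needle records (the function names and data fields are my own illustration, not part of the proposal):

```python
import random

# Hypothetical metrics for the needle factory. A `batch` is assumed to be
# a list of dicts like {"weight_g": 0.4, "similarity": 0.92}.

def count_needles(batch):
    return len(batch)

def total_weight(batch):
    return sum(needle["weight_g"] for needle in batch)

def design_similarity(batch):
    # 1.0 = identical to the ideal design, 0.0 = nothing alike
    return sum(needle["similarity"] for needle in batch) / len(batch)

METRICS = [count_needles, total_weight, design_similarity]

def evaluate(batch):
    """At each evaluation step, score production on ONE correlate,
    chosen uniformly at random and unknown to the factory in advance."""
    metric = random.choice(METRICS)
    return metric.__name__, metric(batch)
```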
This seems simple enough for Soviet companies. You are still stuck with them trying to optimize those three metrics at once without actually optimizing production, but the more dimensions you find, and the more orthogonal they are to one another, the more confident you can be that the system will be hard to cheat.
This is almost equivalent to having a linear function where you add the three metrics together. (It’s worse, because it adds noise instead of averaging noise out.) Do you think adding the three together makes for a good metric, or might optimizing that function fail because it makes crazy tradeoffs along some unconsidered dimension?
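A quick toy simulation of that point (the scores are made up): drawing one metric at random has the same expectation as the linear combination, rescaled by 1/3, but adds measurement noise on top:

```python
import random
import statistics

# Arbitrary made-up scores on the three metrics for one factory.
random.seed(0)
metric_scores = [0.9, 0.5, 0.7]

# Randomized scheme: one metric drawn uniformly per evaluation.
random_draws = [random.choice(metric_scores) for _ in range(100_000)]
print(statistics.mean(random_draws))   # ~0.7, same as the plain average
print(statistics.stdev(random_draws))  # ~0.16: extra noise from randomizing

# Linear scheme: the (rescaled) sum of the three, with no sampling noise.
print(sum(metric_scores) / 3)          # exactly 0.7
```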
This is a surprisingly brilliant idea, which should definitely have a name. For humans, part of the benefit is that it appeals to risk aversion: people wouldn’t want to completely write off one of the scenarios. It also makes the system so complex to analyze that many people would simply fall back to “doing the right thing” naturally. I could definitely foresee benefits from, for example, randomizing whether members of a team will be judged on individual performance or on team performance.
I’m not totally sure it would work as well for AIs, which would naturally try to optimize in the gaps much more than a human would, and would potentially be less risk-averse than a human.
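To make the risk-aversion contrast concrete, here is a hypothetical payoff sketch (all numbers invented): an agent who games a single metric scores zero whenever a different metric happens to be drawn, so a risk-neutral optimizer can still prefer gaming, while a risk-averse one falls back to honest work:

```python
import math

# Scores each strategy would earn on the three possible metrics.
strategies = {
    "game_one_metric": [3.3, 0.0, 0.0],     # all effort into one correlate
    "do_the_right_thing": [1.0, 1.0, 1.0],  # moderate score on all three
}

def expected_utility(scores, utility):
    # Each metric is drawn with probability 1/3 at evaluation time.
    return sum(utility(s) for s in scores) / len(scores)

risk_neutral = lambda x: x
risk_averse = math.sqrt  # any concave utility makes the same point

for name, scores in strategies.items():
    print(name,
          round(expected_utility(scores, risk_neutral), 3),
          round(expected_utility(scores, risk_averse), 3))
# The risk-neutral agent slightly prefers gaming (1.1 vs 1.0), while the
# risk-averse agent prefers honest work (~0.606 vs 1.0) -- the human/AI
# asymmetry the comment gestures at.
```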
In what way do you think this wouldn’t work for AI?
As for the crazy-tradeoffs worry: such a tradeoff may simply be too costly to detect (in proportion to the cost of just arbitrarily deciding how to weigh one metric against another).
Let’s name it:
Hidden Agency Solution
Caleiro Agency Conjecture
Principal agent blind spot
Cal-agency
Stochastic principal agency solution
(Wow, this naming thing is hard and awfully awkward, whether I’m optimizing for mnemonics or for fame. Is there anything else to optimize for here?)
Fuzzy metrics?
That one doesn’t refer to the principal-agent problem, though.