I suppose for messy real-world tasks, you can’t define distances objectively ahead of time. You could simply check a random 10 of the (x, f(x)) pairs and choose how much to pay. In an ideal world, if they think you’re being unfair, they can stop working for you. In this world, where giving someone a job is a favor, they could go to a judge to have your judgement checked.
Though if we’re talking about AIs: you could have the AI output a probability distribution g(x) over the possible f(x) for each of the 100 x. Then, for a random 10 of the x, you generate an f(x) yourself and reward the AI according to how much probability it assigned to what you generated.
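A minimal sketch of that reward computation, assuming a discrete space of possible f(x) values; the names (`score_ai`, `generate_f`) are hypothetical, not from any particular library:

```python
import math
import random

def score_ai(g, generate_f, xs, n_checked=10):
    """Reward the AI by the log-probability its reported distribution
    assigned to the answers we generate ourselves on a random sample.

    g          -- maps x to a dict {candidate f(x): probability} (the AI's g(x))
    generate_f -- our own (expensive) way of producing an f(x)
    xs         -- the full list of inputs, e.g. 100 of them
    """
    reward = 0.0
    for x in random.sample(xs, n_checked):
        fx = generate_f(x)             # we generate an f(x) ourselves
        p = g(x).get(fx, 0.0)          # probability the AI put on that answer
        reward += math.log(p) if p > 0 else float("-inf")
    return reward
```

The log score is a proper scoring rule, so the AI’s expected reward is maximized by reporting its actual beliefs about f(x).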
Then I have a better answer for the question about how I would Goodhart things.
Let U_Gurkenglas (henceforth U_G) = the set of all possible x that may be randomly checked as you described
Let U = the set of all possible metrics in the real world (superset of U_G)
For a given action A, optimize for U_G and ignore the set of metrics in U that are not in U_G.
You will be unhappy to the degree that the ignored subset contains things you wish were in U_G. But until you catch on, you will be completely satisfied by perfect performance on every x in U_G.
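As a toy sketch of that failure mode (the metric names and scores here are made up purely for illustration):

```python
# An optimizer that maximizes only the metrics it knows will be checked (U_G)
# and is free to trash everything else (U \ U_G). All names are hypothetical.

U = {"output_quality", "cost", "air_pollution"}   # everything you actually care about
U_G = {"output_quality", "cost"}                  # the subset you can randomly check

def choose_action(actions, scores):
    """Pick the action with the best total score on the checked metrics only."""
    return max(actions, key=lambda a: sum(scores[a][m] for m in U_G))

actions = ["careful", "reckless"]
scores = {
    # action -> score per metric; higher is better
    "careful":  {"output_quality": 8, "cost": 6, "air_pollution": 9},
    "reckless": {"output_quality": 9, "cost": 9, "air_pollution": 0},
}

print(choose_action(actions, scores))  # "reckless": perfect on U_G, terrible on the rest
```

The chosen action looks flawless on everything you check, while the unchecked metric quietly collapses.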
To put this in concrete terms, if you don’t have a metric for “nitrous oxide emissions” because it’s the 1800s, then you won’t have any way to disincentivize an employee who races around the countryside driving a diesel truck that ruins the air.
(the mobile editor doesn’t have any syntax help; I’ll fix formatting later)
That’s a bad example. In the 1800s, no amount of caring would have resulted in the person choosing a car with less nitrous oxide, because that wasn’t among the things people thought about.