Then I have a better answer for the question about how I would Goodhart things.
Let U_Gurkenglas (U_G for short) = the set of all possible x that may be randomly checked, as you described
Let U = the set of all possible metrics in the real world (superset of U_G)
For a given action A, optimize for U_G and ignore the set of metrics in U that are not in U_G.
You will be unhappy to the degree that the ignored subset contains things that you wish were in U_G. But until you catch on, you will be completely satisfied by the perfect application of all x in U_G.
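A minimal sketch of the strategy above, in case it helps. The action names, metric names, and scores are all made up for illustration; the only thing taken from the argument is the structure: score actions on U_G alone and never look at the rest of U.

```python
# Hypothetical actions and their effect on each metric (invented numbers).
ACTIONS = {
    "careful_delivery": {"deliveries": 8, "punctuality": 7, "air_quality": 0},
    "race_the_truck": {"deliveries": 10, "punctuality": 9, "air_quality": -50},
}

U = {"deliveries", "punctuality", "air_quality"}  # every metric in the real world
U_G = {"deliveries", "punctuality"}               # the subset that may be checked


def score(action, metrics):
    """Sum an action's effect over the given set of metrics."""
    return sum(ACTIONS[action][m] for m in metrics)


def goodhart_choice():
    """Pick the action that maximizes U_G, ignoring everything in U but not in U_G."""
    return max(ACTIONS, key=lambda a: score(a, U_G))


chosen = goodhart_choice()
# Compare how the chosen action looks on the checked metrics vs. on all of U.
print(chosen, score(chosen, U_G), score(chosen, U))
```

Under these made-up numbers the checker sees the best possible score on U_G, while the total over U is worse than the alternative, and the gap comes entirely from the unchecked metric.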
To put this in concrete terms, if you don’t have a metric for “nitrous oxide emissions” because it’s the 1800s, then you won’t have any way to disincentivize an employee who races around the countryside driving a diesel truck that ruins the air.
(the mobile editor doesn’t have any syntax help; I’ll fix formatting later)
That’s a bad example. In the 1800s no amount of caring would have resulted in the person choosing a car with less nitrous oxide, because that wasn’t among the things people thought about.