How is it that those companies employ thousands of software developers yet do so badly at providing basic functions to users? What are all those engineers doing?
The basic functions from your perspective are different from the basic functions from the company's perspective, and the company's perspective is necessarily limited.
I’ve been considering a set of thoughts related to this for a while, which summarize, only slightly misleadingly, as “Caring is expensive.” By caring, I mean something very general rather than something specific: the desire to make or do something better than what is required. It’s a very fuzzy concept, and I’m still working on identifying more precisely what I mean by it, so be warned that the ideas here are relatively unrefined.
Suppose you own a company and need to hire somebody to deal with office supplies. It’s expensive to hire somebody who actually cares whether the company pays $1.00 or $10.00 per stapler without that being a specific thing you expect of them; odds are, the person making that decision cares more about the stapler’s ergonomics, or its color. Somebody who cares about all of that is very expensive: it’s cheaper to hire somebody who will pay $1.00 for staplers regardless of anything else than to hire somebody who will actually make the decision you would make yourself, weighing the price, the ergonomics, and maybe even the style to reach a decision that reflects a broad range of concerns.
The problem scales with the size of your company, as well; if you aren’t making the hiring decisions, you need to hire somebody who cares about whether the people they hire themselves care.
(Not to mention that the incentive structures, as the size scales up, start to work against caring.)
The approach a lot of companies take, instead, is to de-emphasize “caring” as much as possible and to emphasize metrics and goals, which are attached to incentives; that is, you get paid more if you do certain things, and since you care about getting paid more, you by association care about the things you’re assigned to do. And those metrics are invariably incomplete. Dagon mentions use cases; if nobody defines the use case that includes what you are trying to do, then nobody is assigned to ensure that that use case is handled. Nobody is assigned to “care” about it, so nobody does.
Couldn’t you make them care by making their pay dependent on how well they predict what you would decide, as measured by you redoing the decision for a representative sample of tasks?
Ability to predict is distinct from actual caring. It doesn’t get you around people trying to goodhart metrics.
How would you goodhart this metric? To be clear: you want to map x to f(x), but each f(x) takes a second of your time. You pay them to map x to f(x), but they actually map x to g(x). After they’re done mapping 100 x’s to g(x), you select a random 10 of those 100, spend 10 seconds computing your own f(x) for those, and pay them more the smaller the absolute difference |g(x) - f(x)|.
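For concreteness, here is a minimal sketch of that audit scheme in Python, assuming the decisions can be represented as numbers; the sample size, base pay, and penalty scale are made-up parameters, and f and g stand in for your decision function and theirs:

```python
import random

def audit_pay(xs, g, f, audit_size=10, base_pay=100.0, penalty_per_unit_error=1.0):
    """Pay the delegate based on how closely their g matches your f on a random audit.

    xs         -- the tasks they handled (e.g. 100 of them)
    g          -- dict mapping each x to the decision they actually made, g(x)
    f          -- your own decision function, too expensive to run on every x
    audit_size -- how many tasks you redo yourself (e.g. 10)
    """
    sample = random.sample(list(xs), min(audit_size, len(xs)))
    # Average absolute disagreement |g(x) - f(x)| over the audited tasks only.
    mean_error = sum(abs(g[x] - f(x)) for x in sample) / len(sample)
    # Pay more the smaller the disagreement, never below zero.
    return max(0.0, base_pay - penalty_per_unit_error * mean_error)
```

Note that the delegate is only ever scored on the audited sample, so as long as g tracks f closely enough on average, whatever else g does is invisible to the payment rule.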
I’m trying to think through your point using the above stapler and office supplies example.
If you hired an aggressively lazy AI to buy office supplies for you and you told it: “Buy me a stapler.”
Then it might buy a stapler from the nearest trash can in exchange for one speck of dirt.
Then you would go to Staples and buy a new, ergonomic stapler that can handle 20 sheets (model XYZ from Brand ABC) using your credit card, with free five-day shipping.
You proposed that we would reward the AI by calculating the distance between the two sets of actions.
I don’t see a way for you to avoid having to invent a friendly AI to solve the problem.
Otherwise, you will inevitably leave out a metric (e.g., oops, I didn’t have a metric for don’t-speed-on-roads, so now the robot has a thousand-dollar speeding ticket after optimizing for shipping speed).
I suppose for messy real-world tasks, you can’t define distances objectively ahead of time. You could simply check a random 10 of the (x, f(x)) pairs and choose how much to pay. In an ideal world, if they think you’re being unfair, they can stop working for you. In this world, where giving someone a job is a favor, they could go to a judge to have your judgement checked.
Though if we’re talking about AIs: You could have the AI output a probability distribution g(x) over possible f(x) for each of the 100 x. Then for a random 10 x, you generate an f(x) and reward the AI according to how much probability it assigned to what you generated.
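A minimal sketch of that variant, assuming the possible answers come from a discrete set so the AI’s g(x) can be a dictionary of probabilities; rewarding with the log of the assigned probability is my choice of scoring rule, not something specified above:

```python
import math
import random

def score_distributions(xs, predict, generate_f, audit_size=10):
    """Reward the AI by the log-probability it assigned to the answers you generate.

    predict(x)    -- the AI's distribution g(x) over possible answers: {answer: probability}
    generate_f(x) -- you producing your own answer f(x) for an audited x
    """
    sample = random.sample(list(xs), min(audit_size, len(xs)))
    reward = 0.0
    for x in sample:
        dist = predict(x)                 # the AI's reported distribution for this x
        p = dist.get(generate_f(x), 0.0)  # probability it put on the answer you actually produced
        reward += math.log(p) if p > 0 else float("-inf")
    return reward
```

Using the log of the assigned probability makes this a proper scoring rule: in expectation, the AI can’t do better than reporting the distribution it actually believes describes your answers, which is the point of asking for a distribution instead of a single guess.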
Then I have a better answer for the question about how I would Goodhart things.
Let U_Gurkenglas = the set of all possible x that may be randomly checked as you described.
Let U = the set of all possible metrics in the real world (a superset of U_G).
For a given action A, optimize for U_G and ignore the set of metrics in U that are not in U_G.
You will be unhappy to the degree that the ignored subset contains things that you wish were in U_G. But until you catch on, you will be completely satisfied by the perfect application of all x in U_G.
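Here is a toy sketch of that strategy; the metrics, candidate actions, and numbers are all invented for illustration:

```python
# The agent scores candidate actions only on the metrics that will be audited (U_G),
# so costs that live only in U \ U_G never influence its choice.
U_G = {"shipping_speed"}                                  # what the principal actually checks
U = U_G | {"speeding_fines", "air_pollution"}             # what actually matters

actions = {
    "drive_legally":  {"shipping_speed": 5, "speeding_fines": 0,    "air_pollution": 1},
    "race_the_truck": {"shipping_speed": 9, "speeding_fines": 1000, "air_pollution": 9},
}

def audited_score(effects):
    # Only metrics in U_G contribute; everything else is invisible to the agent.
    return sum(value for metric, value in effects.items() if metric in U_G)

best = max(actions, key=lambda name: audited_score(actions[name]))
print(best)  # -> "race_the_truck", despite the fines and pollution the audit never sees
```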
To put this in concrete terms: if you don’t have a metric for “nitrous oxide emissions” because it’s the 1800s, then you won’t have any way to disincentivize an employee who races around the countryside in a diesel truck that ruins the air.
That’s a bad example. In the 1800s, no amount of caring would have resulted in the person choosing a vehicle with lower nitrous oxide emissions, because that wasn’t among the things people thought about.