I do think that counterfactual impact is an important thing to track, although two people discovering something at the same time doesn’t seem like especially strong evidence that they were just “leading the parade.” It matters how large the set is. I.e., I doubt there were more than ~5 people around Newton’s time who could have come up with calculus. Creating things is just really hard, and I think often a pretty conjunctive set of factors needs to come together to make it happen (some of those are dispositional (ambition, intelligence, etc.), others are more like “was the groundwater there,” and others are like “did they even notice there was something worth doing here in the first place” etc).
Another way to say it is that there’s a reason only two people discovered calculus at the same time, and not tens, or hundreds. Why just two? A similar thing happened with Darwin, where Wallace came up with natural selection around the same time (they actually initially published it together). But having read a bunch about Darwin and that time period, I feel fairly confident that they were the only two people “on the scent,” so to speak. Malthus influenced them both, as did living in England when the industrial revolution really took off (capitalism has a “survival of the fittest” vibe), so there was some groundwater there. But it was only these two who took that groundwater and did something powerful with it, and I don’t think there were that many other people around who could have. (One small piece of evidence for that effect: Origin of Species was published a year and a half after their initial publication, and no one else published anything on natural selection within that timespan, even after the initial idea was out there.)
Also, I mostly agree about Shannon being more independent, although I do think that Turing was “on the scent” of information theory as well. E.g., from The Information: “Turing cared about the data that changed the probability: a probability factor, something like the weight of the evidence. He invented a unit he named a ‘ban.’ He found it convenient to use a logarithmic scale, so that bans would be added rather than multiplied. With a base of ten, a ban was the weight of evidence needed to make a fact ten times as likely.” This seems, to me, to veer pretty close to information theory and I think this is fairly common: a few people “on the scent,” i.e., noticing that there’s something interesting to discover somewhere, having the right questions in the first place, etc.—but only one or two who actually put in the right kind of effort to complete the idea.
There’s also something important to me about the opposite problem, which is how to assign blame when “someone else would have done it anyway.” E.g., as far as I can tell, much of Anthropic’s reasoning for why they’re not directly responsible for AI risk is because scaling is inevitable, i.e., that other labs would do it anyway. I don’t agree with them on the object-level claim (i.e., it seems possible to cause regulation to institute a pause), but even if I did, I still want to assign them blame for in fact being the ones taking the risky actions. This feels more true for me the fewer actors there are, i.e., at the point when there are only three big labs I think each of them is significantly contributing to risk, whereas if there were hundreds of leading labs I’d be less upset by any individual one. But there’s still a part of me that feels deontological about it, too—a sense that you’re just really not supposed to take actions that risky, no matter how inculpable you are counterfactually speaking.
Likewise, I have similar feelings about scientific discoveries. The people who did them are in fact the ones who did the work, and that matters to me. It matters more the smaller the set of possible people is, of course, but there’s some level upon which I want to be like “look they did an awesome thing here; it in fact wasn’t other people, and I want to assign them credit for that.” It’s related to a sense I have that doing great work is just really hard and that people perpetually underestimate this difficulty. For instance, people sometimes write off any good Musk has done (e.g., the good for climate change by creating Tesla, etc.) by saying “someone else would have made Tesla anyway” and I have to wonder, “really?” I certainly don’t look at the world and expect to see Teslas popping up everywhere. Likewise, I don’t look at the world and expect to see tons of leading AI labs, nor do I expect to see hundreds of people pushing the envelope on understanding what minds are. Few people try to do great things, and I think the set of people who might have done any particular great thing is often quite small.
Don’t forget Edward Blyth as someone in the vicinity of the groundwater, or perhaps even “on the scent.”
Great comment Ajsja, you hit the mark. Two small comments:
(i) The ‘correct’, ‘mathematically righteous’ way to calculate credit is through an elaboration of counterfactual impact: the Shapley value. I believe it captures the things you want from credit that you describe here.
(ii) On Turing being on the scent of information theory: I find this quote not that compelling. The idea of information as a logarithmic quantity was important, but only a fraction of what Shannon did. In general, I agree with Schmidhuber’s assessment that Turing’s scientific stature is a little overrated.
A better comparison would probably be Ralph Hartley, who pioneered information-theoretic ideas (see e.g. the Shannon-Hartley theorem). I’m sure you know more about the history here than I do.
I’m certain one could write an entire book about the depth, significance and subtlety of Claude Shannon’s work. Perennially underrated.
Shapley seems like quite an arbitrary choice (why uniform over all coalitions?).
I think the actually mathematically right thing is just EDT/UDT, though this doesn’t imply a clear notion of credit. (Maximizing Shapley value yields crazy results.)
Unfortunately, I don’t think there is a correct notion of credit.
Averaging over all coalitions seems quite natural to me; it averages out the “incidental, contingent, unfair” factor of who got in what coalition first. But tastes may differ.
The Shapley value has many other good properties that nail it down as a canonical way to allocate credit.
Quoting from nunoSempere’s article:
The Shapley value is uniquely determined by simple properties.
These properties:
Property 1: Sum of the values adds up to the total value (Efficiency)
Property 2: Equal agents have equal value (Symmetry)
Property 3: Order indifference: it doesn’t matter which order you go in (Linearity). Or, in other words, if there are two steps, Value(Step1 + Step2) = Value(Step1) + Value(Step2).
And an extra property:
Property 4: Null-player (if in every world, adding a person to the world has no impact, the person has no impact). You can either take this as an axiom, or derive it from the first three properties.
In the context of scientific contributions, one might argue that properties 1 & 2 are very natural, almost axiomatic, while property 3 is merely very reasonable.
I agree Shapley value per se isn’t the answer to all questions of credit. For instance, the Shapley value is not compositional: merging players into a single player doesn’t preserve Shapley values.
Nevertheless, I feel it is a very good idea that has many, if not all, of the properties people want when they talk about a right notion of credit.
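For concreteness, here is a minimal Python sketch of the permutation definition of the Shapley value (each player’s average marginal contribution over all orderings). The “majority game” below is a made-up toy example, not anything from the discussion above; it also happens to illustrate the non-compositionality point, since merging two players changes the credit allocation.

```python
from itertools import permutations

def shapley_values(players, v):
    """Shapley values for characteristic function v: each player's
    marginal contribution, averaged over all orderings of the players."""
    values = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            values[p] += v(frozenset(coalition)) - before
    return {p: total / len(perms) for p, total in values.items()}

# Hypothetical three-player "majority game": a coalition of two or more
# players produces value 1; smaller coalitions produce nothing.
def majority(coalition):
    return 1.0 if len(coalition) >= 2 else 0.0

vals = shapley_values(["A", "B", "C"], majority)
# Symmetry and efficiency hold: each player gets 1/3, summing to 1.

# Non-compositionality: merge A and B into a single player "AB". In the
# merged game AB wins alone, so it gets Shapley value 1 and C gets 0,
# rather than the 2/3 and 1/3 that {A, B} and C held before the merge.
def merged(coalition):
    return 1.0 if "AB" in coalition else 0.0

merged_vals = shapley_values(["AB", "C"], merged)
```

Brute-forcing all n! orderings is only feasible for small n; in practice one samples random permutations instead, but the tiny exact version makes the axioms easy to check by hand.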
I don’t know what you mean by UDT/EDT in this context—I would be super curious if you could elucidate! :)
What do you mean by “maximizing Shapley value gives crazy results”? (As I point out above, the Shapley value isn’t the be-all and end-all of credit questions, and it e.g. isn’t well-behaved under hierarchical composition of agents.)
Say more / references?