This [Maximisers are dangerous] was the main thesis advanced by Yudkowsky and Bostrom when founding the field of AI safety.
[...]
And this proliferation of arguments is evidence against their quality: if your conclusions remain the same but your reasons for holding those conclusions change, that’s a warning sign for motivated cognition (especially when those beliefs are considered important in your social group).
I think many of the other arguments did appear in early discussions of AI safety, but perhaps later didn’t get written up clearly or get emphasized as much as “maximisers are dangerous”. I’d cite CEV as an AI safety idea that clearly took “human safety problems” strongly into consideration, and even before that, Yudkowsky wrote about the SysOp Scenario, which would essentially replace physics with a different set of rules that would (in part) eliminate the potential vulnerabilities of actual physics. The early focus on creating a Singleton wasn’t just due to thinking that a local intelligence explosion was highly likely, but also because, for reasons like the ones behind the “prosaic alignment problem”, people (including me) thought a competitive multi-polar scenario might lead to unavoidably bad outcomes.
So I don’t think “your conclusions remain the same but your reasons for holding those conclusions change” is fair if it was meant to apply to Yudkowsky and Bostrom and others who have been involved in AI safety from the early days.
(I still think it’s great that you’re doing this work of untangling and explicating the different threads of argument for the importance of AI safety, but this part seems a bit unfair or at least could be interpreted that way.)
Apologies if this felt like it was targeted specifically at you and other early AI safety advocates; I have nothing but the greatest respect for your work. I’ll rewrite to clarify my intended meaning, which is more an attempt to evaluate the field as a whole. This is obviously a very vaguely-defined task, but let me take a stab at fleshing out some changes over the past decade:
1. There’s now much more concern about argument 2, the target loading problem (as well as inner optimisers, insofar as they’re distinct).
2. There’s now less focus on recursive self-improvement as a key reason why AI will be dangerous, and more focus on what happens when hardware scales up. Relatedly, I think a greater percentage of safety researchers believe that there’ll be a slow takeoff than used to be the case.
3. Argument 3 (prosaic AI alignment) is now considered more important and more tractable.
4. There’s now been significant criticism of coherence arguments as a reason to believe that AGI will pursue long-term goals in an insatiable maximising fashion.
I may be wrong about these shifts—I’m speaking as a newcomer to the field who has a very limited perspective on how it’s evolved over time. If so, I’d be happy to be corrected. If they have in fact occurred, here are some possible (non-exclusive) reasons why:
A. None of the proponents of the original arguments have changed their minds about the importance of those arguments, but new people came into the field because of those arguments, then disagreed with them and formulated new perspectives.
B. Some of the proponents of the original arguments have changed their minds significantly.
C. The proponents of the original arguments were misinterpreted, or overemphasised some of their beliefs at the expense of others, and actually these shifts are just a change in emphasis.
I think none of these options reflect badly on anyone involved (getting everything exactly right the first time is an absurdly high standard), but I think A and B would be weak evidence against the importance of AI safety (assuming you’ve already conditioned on the size of the field, etc). I also think that it’s great when individual people change their minds about things, and definitely don’t want to criticise that. But if the field as a whole does so (whatever that means), the dynamics of such a shift are worth examination.
I don’t have strong beliefs about the relative importance of A, B and C, although I would be rather surprised if any one of them were primarily responsible for all the shifts I mentioned above.
I think none of these options reflect badly on anyone involved (getting everything exactly right the first time is an absurdly high standard), but I think A and B would be weak evidence against the importance of AI safety (assuming you’ve already conditioned on the size of the field, etc).
That depends on how much A and B there is. Even if a field were actually important, it would have some nonzero amount of A and B, so A and B would constitute evidence (even weak evidence) only if there was more of it than you’d expect conditional on the field being important. I think the changes you described in the parent comment are real changes and are not entirely due to C, but they’re not larger than the changes I’d expect to see conditional on AI safety being actually important. Do you have a different sense?
I don’t think it depends on how much A and B, because the “expected amount” is not a special point. In this context, the update that I made personally was “There are more shifts than I thought there were, therefore there’s probably more of A and B than I thought there was, therefore I should weakly update against AI safety being important.” Maybe (to make A and B more concrete) there being more shifts than I thought downgrades my opinion of the original arguments from “absolutely incredible” to “very very good”, which slightly downgrades my confidence that AI safety is important.
As a separate issue, conditional on the field being very important, I might expect the original arguments to be very very good, or I might expect them to be very good, or something else. But I don’t see how that expectation can prevent a change from “absolutely incredible” to “very very good” from downgrading my confidence.
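To make the shape of this update concrete, here’s a toy Bayes-rule sketch (the hypotheses are coarse and the numbers are invented purely for illustration, not anyone’s actual credences). Let $H$ be “AI safety is important” and $E$ be “the arguments supporting it shifted by the amount actually observed”:

$$
\frac{P(H \mid E)}{P(\lnot H \mid E)}
= \frac{P(E \mid H)}{P(E \mid \lnot H)} \cdot \frac{P(H)}{P(\lnot H)},
\qquad \text{e.g.}\quad \frac{0.3}{0.4} \cdot \frac{P(H)}{P(\lnot H)} = 0.75 \times \text{prior odds}.
$$

On this framing, the size and direction of the update are set by the likelihood ratio, not by whether the observed shift crosses the amount you’d have expected conditional on $H$; that’s the sense in which the expected amount isn’t a special point.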
Ok, I think I misinterpreted you when you said “I think A and B would be weak evidence against the importance of AI safety”. My current understanding is that you’re saying that if you think there is more A and B (at a particular point in time) than you previously thought (for the same time period), then you should become less confident in the importance of AI safety (which I think makes sense). My previous interpretation was that if you hadn’t updated on A and B yet (e.g., because you neglected to consider it as evidence, or because you left the field early, before any A and B could have happened, and then came back), then upon updating on the existence of A and B you should now be less confident of the importance of AI safety than you were.
Now that that’s hopefully cleared up, I wonder how you used to see the history of the arguments for the importance of AI safety, and what made you think there were fewer shifts than there actually were (e.g., was it a particular paper or article?).
C. The proponents of the original arguments were misinterpreted, or overemphasised some of their beliefs at the expense of others, and actually these shifts are just a change in emphasis.
My interpretation of what happened here is that successes in narrow AI made it more plausible that one could reach ASI by building all of its components directly, rather than needing to first build an AI that does most of the hard work for you. If it only takes 5 cognitive modules to take over the world instead of 500, then one no longer needs to posit an extra mechanism by which a buildable system reaches the ability to take over the world. And so from my perspective it’s mostly a shift in emphasis, with small amounts of A and B as well.