If it’s possible that we could get to a point where AGI is no longer a serious threat without needing to answer the question, then the question is not necessary.
Agreed, this seems like a good definition for rendering anything as ‘necessary.’
Our goal: minimize AGI-induced existential threats (right?).
My claim is that answering these questions is probably necessary for achieving this goal—i.e., P(achieving goal | failing to think about one or more of these questions) ≈ 0. (I say, “I am claiming that a research agenda that neglects these questions would probably not actually be viable for the goal of AGI safety work.”)
That is, we would be exceedingly lucky if we achieve AGI safety’s goal without thinking about
what we mean when we say AGI (Q1),
what existential risks are likely to emerge from AGI (Q2),
how to address these risks (Q3),
how to implement these mitigation strategies (Q4), and
how quickly we actually need to answer these questions (Q5).
I really don’t see how it could be any other way: if we want to avoid futures in which AGI does bad stuff, we need to think about avoiding (Q3/Q4) the bad stuff (Q2) that AGI (Q1) might do (and we have to do this all “before the deadline;” Q5). I propose a way to do this hierarchically. Do you see wiggle room here where I do not?
FWIW, I also don’t really think this is the core claim of the sequence. I would want the core claim to be something more like: here is a useful framework for moving from point A (where the field is now) to point B (where the field ultimately wants to end up). I have not seen a highly compelling presentation of this sort of thing before, and I think it is very valuable in solving any hard problem to have a general end-to-end plan (which we probably will want to update as we go along; see Robert’s comment).
I think most of the strategies in MIRI’s general cluster do not depend on most of these questions.
Would you mind giving a specific example of an end-to-end AGI safety research agenda that you think does not depend on or attempt to address these questions? (I’m also happy to just continue this discussion off of LW, if you’d like.)
Would you mind giving a specific example of an end-to-end AGI safety research agenda that you think does not depend on or attempt to address these questions?
I think restricting oneself to end-to-end agendas is itself a mistake. One principle of e.g. the MIRI agenda is that we do not currently possess a strong enough understanding to create an end-to-end agenda which has any hope at all of working; anything which currently claims to be an end-to-end agenda is probably just ignoring the hard parts of the problem. (The Rocket Alignment Problem gives a good explanation of this view.)
I do think that finding necessary subquestions, or noticing that a given subquestion may not be necessary, is much easier than figuring out an end-to-end agenda. One can notice that e.g. an architecture-agnostic alignment strategy seems plausible (or arguably even necessary!) without figuring out all the steps of an end-to-end strategy.
Definitely agree that if we silo ourselves into any rigid plan now, it almost certainly won’t work. However, I don’t think ‘end-to-end agenda’ = ‘rigid plan.’ I certainly don’t think this sequence advocates anything like a rigid plan. These are the most general questions I could imagine guiding the field, and I’ve already noted that I think this should be a dynamic draft.
...we do not currently possess a strong enough understanding to create an end-to-end agenda which has any hope at all of working; anything which currently claims to be an end-to-end agenda is probably just ignoring the hard parts of the problem.
What hard parts of the problem do you think this sequence ignores?
(I explicitly claim throughout the sequence that what I propose is not sufficient, so I don’t think I can be accused of ignoring this.)
Hate to just copy and paste, but I still really don’t see how it could be any other way: if we want to avoid futures in which AGI does bad stuff, then we need to think about avoiding (Q3/Q4) the bad stuff (Q2) that AGI (Q1) might do (and we have to do this all “before the deadline;” Q5). This is basically tautological as far as I can tell. Do you agree or disagree with this if-then statement?
I do think that finding necessary subquestions, or noticing that a given subquestion may not be necessary, is much easier than figuring out an end-to-end agenda.
Agreed. My goal was to enumerate these questions. When I noticed that they followed a fairly natural progression, I decided to frame them hierarchically. And, I suppose to your point, it wasn’t necessarily easy to write this all up. I thought it would nonetheless be valuable to do so, so I did!
Thanks for linking the Rocket Alignment Problem—looking forward to giving it a closer read.
I still really don’t see how it could be any other way: if we want to avoid futures in which AGI does bad stuff, then we need to think about avoiding (Q3/Q4) the bad stuff (Q2) that AGI (Q1) might do (and we have to do this all “before the deadline;” Q5). This is basically tautological as far as I can tell. Do you agree or disagree with this if-then statement?
My comment at the top of this thread detailed my disagreement with that if-then statement, and I do not think any of your responses to my top-level comment actually justified the claim of necessity of the questions. Most of them made the same mistake, which I tried to emphasize in my response. This, for example:
How can you be so sure, however, that zooming into something narrower would effectively only add noise?
The question is not “How can John be so sure that zooming into something narrower would only add noise?”, the question is “How can Cameron be so sure that zooming into something narrower would yield crucial information without which we have no realistic hope of solving the problem?”.
I think this same issue applies to most of the rest of your replies to my original comment.
The question is not “How can John be so sure that zooming into something narrower would only add noise?”, the question is “How can Cameron be so sure that zooming into something narrower would yield crucial information without which we have no realistic hope of solving the problem?”.
I am not ‘so sure.’ As I said in the previous comment, I have only claimed that it is probably necessary to, for instance, know more about AGI than just whether it is a ‘generic strong optimizer.’ I would only be comfortable making non-probabilistic claims about the necessity of particular questions in hindsight.
I don’t think I’m making some silly logical error. If your question is, “Why does Cameron think it is probably necessary to understand X if we want to have any realistic hope of solving the problem?”, well, I do not think this is rhetorical! I spend an entire post defending and elucidating each of these questions, and I hope by the end of the sequence, readers would have a very clear understanding of why I think each is probably necessary to think about (or I have failed as a communicator!).
It was never my goal to defend the (probable) necessity of each of the questions in this one post; this is the point of the whole sequence! This post is a glorified introductory paragraph.
I do not think, therefore, that this post serves as anything close to an adequate defense of this framework, and I understand your skepticism if you think this is all I will say about why these questions are important.
However, I don’t think your original comment—or any of this thread, for that matter—really addresses any of the important claims put forward in this sequence (which makes sense, given that I haven’t even published the whole thing yet!). It also seems like some of your skepticism is being fueled by assumptions about what you predict I will argue as opposed to what I will actually argue (correct me if I’m wrong!).
I hope you can find the time to actually read through the whole thing once it’s published before passing your final judgment. Taken as a whole, I think the sequence speaks for itself. If you still think it’s fundamentally bullshit after having read it, fair enough :)