Huh, I wonder what you think of a different way of splitting it up. Something like:
1. It’s a scientific possibility to have AI that’s on average better than humanity at the class of tasks “choose actions that achieve a goal in the real world.” Let’s label this with some superlative jargon like “superintelligent AI.” Such a technology would be hugely impactful.
2. It would be really bad if a superintelligent AI were choosing actions to achieve some goal, but this goal wasn’t beneficial to humans. This means there are several open problems we need to solve before safely turning on any such AI.
3. We know enough that we can do useful work on (most of) these open problems right now. Arguing for this also implies that superintelligent AI is close enough (if not in years, then in “number of paradigm shifts”) that this work needs to start getting done.
4. We would expect a priori that work on these open problems of beneficial goal design would be under-prioritized (public goods problem, low immediate profit, not obvious you need it before you really need it). And indeed that seems to be the case (insert NIPS survey here), though there’s work going on at nonprofits that have different incentives. So consider thinking about this area if you’re looking for things to research.
I’m definitely interested in hearing other ways of splitting it up! This is one of the points of making this post. I’m also interested in what you think of the ways I’ve done the breakdown! Since you proposed an alternative, I guess you might have some thoughts on why it could be better :)
I see your points as being directed more at increasing ML researchers’ respect for AI x-risk work and their likelihood of doing relevant work. Maybe that should in fact be the goal. It seems to be a more common goal.
I would describe my goal (with this post, at least, and probably with most conversations I have with ML people about x-risk) as something more like: “get them to understand the AI safety mindset, and where I’m coming from; get them to really think about the problem and engage with it”. I expect a lot of people here would reason, in a very narrow and myopic consequentialist way, that this is not as good a goal, but I’m unconvinced.
Well, you mentioned that a lot of people were getting off the train at point 1. My comment can be thought of as giving a much more thoroughly inside-view look at point 1, and deriving other stuff as incidental consequences.
I’m mentally working with an analogy to teaching people a new contra dance (if you don’t know what contra dancing is, I’m just talking about some sequence of dance moves). The teacher often has an abstract view of expression and flow that the students lack, and there’s a temptation for the teacher to try to share that view with the students. But the students don’t want abstractions; what they want is concrete steps to follow, and good dancers will dance the dance just fine without ever hearing about the teacher’s abstract view. Before dancing they regard the abstractions as difficult to understand and distracting from the concrete instructions; they’ll be much better equipped to understand and appreciate them *after* dancing the dance.
IMO, this is a better way of splitting up the argument that we should be funding AI safety research than the one presented in the OP. My only gripe is with point 2. Many would argue that it wouldn’t be really bad, for a variety of reasons; for example, there are likely to be other ‘superintelligent AIs’ working in our favour. Alternatively, if the AI’s decision making were only marginally better than a human’s, it wouldn’t be any worse than a small group of people working against humanity.
TBC, I’m definitely NOT thinking of this as an argument for funding AI safety.