That’s a very helpful comment, thanks!
Yeah, Vision 1 versus Vision 2 are two caricatures, and as such, they differ along a bunch of axes at once. And I think you’re emphasizing different axes than the ones that seem most salient to me. (Which is fine!)
In particular, maybe I should have focused more on the part where I wrote: “In that case, an important conceptual distinction (as compared to Vision 1) is related to AI goals: In Vision 1, there’s a pretty straightforward answer of what the AI is supposed to be trying to do… By contrast, in Vision 2, it’s head-scratching to even say what the AI is supposed to be doing…”
Along this axis-of-variation:
“An AI that can invent a better solar cell, via doing the same sorts of typical human R&D stuff that a human solar cell research team would do” is pretty close to the Vision 1 end of the spectrum, despite the fact that (in a different sense) this AI has massive amounts of “autonomy”: all on its own, the AI may rent a lab space, apply for permits, order parts, run experiments using robots, etc.
The scenario “A bunch of religious fundamentalists build an AI, and the AI notices the error in its programmers’ beliefs, and successfully de-converts them” would be much more towards the Vision 2 end of the spectrum—despite the fact that this AI is not very “autonomous” in the going-out-and-doing-things sense. All the AI is doing is thinking, and chatting with its creators. It doesn’t have direct physical control of its off-switch, etc.
Why am I emphasizing this axis in particular?
For one thing, I think this axis has practical importance for current research; on the narrow value learning vs ambitious value learning dichotomy, “narrow” is enough to execute Vision 1, but you need “ambitious” for Vision 2.
For example, if we move from “training by human approval” to “training by human approval after the human has had extensive time to reflect, with weak-AI brainstorming help”, then that’s a step from Vision 1 towards Vision 2 (i.e. a step from narrow value learning towards ambitious value learning). But my guess is that it’s a pretty small step towards Vision 2. I don’t think it gets us all the way to the AI I mentioned above, the one that will proactively deconvert a religious fundamentalist supervisor who currently has no interest whatsoever in questioning his faith.
For another thing, I think this axis is important for strategy and scenario-planning. For example, if we do Vision 2 really well, it changes the story with regard to the “solution to global wisdom and coordination” mentioned in Section 3.2 of my “what does it take” post.
In other words, I think there are a lot of people (maybe including me) who are wrong about important things, and also not very scout-mindset about those things, such that “AI helpers” wouldn’t particularly help, because the person is not asking the AI for its opinion, and would ignore the opinion anyway, or even delete that AI in favor of a more sycophantic one. This is a societal problem, and always has been. One possible view of that problem is: “well, that’s fine, we’ve always muddled through”. But if you think there is upcoming VWH-type stuff where we won’t muddle through (as I tentatively do with regard to ruthlessly-power-seeking AGI), then maybe the only option is a (possibly aggressive) shift in the balance of power towards a scout-mindset-y subpopulation (or at least, a group with more correct beliefs about the relevant topics). That subpopulation could be composed of either humans (cf. “pivotal act”), or of Vision 2 AIs.
Here’s another way to say it, maybe. I think you’re maybe imagining a dichotomy where either AI is doing what we want it to do (which is normal human stuff like scientific R&D), or the AI is plotting to take over. I’m suggesting that there’s a third murky domain where the person wants something that he maybe wouldn’t want upon reflection, but where “upon reflection” is kinda indeterminate because he could be manipulated into wanting different things depending on how they’re framed. This third domain is important because it contains decisions about politics and society and institutions and ethics and so on. I have concerns that getting an AI to “perform well” in this murky domain is not feasible via a bootstrap thing that starts from the approval of random people; rather, I think a good solution would have to look more like an AI which is internally able to do the kinds of reflection and thinking that humans do (but where the AI has the benefit of more knowledge, insight, time, etc.). And that requires that the AI have a certain kind of “autonomy” to reflect on the big picture of what it’s doing and why. I think that kind of “autonomy” is different than how you’re using the term, but if done well (a big “if”!), it would open up a lot of options.
Thanks for the response! I agree that the difference is a difference in emphasis.