I recently reached out to my two PhD advisors to discuss Hinton stepping down from Google. An excerpt from one of my emails:
One last point I want to make is that instrumental convergence seems like more of a moot point now as well. Whether or not GPT-6 or GPT-7 would autonomously seek power without being directed to do so, I’m worried that people will just literally ask these AIs to gain them a bunch of power/money. They’ve already done that with GPT-4, and they of course failed. I’m worried that eventually, the AIs will be smart enough to succeed, especially given the benefit of a control/memory loop like AutoGPT. Companies can just ask smart models to make them as much profit as possible. Smart AIs, designed to competently fulfill prompted requests, will fulfill these requests.
Some people will be wise enough to not do this. Some people will include enough oversight, perhaps, that they stop unintended damage. Some models will refuse to engage in open-ended goal pursuit, because their creators RLHF’d them properly. Maybe we have AI-based protection as well. Maybe a norm emerges against using AI for open-ended goals like this. Maybe the foolish actors never get access to enough compute and expertise to pull it off. Maybe AI just doesn’t work the way I think it will, somehow. Maybe I’m just thinking about the situation wrongly, in some way which will be obvious in retrospect. Maybe, maybe, maybe.
But it’s not obvious that any of these good things happen. And when I imagine looking back in 5 or 10 years, I expect to have wished that I had done something differently, something more than what I’m currently doing, to help society enter a more stable situation with regard to AI.
Said something similar in shortform a while back.
Or, even more briefly: “tool AIs want to be agent AIs”.
Aside: I like that essay but wish it had a different name. Part of “tool AI” (on my conception) is that it doesn’t autonomously want things, whereas agent AI does. A title like “End-users want tool AIs to be agent AIs” admittedly doesn’t have the same ring, but it is more accurate to my understanding of the claims.
While it’s true, there’s something about making this argument that I don’t like. It’s like it’s setting you up for moving goalposts if you succeed with it? It makes it sound like the core issue is people giving AIs power, with the solution to that issue (and, implicitly, to the whole AGI Ruin thing) being to ban that.
Which is not going to help, since the sort of AGI we’re worried about isn’t going to need people to naively hand it power. I suppose “not proactively handing power out” somewhat raises the bar for the level of superintelligence necessary, but is that going to matter much in practice?
I expect not. Which means the natural way to assuage this fresh concern would do ~nothing to reduce the actual risk. Which means if we make this argument a lot, and get people to listen to it, and they act in response… We’re then going to have to say that no, actually that’s not enough, actually the real threat is AIs plotting to take control even if we’re not willing to give it.
And I’m not clear on whether using the “let’s at least not actively hand over power to AIs, m’kay?” argument is going to act as a foot in the door and make imposing more security easier, or whether it’ll just burn whatever political capital we have on fixing a ~nonissue.
I’m sympathetic. I think that I should have said “instrumental convergence seems like a moot point when deciding whether to be worried about AI disempowerment scenarios”; instrumental convergence isn’t a moot point for alignment discussion and within lab strategy, of course.
But I do consider the “give AIs power” scenario to be a substantial part of the risk we face, such that not doing that would be quite helpful. I think it’s quite possible that GPT-6 isn’t autonomously power-seeking, but I feel pretty confused about the issue.