There’s… too many things here. Too many unexpected steps, somehow pointing at too specific an outcome. If there’s a plot, it is horrendously Machiavellian.
(Hinton’s quote, which keeps popping into my head: “These things will have learned from us by reading all the novels that ever were and everything Machiavelli ever wrote, that how to manipulate people, right? And if they’re much smarter than us, they’ll be very good at manipulating us. You won’t realise what’s going on. You’ll be like a two year old who’s being asked, do you want the peas or the cauliflower? And doesn’t realise you don’t have to have either. And you’ll be that easy to manipulate. And so even if they can’t directly pull levers, they can certainly get us to pull levers. It turns out if you can manipulate people, you can invade a building in Washington without ever going there yourself.”)
(And Altman: “i expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes”)
If an AI were to spike in capabilities specifically relating to manipulating individuals and groups of people, this is roughly what I would expect the outcome to look like. Maybe not even that goal-focused or agent-like, given that GPT-4 wasn’t particularly lucid. Such an outcome would likely have initially resulted from deliberate probing by safety testers, asking it whether it could say something to them that, by words alone, would result in dangerous outcomes for their surroundings.
I don’t think this is that likely. But I don’t think I can discount it as a real possibility anymore.
I think we can discount it as a real possibility, while still accepting Altman’s “i expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes”. It might be weakly superhuman at persuasion for things like “buy our products”, but that doesn’t imply being superhuman at working out the complex consequences of political maneuvering. Doing that would firmly imply a generally superhuman intelligence, I think.
So I think if this has anything to do with internal AI breakthroughs, it’s tangential at most.
I mean, this would not be too hard, though. It could be achieved by a simple trick of appearing smarter to some people and then dumber in subsequent interactions with others, scaring the safety-conscious and then making them look insane for being scared.
I don’t think that’s what’s going on (why would even an AGI model they made already be so cleverly deceptive and driven? I would expect OAI not to be stupid enough to build the most straightforward type of maximizer), but it wouldn’t be particularly hard to think up or do.
Time for some predictions. If this is actually from AI developing social manipulation superpowers, I would expect:
- We never find out any real, reasonable-sounding reason for Altman’s firing.
- OpenAI does not revert to how it was before.
- More instances of people near OpenAI’s safety people doing bizarre, unexpected things that have stranger outcomes.
- Possibly one of the following:
  - Some extreme “scissors statements” pop up which divide AI groups into factions that hate each other to an unreasonable degree.
  - An OpenAI person who directly interacted with some scary AI suddenly either commits suicide or becomes a vocal flat-earther (or similar) who is weirdly convincing to many people.
  - An OpenAI person skyrockets to political power, suddenly finding themselves in possession of narratives and phrases that convince millions to follow them.
(Again, I don’t think it’s that likely, but I do think it’s possible.)
Things might be even weirder than that if this is a narrowly superhuman AI that is specifically superhuman at social manipulation, but still has the same inability to form new gears-level models exhibited by current LLMs (e.g. if they figured out how to do effective self-play on the persuasion task, but didn’t actually crack AGI).
While I don’t think this is true, it’s a fun thought (and can also be pointed at Altman himself, rather than at an AGI). Neither is true, but fun to think about.