Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.
My sense from reading Inflection’s response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan / they don’t seem to be grappling with the risks specific to their approach at all. Quoting from them in two different sections:
> Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.

> Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades. **Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.
I think AIs thinking specifically about human psychology—and how to convince people to change their thoughts and behaviors—are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn’t seem to have shown up in their response.
I don’t think this type of AI is very useful for closing the acute risk window, and so probably shouldn’t be made until much later.
> I think AIs thinking specifically about human psychology—and how to convince people to change their thoughts and behaviors—are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high,
I’d be interested in hearing some more of your reasoning about that; until reading this, my understanding of psychology-focused AI was that the operators were generally safe, and were about as likely to be goodharted/deceived by the AI (or by the humans being influenced) as with any other kind of AI, and that inner alignment risk (which I understand less well) would therefore become acute at around the same time as for non-psychology AI. Maybe I’m displaying flawed thinking that’s prevalent among people like me, who spend orders of magnitude more time thinking about contemporary psychology systems than about AI risk itself. Are you thinking that psychology-focused AI would notice the existence of their operators sooner than non-psychology AI? Or is it more about influence AI that people deliberately point at themselves instead of others?
> and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn’t seem to have shown up in their response.

All the companies at this summit, not just Inflection, are heavily invested in psychology AI (the companies, not the labs). I’ve also argued elsewhere that the American government and intelligence agencies are the primary actors with an interest in researching and deploying psychology AI.

> I don’t think this type of AI is very useful for closing the acute risk window, and so probably shouldn’t be made until much later.

I currently predict that the AI safety community is best off picking its battles and should not try to interfere with technologies that are as directly critical to national security as psychology AI is; it would be wiser to focus on reducing the attack surface of the AI safety community itself, and otherwise stay out of the way whenever possible, since the alignment problem is obviously a much higher priority.
> Are you thinking that psychology-focused AI would notice the existence of their operators sooner than non-psychology AI? Or is it more about influence AI that people deliberately point at themselves instead of others?
I am mostly thinking about the former; I am worried that psychology-focused AI will develop more advanced theory of mind and be able to hide going rogue from operators/users more effectively, develop situational awareness more quickly, and so on.
> I currently predict that the AI safety community is best off picking its battles and should not try to interfere with technologies that are as directly critical to national security as psychology AI is;
My view is that the AI takeover problem is fundamentally a ‘security’ problem. Building a robot army/police force has lots of benefits (I prefer it to a human one in many ways) but it means it’s that much easier for a rogue AI to seize control; a counter-terrorism AI also can be used against domestic opponents (including ones worried about the AI), and so on. I think jumping the gun on these sorts of things is more dangerous than jumping the gun on non-security uses (yes, you could use a fleet of self-driving cars to help you in a takeover, but it’d be much harder than a fleet of self-driving missile platforms).
Sorry, my bad. When I said “critical to national security”, I meant that the US and China probably already see psychology AI as critical to state survival. It’s not that this tech being developed is a good thing (idk what Bostrom/FHI was thinking when he wrote VWH in 2019); it’s just that the US and China are already in a state of Moloch where they are worried about each other (and Russia) using psychology AI (which already exists) to hack public opinion and pull the rug out from under the enemy regime. The NSA and CCP can’t resist developing psychological warfare/propaganda applications for SOTA AI systems, because psychology AI is also needed for defensively neutralizing/mitigating successful public opinion influence operations after they get through and turn millions of people (especially elites). As a result, it seems to me that the AI safety community should pick different battles than opposing psychology AI.
I don’t see how psychology-focused AI would develop better theory of mind than AI with tons of books in the training set. At the level where inner misalignment kills everyone, it seems like even something as powerful as the combination of social media and scrolling data would produce a dimmer awareness of humans than the combination of physics, biology, evolution, and history textbooks would. I’d be happy to understand your thinking better, since I don’t know much about the technical details of inner alignment or how psych AI is connected to them.
(I’m Matthew Gray)