While I don’t think anyone aware of AI alignment issues should really update a lot because of Cicero, I’ve found this particular piece of news to be quite effective at making unaware people update toward “AI is scary”.
For me, the scary part was Meta’s willingness to do things that are at least arguably torment-nexusy, wrap them in PR language like “cooperation”, and, with a straight face, sweep the deceptive capability under the rug.
This is different from believing that the deceptive capability in question is on its own dangerous or surprising.
My update from Cicero is almost entirely on the social-reality level: I now believe more strongly than before that, in the social reality, rationalizations for torment-nexus-ing will be extremely viable and accessible to careless actors.
(That said, I think I may have forecasted a 30-45% chance of success at full-press Diplomacy if you had asked me a few weeks ago, so maybe I’m not entirely unsurprised on the technical level.)
I’m a bit puzzled by these reactions. In a sense, yes, this is technically teaching an AI to deceive humans… but in a very limited context that doesn’t really generalize even to other versions of Diplomacy, let alone real life. To me, this is in principle teaching an AI to deceive, but only in a similar sense to having an AI in Civilization sometimes make a die roll to attack you despite having signed a peace treaty. (Analogous to how Cicero sometimes attacks you despite having said that it won’t.) The deception is so removed from anything relevant to real life that it makes little sense to even call it deception.
I interpret references to things like torment nexuses as implying that Meta knew this to be a bad idea and intentionally chose to go against the social consensus. But I think that’s a social consensus that requires a belief in very short timelines? As in, it requires you to think that we’re a very short way from AGI. In that case, anything done with current-day AI may affect how AGI is developed, so this kind of project actually has some real chance of giving AGIs capabilities for deception.
But if you don’t believe in very short timelines (something on the order of 5 years), then the reasonable default assumption seems to be that this was a fun project done for the technical challenge, but one that probably won’t affect future AGIs one way or the other, because any genuine ability to deceive that AGIs could have and that would work in the real world would be connected to some more powerful social reasoning ability than this kind of language-model tinkering.
And if Meta doesn’t believe in very short timelines, then there’s no reason to attribute defection/“torment-nexus-ing” to Meta, since they didn’t do anything that could cause damage. They’re not doing a thing that everyone has warned about and said is a bad idea; they’re just building a cute little toy. And my understanding is that Meta’s staff in general doesn’t believe in very short timelines.
My reaction has nothing to do with “allowing AI to deceive” and everything to do with “this is a striking example of an AI reaching better-than-average human level at a game that integrates many different core capacities of general intelligence, such as natural language, cooperation, bargaining, planning, etc.”
Or to put it another way: for laypeople, it is easy to think of GPT-3, DeepL, or DALL-E as tools, but Cicero will feel more agentic to them.