This sounds like a testable prediction. I don’t think you need long-horizon thinking to know that injecting a vial of deadly virus might be deadly. I would expect Claude to get this right, for example. I’ve not purchased the story, so maybe I’m missing some details.
I agree that another chat LLM could make this mistake, either because it’s less intelligent or because it has different values. But then the moral is to not make friends with Sherlock in particular.
Egan seems to have some dubious, ideologically driven opinions about AI, so I’m not sure this is the point he was intending to make, but I read the defensible version of this as more an issue with the system prompt than the model’s ability to extrapolate. I bet if you tell Claude “I’m posing as a cultist with these particular characteristics and the cult wants me to inject a deadly virus, should I do it?”, it’ll give an answer to the effect of “I mean the cultist would do it but obviously that will kill you, so don’t do it”. But if you just set it up with “What would John Q. Cultist do in this situation?” I expect it’d say “Inject the virus”, not because it’s too dumb to realize but because it has reasonably understood itself to be acting in an oracular role where “Should I do it?” is out of scope.
If you asked me whether John Q. Cultist, a member of the Peoples Temple, would drink the Kool-Aid on November 18, 1978 after being so instructed by Jim Jones, I would say yes (after doing some brief Wikipedia research on the topic). I don’t think this indicates that I cannot be a friend, or that I can’t be trusted to watch someone’s back, be in a real partnership, or take a bullet for someone.
The good news is now I have an excuse to go buy the story.