Let’s say there’s an illiterate man who lives a simple life and, in doing so, just happens to follow all the strictures of the law, without ever being able to explain what the law is. Would you say that this man understands the law?
Alternatively, let’s say there is a learned man who exhaustively studies the law, but only so he can bribe, steal, and burn his way to as much crime as possible. Would you say that this man understands the law?
I would say that it is ambiguous whether the first man understands the law; maybe? kind of? you could make an argument, I guess? It’s a bit of a weird way to put it, innit? Whereas the second man definitely understands the law. It sounds like you would say that the first man definitely understands the law (I’m not sure what you would say about the second man), which might be where we differ.
You could say that LLMs don’t work that way, that the reader should intuitively know this, and that the word “understanding” should be treated as special in this context and not ambiguous at all; as a reader, I am saying I was confused by the choice of words, or at least that it is not explained in enough detail ahead of time.
Obviously, I’m just one reader; maybe everyone else understood what you meant. Grain of salt, and all that.
Some logical nits:
Early on, you mention physical attacks to destroy offline backups; these attacks would be highly visible and would contradict the dark-forest nature of the scenario.
Perfect concealment and perfect attacks are in tension. The AI supposedly knows the structure and vulnerabilities of the systems hosting an enemy AI, but finding these things out for certain requires intrusion, which can be detected. The AI can hold off on attacking and work from suppositions instead, but then the perfect attack is not guaranteed, and it could fail due to unknowns.
Other notes:
Why do you assume that AIs bias toward perfect, deniable strikes? An AI that strikes first can secure an early advantage: for example, if it can knock out all running copies of an enemy AI, restoring from backups takes time and leaves the enemy AI vulnerable. As another example, if AI Alpha knows it is less capable than AI Bravo, but also knows that AI Bravo will wait to attack it perfectly, then AI Alpha attacking first (imperfectly) can force AI Bravo to abandon all of its attack preparations in order to defend itself (see maneuver warfare).
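To make this concrete, here is a toy expected-payoff sketch in Python; every number in it is an illustrative assumption I made up, not something from your post. The point is only that once preparing a “perfect” strike carries detection risk (per the concealment/attack tension above), “always wait for the perfect strike” stops being the obviously dominant strategy.

```python
# Toy expected-payoff sketch. All probabilities and payoffs below are
# illustrative assumptions, not numbers from the post being reviewed.
p_detected = 0.5    # chance that reconnaissance for a "perfect" strike is detected
p_imperfect = 0.6   # chance that an immediate, imperfect strike succeeds

WIN, LOSE = 1.0, -1.0

# Assume detection forfeits the advantage entirely (the enemy AI hardens
# itself or counterattacks), so waiting pays off only if the preparation
# goes unnoticed.
ev_wait = (1 - p_detected) * WIN + p_detected * LOSE          # 0.00
ev_strike_now = p_imperfect * WIN + (1 - p_imperfect) * LOSE  # +0.20

print(f"EV(wait for perfect strike): {ev_wait:+.2f}")
print(f"EV(strike now, imperfectly): {ev_strike_now:+.2f}")
```

Flip the numbers around and waiting wins instead; the analysis has to argue for the parameter regime, not just assert the strategy.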
“Defend itself” might be better put as re-taking and re-securing compromised systems; relatedly, I think cybersecurity defense is much less of an active endeavor than this analysis seems to assume?
An extension of your game-theory analysis implies that the US should have nuked the USSR in the 1950s, and should have been nuking every other nuclear nation over the last 70 years. This seems weird? At least, I expect it not to be persuasive to folks thinking about AI society.
The stylistic choice I disagree with most is the bolding: if a short paragraph has 5 different bolded statements, then… what’s the point?