All of these operationalizations concern exact notions from the training setup.
Another important notion is revealed identity:
Does the AI think of the other AI as "itself," to the extent that it thinks about such things at all?
Do the AIs cooperate in a way that is reasonably similar to cooperating with oneself?