This was basically my answer. I can't play as an AI using this strategy, for obvious reasons, but an AI that used its one sentence to give a novel, easily testable solution to a longstanding social problem (or an easily testable principle that suggests one or more novel solutions) would probably get at least a second sentence from me (though not a typed response; that seems to open up a risky channel). Especially if the AI in question didn't actually have access to much information about human culture or about me personally, and had to infer that such a solution would be useful from near-first principles. That's not proof of Friendliness, but an AI using its one guaranteed communication to do something with a decent chance of improving the world, by our definition, without any prompting whatsoever sure looks suspiciously Friendly to me.