Based on the quote from Kirkpatrick, It looks like a clear example of preference falsification, but I do not see any reason to believe that it is internalized preference falsification. Did I miss how the submissive apes were internalizing the preference to not mate? The sentence “This is an easy to understand example of an important general fact about humans: we can be threatened into internalized preference falsification, i.e. preference inversion.” makes me think that you intended it as an example of primates internalizing a preference falsification. It feels like the quote is only evidence that primates will be deceptive.
On rereading, the claim that “This is an easy to understand example of an important general fact about humans: we can be threatened into internalized preference falsification, i.e. preference inversion” seems reasonably supported by the next paragraph about male vs female arousal in humans. Maybe I just attached the claim to the wrong evidence.
In addition to the object level reasons mentioned by plex, misleading people about the nature of a benchmark is a problem because it is dishonest. Having an agreement to keep this secret indicates that the deception was more likely intentional on OpenAI’s part.