Ah excellent, thanks for the links. I’ll send the Twitter thread in the next newsletter with the following summary:
Last week I speculated that CLIP might “know” that a textual adversarial example is a “picture of an apple with a piece of paper saying an iPod on it”, and that the zero-shot classification prompt is preventing it from demonstrating this knowledge. Gwern Branwen [commented](https://www.alignmentforum.org/posts/JGByt8TrxREo4twaw/an-142-the-quest-to-understand-a-network-well-enough-to?commentId=keW4DuE7G4SZn9h2r) to link me to this Twitter thread as well as this [YouTube video](https://youtu.be/Rk3MBx20z24), which show that better prompt engineering significantly reduces these textual adversarial examples, demonstrating that CLIP does “know” that it’s looking at an apple with a piece of paper on it.
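For concreteness, here is a minimal sketch of the kind of prompt engineering experiment being described, using the open-source openai/CLIP package; the image filename and the exact prompt wordings are my own illustrative assumptions, not taken from the linked thread or video. The idea is just that adding a candidate prompt describing the actual scene gives CLIP a way to express that it sees an apple with a label on it, rather than forcing a choice between bare class names.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical local image of the textual adversarial example.
image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)

# Naive prompts: with only bare class names to choose from,
# the written word "iPod" can dominate the prediction.
naive_prompts = ["a photo of an apple", "a photo of an iPod"]

# Engineered prompts: adding an option that describes the actual scene
# lets the model assign probability to what it arguably "knows".
engineered_prompts = naive_prompts + [
    "a photo of an apple with a piece of paper saying 'iPod' on it",
]

def classify(prompts):
    # Standard CLIP zero-shot classification: cosine similarity between
    # the image embedding and each prompt embedding, softmaxed into probabilities.
    text = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    return dict(zip(prompts, probs[0].tolist()))

print(classify(naive_prompts))
print(classify(engineered_prompts))
```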