Could you provide some more insights into the advantages of using hierarchical perturbation for LLM attribution in PIZZA, particularly in terms of computational cost and attribution accuracy?
Yeah! So, hierarchical perturbation (HiPe) is a bit like a thresholded binary search. It starts by splitting the input into large overlapping chunks and perturbing each of them. If the resulting attributions for any of the chunks are above a certain level, those chunks are split into smaller chunks and the process continues. This works because it efficiently discards input regions that don’t contribute much to the output, without having to individually perturb each token in them.
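To make the recursion concrete, here’s a minimal sketch in Python. It’s an illustration of the idea rather than PIZZA’s actual implementation: the `score` function (one completion measuring how well the perturbed prompt reproduces the original output), the `"<unk>"` mask token, and the exact chunking scheme are all assumptions based on the description above.

```python
# Illustrative sketch of hierarchical perturbation (HiPe) -- not PIZZA's actual code.
# `score(tokens)` is a hypothetical stand-in for one model completion that returns
# how well the perturbed prompt reproduces the original output.

def hierarchical_perturbation(tokens, score, min_chunk=1):
    n = len(tokens)
    baseline = score(tokens)          # score of the unperturbed prompt
    attributions = [0.0] * n
    active = {(0, n)}                 # spans still worth investigating
    chunk = n
    while active and chunk >= min_chunk:
        stride = max(chunk // 2, 1)   # 50%-overlapping chunks
        spans = sorted({(s, min(s + chunk, n))
                        for lo, hi in active
                        for s in range(lo, hi, stride)})
        # Attribution of a chunk = how much the output degrades when it's masked
        drops = [baseline - score(tokens[:s] + ["<unk>"] * (e - s) + tokens[e:])
                 for s, e in spans]
        # Mid-range threshold: halfway between the smallest and largest drop
        threshold = (min(drops) + max(drops)) / 2
        active = set()
        for (s, e), drop in zip(spans, drops):
            if drop >= threshold:     # keep salient chunks, discard the rest
                active.add((s, e))
                for i in range(s, e):
                    attributions[i] += drop
        chunk //= 2                   # refine surviving chunks at half the size
    return attributions
```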
Standard iterative perturbation (ItP) is much simpler. It just splits the input into evenly sized chunks, perturbs each of them in turn to get the attributions, and that’s that. We do this either word-wise or token-wise (word-wise is about 25% quicker). There’s a sketch of this below too.
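For comparison, here’s ItP under the same assumptions (the same hypothetical `score` function and mask token):

```python
# Illustrative sketch of iterative perturbation (ItP) -- same assumptions as above.

def iterative_perturbation(tokens, score, chunk_size=1):
    baseline = score(tokens)          # score of the unperturbed prompt
    attributions = [0.0] * len(tokens)
    for start in range(0, len(tokens), chunk_size):
        end = min(start + chunk_size, len(tokens))
        masked = tokens[:start] + ["<unk>"] * (end - start) + tokens[end:]
        drop = baseline - score(masked)   # exactly one completion per chunk
        for i in range(start, end):
            attributions[i] = drop
    return attributions
```

With `chunk_size=1` this is token-wise; word-wise ItP is the same loop taken over word boundaries instead of fixed-size chunks.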
So, where n is the number of tokens in the prompt and the unit of cost is a single completion, ItP costs n completions if we perturb token-wise, or around 0.75n if word-wise, depending on how many tokens per word your tokeniser gives you on average. This is manageable but not ideal. You could, of course, always perturb iteratively in multi-token chunks, at the cost of attribution granularity.
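To put numbers on that (purely illustrative): a 1,000-token prompt costs 1,000 completions with token-wise ItP; if your tokeniser averages around 1.3 tokens per word, that’s roughly 750 words, so about 750 completions word-wise.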
HiPe can be harder to predict, as it depends on the initial chunk size and threshold you use, and on the true underlying saliency of the input tokens (which, naturally, we don’t know). In the worst case, with a threshold of zero (a poor choice), an initial chunk size of n and every token salient, you might end up with around 4n completions or more, depending on how you handle overlaps: with 50%-overlapping chunks nothing ever gets pruned, so each level of the hierarchy costs about twice as many completions as the last, and the sum over all log₂(n) levels comes to roughly 2(1 + 2 + 4 + … + n) ≈ 4n. In practice, with a sensible threshold (we use the mid-range, which works well out of the box), this is rare.
HiPe really shines on large prompts where only a few tokens matter. If a given completion relies on just 10% of the input tokens, HiPe will give you attributions in a fraction of n completions.
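As a rough back-of-envelope (an idealised model, not a measurement): if those salient tokens cluster together, each level of the hierarchy keeps only the couple of chunks covering the cluster, so the cost looks like the 4n worst case applied to the salient region alone (call it s tokens), giving roughly 4s completions plus a few per coarse level. For s = 100 out of n = 1,000, that’s a few hundred completions rather than a thousand.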
I don’t want to make sweeping claims about HiPe’s efficiency in general, as it relies on the actual saliency of the input tokens. Which we don’t know. Which is why we need HiPe! We’d actually love to see someone do a load of benchmark experiments using different configurations to get a better handle on this, if anyone fancies it.