What about the latent adversarial training papers?
What about Mechanistically Eliciting Latent Behaviours?
The latter is in the list.
Alexander is replying to John’s comment (asking him if he thinks these papers are worthwhile); he’s not replying to the top level comment.