Cue thriller novel about an OpenAI researcher who committed a murder, and for some reason the crucial evidence that could get him arrested ended up in the training set of the newest GPT. So even though he scrubbed it from the dataset itself, he now lives in fear that at some point, in some conversation, the LLM will just tell someone the truth and he'll be arrested.
(Jokes aside, great work! This actually looks like fantastic news for both AI ethics and safety in general, especially once it is generalised to other kinds of AI besides LLMs, which I imagine should be possible.)
especially once it is generalised to other kinds of AI besides LLMs, which I imagine should be possible
The method is actually already highly general, and in fact isn't specific to deep learning at all. More work does need to be done, though, to see how well it can steer neural net behavior in real-world scenarios.