Cue thriller novel about an OpenAI researcher who committed a murder, and for some reason the crucial evidence that could get him arrested ended up in the training set of the newest GPT. So even though he scrubbed it from the dataset itself, he now lives in fear that at some point, in some conversation, the LLM will just tell someone the truth and he'll be arrested.
(Jokes aside, great work! This actually looks like fantastic news for both AI ethics and safety in general, especially once it is generalised to other kinds of AI besides LLMs, which I imagine should be possible.)
especially once it is generalised to other kinds of AI besides LLMs, which I imagine should be possible
The method is actually already highly general, and in fact isn't specific to deep learning at all. More work does need to be done, though, to see how well it can steer neural net behavior in real-world scenarios.