Joseph Van Name comments on Deep Forgetting & Unlearning for Safely-Scoped LLMs

Joseph Van Name 7 Dec 2023 11:32 UTC
1 point
0
The problem of unlearning would be solved (or kind of solved) if we just used machine learning models that optimize fitness functions that always converged to the same local optimum regardless of the initial conditions (pseudodeterministic training) or at least has very few local optima. But this means that we will have to use something other than neural networks for this and instead use something that behaves much more mathematically. Here the difficulty is to construct pseudodeterministically trained machine learning models that can perform fancy tasks about as efficiently as neural networks. And, hopefully we will not have any issues with a partially retrained pseudodeterministically trained ML model remembering just enough of the bad thing to do bad stuff.