I certainly think that developing fundamentally more interpretable models from scratch is a wise path forward for humanity. I think you make some reasonable proposals for directions that could be pursued. There are quite a few researchers and groups working on a wide variety of directions for this sort of fundamentally more interpretable and controllable AI. For example: https://www.lesswrong.com/posts/ngEvKav9w57XrGQnb/cognitive-emulation-a-naive-ai-safety-proposal
The downside is that it’s almost certainly a slower path to power. If you don’t simultaneously slow down all the other, more direct paths to raw AI power, then the slow paths become irrelevant. It’s like building a very safe campfire in the woods right next to someone building a huge, dangerous bonfire. So then you get into the issue of worldwide monitoring and enforcement of AI R&D, which is not an easy problem to tackle. Another way of putting this is that pursuing safer but less straightforwardly powerful approaches means paying an ‘alignment tax’. https://www.lesswrong.com/tag/alignment-tax
I am very much in favor of this approach by the way. I’m just really concerned about the feasibility and success likelihood of worldwide regulatory enforcement.
I am not expecting any worldwide regulation on AI that prohibits people from using or training unaligned systems (I am just expecting a usual level of regulation). I am mainly hoping that spectral techniques (or some other interpretable method) develop to the point where AI groups want to use them more and more, until they are competitive with neural networks at general tasks or at least compensate for neural networks’ deficiencies. I also hope that these spectral techniques will remain interpretable and aligned.
Right now, there are several kinds of tasks for which I would rather use spectral techniques than neural networks. I have been evaluating the cryptographic security of block ciphers with small message sizes and very small key sizes (for cryptocurrency research), and the spectral techniques that I have developed seem to give consistent measures of security for such block ciphers (I am not done with the training yet); for this kind of cryptanalysis they work better than neural networks. I have also been able to apply these spectral techniques to other problems, such as finding the largest clique in a graph (something I would not have expected before I tried it), and right now they are the only way I know of to transform a non-commutative polynomial into a form that other machine learning models can work with more easily.
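(The spectral techniques referred to above are not spelled out in this comment, so the following is only a generic, hedged sketch of how eigenvector information from a graph’s adjacency matrix can drive a clique-finding heuristic. It is not the method discussed here; the function names and the test graph are hypothetical, and it finds a clique, not necessarily a maximum one.)

```python
# Generic spectral heuristic for the clique problem (illustration only, not the
# author's technique): rank vertices by their weight in the leading eigenvector
# of the adjacency matrix, then greedily build a clique in that order.

import numpy as np
import networkx as nx  # assumed available; used only to build an example graph


def spectral_clique_heuristic(adj: np.ndarray) -> list[int]:
    """Return a clique found by a greedy scan over vertices ordered by their
    weight in the leading eigenvector of the (symmetric) adjacency matrix."""
    eigvals, eigvecs = np.linalg.eigh(adj)             # spectrum of the adjacency matrix
    leading = np.abs(eigvecs[:, np.argmax(eigvals)])   # weights from the top eigenvector
    order = np.argsort(-leading)                       # highest-weight vertices first
    clique: list[int] = []
    for v in order:
        if all(adj[v, u] for u in clique):             # keep v only if adjacent to all chosen vertices
            clique.append(int(v))
    return clique


if __name__ == "__main__":
    G = nx.gnp_random_graph(60, 0.5, seed=0)           # hypothetical random test graph
    A = nx.to_numpy_array(G)
    print("clique found:", spectral_clique_heuristic(A))
```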
Right now, I do not know how to use spectral techniques to replace deep neural networks: I do not know how to use them as universal function approximators, and I do not know how to build many-layered machine learning models out of them. I hope to solve these problems, but I agree that there will be a tradeoff between performance and interpretability. The goal is to make this tradeoff favor interpretable, aligned, and safe machine learning models.