I agree that interpretability research is risky, and that one should carefully consider whether it is worth it to perform this interpretability research. I propose that a safer alternative would be to develop machine learning models that are
quite unrelated to the highest performing machine learning models (this is so that capability gains in these safer models do not translate very well to capability gains for the high performing models),
as inherently interpretable as one can make these models,
more limited in functionality than the highest performing machine learning models, but
still functional enough to be useful in practice.
These characteristics are correlated with each other, so it is likely that one would get all of these characteristics at once. For example, one will most likely not get interpretability for free but instead trade interpretability for performance. Being unrelated to the highest performing machine learning models will also likely reduce the performance of the machine learning models.
It seems like spectral methods such as PageRank or the eigenvectors of the Laplacian of a graph satisfy some of these characteristics, but it would be good if there were more research into developing spectral methods further to increase their functionality.
Perhaps interpretability research has yielded such few results because we need to first develop inherently interpretable models.
I certainly think that developing fundamentally more interpretable models from scratch is a wise path forward for humanity. I think you make some reasonable proposals for directions that could be pursued. There are quite a few researchers and groups working on a wide variety of directions for this sort of fundamentally more interpretable and controllable AI. For example: https://www.lesswrong.com/posts/ngEvKav9w57XrGQnb/cognitive-emulation-a-naive-ai-safety-proposal
The downside is that it’s almost certainly a slower path to power. If you don’t simultaneously slow down all the other, more direct, paths to raw AI power then the slow paths become irrelevant. Like building a very safe campfire in the woods right next to someone building a huge dangerous bonfire. So then you get into the issue of worldwide monitoring and enforcement of AI R&D, which is not an easy problem to tackle. Another way of thinking about this is saying that pursuing safer but less straightforwardly powerful approaches is paying an ‘alignment tax’. https://www.lesswrong.com/tag/alignment-tax
I am very much in favor of this approach by the way. I’m just really concerned about the feasibility and success likelihood of worldwide regulatory enforcement.
I am not expecting any worldwide regulation on AI that prohibits people from using or training unaligned systems (I am just expecting a usual level of regulation). I am mainly hoping for spectral techniques to develop to the point where AI groups will want to use these spectral techniques (or some other method) more and more until they are competitive with neural networks at general tasks or at least complement the deficiencies of neural networks. I also hope that these spectral techniques will remain interpretable and aligned.
Right now, there are several kinds of tasks in which I would rather use spectral techniques than neural networks. I have been evaluating the cryptographic security of block ciphers with small message size and very small key size (for cryptocurrency research), and it seems like the spectral techniques that I have developed give consistent measures of security for such block ciphers (I am not done with the training yet) and these spectral techniques are better for cryptanalysis than neural networks. I have been able to use these spectral techniques for other problems such as the problem of finding the largest clique in a graph (this is not something that I would have expected before I did it), and right now these spectral techniques are the only way I know how to transform a non-commutative polynomial into something that other machine learning models can work with better.
Right now, I do not know how to use spectral techniques to replace deep neural networks. I do not know how to use spectral techniques to approximate a universal function and I do not know how to use spectral techniques to make machine learning models with many layers. I hope to be able to solve these problems of spectral techniques, but I agree that there will be a tradeoff between performance and interpretability. The goal is to make this tradeoff favor interpretable, aligned, and safe machine learning models.
I agree that interpretability research is risky, and that one should carefully consider whether it is worth it to perform this interpretability research. I propose that a safer alternative would be to develop machine learning models that are
quite unrelated to the highest performing machine learning models (this is so that capability gains in these safer models do not translate very well to capability gains for the high performing models),
as inherently interpretable as one can make these models,
more limited in functionality than the highest performing machine learning models, but
still functional enough to be useful in practice.
These characteristics are correlated with each other, so it is likely that one would get all of these characteristics at once. For example, one will most likely not get interpretability for free but instead trade interpretability for performance. Being unrelated to the highest performing machine learning models will also likely reduce the performance of the machine learning models.
It seems like spectral methods such as PageRank or the eigenvectors of the Laplacian of a graph satisfy some of these characteristics, but it would be good if there were more research into developing spectral methods further to increase their functionality.
Perhaps interpretability research has yielded such few results because we need to first develop inherently interpretable models.
I certainly think that developing fundamentally more interpretable models from scratch is a wise path forward for humanity. I think you make some reasonable proposals for directions that could be pursued. There are quite a few researchers and groups working on a wide variety of directions for this sort of fundamentally more interpretable and controllable AI. For example: https://www.lesswrong.com/posts/ngEvKav9w57XrGQnb/cognitive-emulation-a-naive-ai-safety-proposal
The downside is that it’s almost certainly a slower path to power. If you don’t simultaneously slow down all the other, more direct, paths to raw AI power then the slow paths become irrelevant. Like building a very safe campfire in the woods right next to someone building a huge dangerous bonfire. So then you get into the issue of worldwide monitoring and enforcement of AI R&D, which is not an easy problem to tackle. Another way of thinking about this is saying that pursuing safer but less straightforwardly powerful approaches is paying an ‘alignment tax’. https://www.lesswrong.com/tag/alignment-tax
I am very much in favor of this approach by the way. I’m just really concerned about the feasibility and success likelihood of worldwide regulatory enforcement.
I am not expecting any worldwide regulation on AI that prohibits people from using or training unaligned systems (I am just expecting a usual level of regulation). I am mainly hoping for spectral techniques to develop to the point where AI groups will want to use these spectral techniques (or some other method) more and more until they are competitive with neural networks at general tasks or at least complement the deficiencies of neural networks. I also hope that these spectral techniques will remain interpretable and aligned.
Right now, there are several kinds of tasks in which I would rather use spectral techniques than neural networks. I have been evaluating the cryptographic security of block ciphers with small message size and very small key size (for cryptocurrency research), and it seems like the spectral techniques that I have developed give consistent measures of security for such block ciphers (I am not done with the training yet) and these spectral techniques are better for cryptanalysis than neural networks. I have been able to use these spectral techniques for other problems such as the problem of finding the largest clique in a graph (this is not something that I would have expected before I did it), and right now these spectral techniques are the only way I know how to transform a non-commutative polynomial into something that other machine learning models can work with better.
Right now, I do not know how to use spectral techniques to replace deep neural networks. I do not know how to use spectral techniques to approximate a universal function and I do not know how to use spectral techniques to make machine learning models with many layers. I hope to be able to solve these problems of spectral techniques, but I agree that there will be a tradeoff between performance and interpretability. The goal is to make this tradeoff favor interpretable, aligned, and safe machine learning models.