The existence of a non-gradient-based algorithm for computing LSRDRs is a sign that LSRDRs behave mathematically and are quite interpretable. Gradient ascent is a general-purpose optimization algorithm that one falls back on when there is no other way to solve an optimization problem; when an optimization problem can be solved in multiple independent ways, that problem is behaving in a way that should be appealing to mathematicians.
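As a toy illustration of this point (not the LSRDR algorithm itself, which is more involved), consider maximizing the Rayleigh quotient of a symmetric matrix. The same optimum can be reached both by gradient ascent and by a non-gradient algorithm (power iteration), and the two solutions agree; all names and parameters below are my own choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random symmetric matrix: its Rayleigh quotient is a stand-in for a
# fitness function whose optimum has a clean mathematical characterization
# (the top eigenvector).
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

def rayleigh(v):
    return v @ A @ v / (v @ v)

# Method 1: projected gradient ascent on the Rayleigh quotient.
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
for _ in range(20000):
    grad = 2 * (A @ v - rayleigh(v) * v)  # Riemannian gradient on the sphere
    v = v + 0.01 * grad
    v /= np.linalg.norm(v)

# Method 2: power iteration, a non-gradient algorithm for the same optimum.
# Shifting by ||A||_2 + 1 makes the most positive eigenvalue dominant in
# magnitude, so the iteration converges to the same eigenvector.
shift = np.linalg.norm(A, 2) + 1.0
w = rng.standard_normal(n)
for _ in range(20000):
    w = (A + shift * np.eye(n)) @ w
    w /= np.linalg.norm(w)

# The two methods agree up to sign.
assert abs(abs(v @ w) - 1.0) < 1e-6
```

Both routes land on the same answer because the optimization problem itself is well-behaved, which is the property being claimed for the LSRDR fitness function.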
LSRDRs and similar algorithms are pseudodeterministic in the sense that if we train the model multiple times on the same data, we typically get identical models. Pseudodeterminism is a signal of interpretability for several reasons that I will go into in more detail in a future post:
Pseudodeterministic models do not contain any extra random or even pseudorandom information beyond what is already in the training data. This means that when interpreting such a model, one does not have to interpret random noise.
Pseudodeterministic models inherit the symmetry of their training data. For example, if we train a real LSRDR using real symmetric matrices, then the projection will itself be a symmetric matrix.
In mathematics, a well-posed problem is one where a solution exists and is unique (Hadamard's definition also asks that the solution depend continuously on the data). Well-posed problems behave better than ill-posed problems in the sense that it is easier to prove results about them.
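The first two points above can be demonstrated on a toy problem. The sketch below fits a best rank-1 approximation by gradient descent; this is a minimal stand-in for the LSRDR fitness function, not the real thing, and the function name `best_rank1` and all hyperparameters are my own. Two runs from different random seeds produce the same model, and because the training data is symmetric, so is the fitted model.

```python
import numpy as np

def best_rank1(M, seed, steps=20000, lr=0.01):
    """Gradient descent on ||M - x y^T||_F^2 from a seeded random start.

    When the top singular value of M is simple, the optimal product x y^T
    is unique (Eckart-Young), so the trained model is pseudodeterministic
    even though (x, y) individually are only determined up to rescaling.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(M.shape[0]) * 0.1
    y = rng.standard_normal(M.shape[1]) * 0.1
    for _ in range(steps):
        r = np.outer(x, y) - M      # residual
        x = x - lr * (r @ y)        # gradient of ||r||^2 / 2 in x
        y = y - lr * (r.T @ x)      # gradient in y (using the updated x)
    return np.outer(x, y)

rng = np.random.default_rng(42)
S = rng.standard_normal((5, 5))
S = (S + S.T) / 2                   # symmetric "training data"

P1 = best_rank1(S, seed=1)
P2 = best_rank1(S, seed=2)

# Pseudodeterminism: different seeds, same trained model.
assert np.allclose(P1, P2, atol=1e-5)
# Symmetry inheritance: the model is symmetric because S is.
assert np.allclose(P1, P1.T, atol=1e-5)
```

The unique optimum is what makes this work: the randomness of the initialization is forgotten by training, so none of it needs to be interpreted afterwards.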
In addition to pseudodeterminism, LSRDRs have been quite interpretable in my experience; I have already interpreted LSRDRs in a few posts:
When performing a dimensionality reduction on tensors, the trace is often zero. — LessWrong
I have generalized LSRDRs so that they are starting to behave like deeper neural networks. I am trying to expand the capabilities of generalized LSRDRs so that they behave more like deep neural networks, but I still have work to do to expand their capabilities while retaining pseudodeterminism. In the meantime, generalized LSRDRs may still function as narrow AI for specific problems and as layers within larger AI systems.
Of course, if we want to compare capabilities, we should also compare NNs to LSRDRs at tasks such as evaluating the cryptographic security of block ciphers, solving NP-complete problems in the average case, etc.
As for the difficulty of this post, that seems to be a consequence of the post being mathematical. But working through this kind of mathematics in order to obtain inherently interpretable AI should be the easier portion of AI interpretability. I would much rather communicate about the actual mathematics than about how difficult the mathematics is.
I would have thought that a fitness function that can be maximized by something other than gradient ascent, and that can solve NP-complete problems at least in the average case, would be worth reading about: it performs well on some tasks while also behaving mathematically in the way that interpretability requires. The quality of the content is inversely proportional to the number of views, since people do not think the way I do.
Wheels on the Bus | @CoComelon Nursery Rhymes & Kids Songs
Stuff that is popular is usually garbage.
But here is my post about the word embedding.
Interpreting a matrix-valued word embedding with a mathematically proven characterization of all optima — LessWrong
And I really do not want to collaborate with people who are not willing to read the post. This is especially true of people in academia since universities promote violence and refuse to acknowledge any wrongdoing. Universities are the absolute worst.