If I could give some advice: Show that you can do something interesting from an interpretability perspective using your methodology[1], rather than something interesting from a mathematical perspective.
By “something interesting from an interpretability perspective” I mean things like explaining some of the strange goings-on in GPT embedding spaces. Or looking at some weird regularity in AI systems and pointing out that it isn’t obviously explained by other theories but is explained by some theory you have.
Presumably there is some intuition, some angle of attack, some perspective that’s driving all of that mathematics. Frankly, I’d rather hear what that intuition is before reading a whole bunch of mathematics I haven’t used in a while.
I already did that. But it seems like the people here simply do not want to get into much mathematics regardless of how closely related to interpretability it is.
P.S. If anyone wants me to apply my techniques to GPT, I would much rather see the embedding spaces as more organized objects. I cannot deal very well with words that are represented as vectors of length 4096. I would rather deal with words that are represented as 64 by 64 matrices (or matrices of some other dimensions). If we want better interpretability, the data needs to be structured in a more organized fashion so that it is easier to apply interpretability tools to it.