To provide some citations :) There are a few nice papers looking at why Lissajous curves appear when you do PCA in high dimensions:
J. Antognini and J. Sohl-Dickstein. “PCA of high dimensional random walks with comparison to neural network training”. In Advances in Neural Information Processing Systems, volume 31, 2018.
M. Shinn. “Phantom oscillations in principal component analysis”. Proceedings of the National Academy of Sciences, 120(48):e2311420120, 2023.
It is indeed the case that the published literature has quite a few people making fools of themselves by not understanding this. On the flip side, just because you see something Lissajous-like in the PCA doesn’t necessarily mean that the extrema are meaningless. One can show that if a process has stagewise development, there is a sense in which performing PCA will tend to adapt PC1 to be a “developmental clock”, such that the extrema of PC2 as a function of PC1 tend to line up with the midpoint of development (even if this is quite different from the middle “wall time”). We’ve noticed this in a few different systems.
So one has to be careful in both directions with Lissajous curves in PCA (not to read tea leaves, and also not to throw out babies with bathwater, etc).
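For anyone who wants to see the “null” version of this for themselves, here is a minimal sketch (my own illustration, not code from either paper) that runs PCA on a pure Gaussian random walk and checks how closely the projection onto PC k tracks cos(kπ(t+1/2)/T). The function names and the defaults T=200, d=5000 are arbitrary choices for the demo.

```python
import numpy as np

def random_walk_pcs(T=200, d=5000, n_pcs=3, seed=0):
    """Project a d-dimensional Gaussian random walk of T steps onto its top PCs."""
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal((T, d))
    X = np.cumsum(steps, axis=0)        # row t = position after t+1 steps
    Xc = X - X.mean(axis=0)             # PCA centers each coordinate over time
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * s[:n_pcs]     # time series of the top PC projections

def cosine_overlaps(proj):
    """|correlation| between each PC projection and cos(k*pi*(t+1/2)/T)."""
    T, n = proj.shape
    t = np.arange(T)
    overlaps = []
    for k in range(1, n + 1):
        ref = np.cos(k * np.pi * (t + 0.5) / T)
        overlaps.append(abs(np.corrcoef(proj[:, k - 1], ref)[0, 1]))
    return overlaps

proj = random_walk_pcs()
print(cosine_overlaps(proj))  # values close to 1 for a high-dimensional walk
```

Plotting `proj[:, 1]` against `proj[:, 0]` gives the familiar arc, even though nothing “happens” in the middle of a random walk. That is the tea-leaves failure mode.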
Thanks! Are you saying there is a better way to find citations than a random walk through the literature? :)
I didn’t realize that the pictures above limit to literal pieces of sine and cosine curves (and Lissajous curves more generally). I suspect this is a statement about the singular values of the “sum” matrix S, the upper-triangular matrix of 1s?
The “developmental clock” observation is neat! Never heard of it before. Is it a qualitative “parametrization of progress” thing or are there phase transition phenomena that happen specifically around the midpoint?
Hehe. Yes that’s right, in the limit you can just analyse the singular values and vectors by hand, it’s nice.
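A quick numerical check of the “by hand” analysis, as a sketch under the usual idealization: for a very high-dimensional walk the step vectors are nearly orthonormal, so the time-side PCA structure is just the SVD of C Sᵀ, where S is the upper-triangular matrix of 1s and C is the centering matrix that PCA applies. Under that assumption the left singular vectors come out as the cosines cos(kπ(t+1/2)/T), with singular values 1/(2 sin(kπ/(2T))). The helper names here are hypothetical.

```python
import numpy as np

def centered_sum_matrix(T):
    # S: T x T upper-triangular matrix of ones (the "sum" matrix).
    # A walk with steps Z (T x d) has positions X = S.T @ Z.
    S = np.triu(np.ones((T, T)))
    # PCA subtracts the time-mean, i.e. left-multiplies by the centering matrix C.
    C = np.eye(T) - np.ones((T, T)) / T
    return C @ S.T

def compare_with_cosines(T, k_max=4):
    U, sing, _ = np.linalg.svd(centered_sum_matrix(T))
    t = np.arange(T)
    results = []
    for k in range(1, k_max + 1):
        cosine = np.cos(k * np.pi * (t + 0.5) / T)
        cosine /= np.linalg.norm(cosine)
        overlap = abs(U[:, k - 1] @ cosine)  # 1.0 means identical up to sign
        predicted = 1.0 / (2.0 * np.sin(k * np.pi / (2 * T)))
        results.append((k, overlap, sing[k - 1], predicted))
    return results

for k, overlap, s, pred in compare_with_cosines(50):
    print(k, overlap, s, pred)
```

The centering is what turns the uncentered sine-like modes of S into cosines: on mean-zero vectors, C S Sᵀ C is the pseudo-inverse of the discrete Neumann Laplacian, whose eigenvectors are the DCT-II cosines.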
No general implied connection to phase transitions, but the conjecture is that if there are phase transitions in your development, then for general reasons you can expect PCA to “attempt” to use the implicit “coordinates” provided by the Lissajous curves (i.e. a binary tree: the first Lissajous curve uses PC2 to split the PC1 range in half, and so on) to locate stages within the overall development. I got some way towards proving that by extending the literature I cited in the parent, but had to move on, so take the story with a grain of salt. This seems to make sense empirically in some cases (e.g. our paper).
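To make the “PC2 splits the PC1 range in half” picture concrete in the idealized random-walk case only (this says nothing about the stagewise conjecture), assume the PC1 and PC2 projections are exactly cos(πu) and cos(2πu) after normalizing their amplitudes, with u = (t+1/2)/T. Then the first Lissajous curve is a parabola whose turning point sits at the middle of the PC1 range, which is also the wall-time midpoint:

```latex
u = \frac{t + 1/2}{T}, \qquad p_1 = \cos(\pi u), \qquad p_2 = \cos(2\pi u)

p_2 = 2\cos^2(\pi u) - 1 = 2p_1^2 - 1

\frac{dp_2}{dp_1} = 4p_1 = 0 \iff p_1 = 0 \iff u = \tfrac{1}{2}

\text{more generally } p_k = \cos(k\pi u) = \mathcal{T}_k(p_1), \text{ the Chebyshev polynomial of the first kind}
```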
One of the talks at ILIAD had a set of PCA plots where PC2 turned around at different points for different training setups. I think the turning point corresponded to when the model started to overfit, but I don’t quite remember. Whatever the meaning of the turning point was, I think they also verified it with some other observation. Given that this was ILIAD, the other observation was probably the LLC.
If you want to look it up I can try to find the talk among the recordings.
The paper you’re thinking of is probably The Developmental Landscape of In-Context Learning.
It looks related, but these are not the plots I remember from the talk.