Nitpick: you’re talking about the discovery of the structure of DNA; it was already known at that time to be the molecule which mediates inheritance IIRC.
johnswentworth
I buy this argument.
I don’t buy mathematical equivalence as an argument against, in this case, since the whole point of the path integral formulation is that it’s mathematically equivalent but far simpler conceptually and computationally.
Man, that top one was a mess. Fixed now, thank you!
Here are some candidates from Claude and Gemini (Claude Opus seemed considerably better than Gemini Pro for this task). Unfortunately they are quite unreliable: I’ve already removed many examples from this list which I already knew to have multiple independent discoverers (like e.g. CRISPR and general relativity). If you’re familiar with the history of any of these enough to say that they clearly were/weren’t very counterfactual, please leave a comment.
Noether’s Theorem
Mendel’s Laws of Inheritance
Gödel’s First Incompleteness Theorem (Claude mentions Von Neumann as an independent discoverer for the Second Incompleteness Theorem)
Feynman’s path integral formulation of quantum mechanics
Onnes’ discovery of superconductivity
Pauling’s discovery of the alpha helix structure in proteins
McClintock’s work on transposons
Observation of the cosmic microwave background
Lorenz’s work on deterministic chaos
Prusiner’s discovery of prions
Yamanaka factors for inducing pluripotency
Langmuir’s adsorption isotherm (I have no idea what this is)
[Question] Examples of Highly Counterfactual Discoveries?
I somehow missed that John Wentworth and David Lorell are also in the middle of a sequence on this same topic here.
Yeah, uh… hopefully nobody’s holding their breath waiting for the rest of that sequence. That was the original motivator, but we only wrote the one post and don’t have any more in development yet.
Point is: please do write a good stat mech sequence, David and I are not really “on that ball” at the moment.
(Didn’t read most of the dialogue, sorry if this was covered.)
But the way transformers work is they greedily think about the very next token, and predict that one, even if by conditioning on it you shot yourself in the foot for the task at hand.
That depends on how we sample from the LLM. If, at each “timestep”, we take the most-probable token, then yes that’s right.
But an LLM gives a distribution over tokens at each timestep, i.e. $P[x_t \mid x_{1:t-1}]$. If we sample from that distribution, rather than take the most-probable token at each timestep, then that’s equivalent to sampling non-greedily from the learned distribution over text. It’s the chain rule:

$$P[x_{1:T}] = \prod_{t=1}^{T} P[x_t \mid x_{1:t-1}]$$
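A tiny worked example of the distinction (all numbers hypothetical, a two-token vocabulary and length-2 sequences standing in for an LLM): sampling each token from its conditional distribution samples the true joint via the chain rule, while greedy per-step argmax can end up on a sequence that is *not* the most probable one overall.

```python
import itertools

# Toy "LLM": per-step conditionals P[x_1] and P[x_2 | x_1] over vocab {a, b}.
# (Hypothetical numbers, chosen so greedy decoding misses the best sequence.)
P_x1 = {"a": 0.6, "b": 0.4}
P_x2 = {"a": {"a": 0.55, "b": 0.45},
        "b": {"a": 1.0, "b": 0.0}}

# Chain rule: P[x_{1:2}] = P[x_1] * P[x_2 | x_1] -- a valid joint distribution.
joint = {x1 + x2: P_x1[x1] * P_x2[x1][x2]
         for x1, x2 in itertools.product("ab", repeat=2)}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Greedy decoding: take the most-probable token at each step.
g1 = max(P_x1, key=P_x1.get)
g2 = max(P_x2[g1], key=P_x2[g1].get)
greedy_seq = g1 + g2

# Most-probable *sequence* under the joint distribution.
best_seq = max(joint, key=joint.get)

print(greedy_seq, best_seq)  # aa ba -- greedy commits to "a" and loses
```

Sampling from the conditionals (rather than taking the argmax) would hit "ba" 40% of the time, exactly its probability under the joint.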
Writing collaboratively is definitely something David and I have been trying to figure out how to do productively.
How sure are we that models will keep tracking Bayesian belief states, and so allow this inverse reasoning to be used, when they don’t have enough space and compute to actually track a distribution over latent states?
One obvious guess there would be that the factorization structure is exploited, e.g. independence and especially conditional independence/DAG structure. And then a big question is how distributions of conditionally independent latents in particular end up embedded.
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
Yup, that was it, thank you!
We’re now working through understanding all the pieces of this, and we’ve calculated an MSP which doesn’t quite look like the one in the post:
(Ignore the skew, David’s still fiddling with the projection into 2D. The important noticeable part is the absence of “overlap” between the three copies of the main shape, compared to the fractal from the post.)
Specifically, each point in that visual corresponds to a belief distribution $P[S_t \mid x_{1:t}]$ over latent states, for some value of the observed symbols $x_{1:t}$. The image itself is of those points on the probability simplex. From looking at a couple of Crutchfield papers, it sounds like that’s what the MSP is supposed to be.
The update equations are:

$$P[S_{t+1} \mid x_{1:t+1}] = \frac{1}{Z}\, P[x_{t+1} \mid S_{t+1}] \sum_{S_t} P[S_{t+1} \mid S_t]\, P[S_t \mid x_{1:t}]$$

with $P[S_{t+1} \mid S_t]$ given by the transition probabilities, $P[x_{t+1} \mid S_{t+1}]$ given by the observation probabilities, and $Z$ a normalizer. We generate the image above by initializing some random distribution $P[S_0]$, then iterating the equations and plotting each point.
Off the top of your head, any idea what might account for the mismatch (other than a bug in our code, which we’re already checking)? Are we calculating the right thing, i.e. values of $P[S_t \mid x_{1:t}]$? Are the transition and observation probabilities from the graphic in the post the same parameters used to generate the fractal? Is there some thing which people always forget to account for when calculating these things?
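For reference, the iteration described above can be sketched in a few lines; the 2-state transition and observation matrices here are made-up stand-ins, not the parameters from the post’s graphic.

```python
import random

random.seed(0)

# Toy 2-state HMM (hypothetical parameters, purely illustrative).
T = [[0.9, 0.1],   # T[s][sp] = P[S_{t+1}=sp | S_t=s]
     [0.3, 0.7]]
O = [[0.8, 0.2],   # O[s][x]  = P[x_t=x | S_t=s]
     [0.1, 0.9]]

def update(b, x):
    """One Bayesian belief update: b'(sp) ∝ O[sp][x] * sum_s T[s][sp] * b(s)."""
    pred = [sum(T[s][sp] * b[s] for s in range(2)) for sp in range(2)]
    unnorm = [O[sp][x] * pred[sp] for sp in range(2)]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]

# Initialize a distribution, iterate on observations, collect each belief point.
b = [0.5, 0.5]
points = []
for _ in range(1000):
    b = update(b, random.choice([0, 1]))
    points.append(b)

# Every collected point lies on the probability simplex.
print(all(abs(sum(p) - 1.0) < 1e-9 for p in points))  # True
```

With more than two hidden states, each point would then be projected from the simplex into 2D for plotting.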
Can you elaborate on how the fractal is an artifact of how the data is visualized?
I don’t know the details of the MSP, but my current understanding is that it’s a general way of representing stochastic processes, and the MSP representation typically looks quite fractal. If we take two approximately-the-same stochastic processes, then they’ll produce visually-similar fractals.
But the “fractal-ness” is mostly an artifact of the MSP as a representation-method IIUC; the stochastic process itself is not especially “naturally fractal”.
(As I said I don’t know the details of the MSP very well; my intuition here is instead coming from some background knowledge of where fractals which look like those often come from, specifically chaos games.)
That there is a linear 2D plane in the residual stream such that projecting onto it yields that same fractal seems highly non-artifactual, and is what we were testing.
A thing which is highly cruxy for me here, which I did not fully understand from the post: what exactly is the function which produces the fractal visual from the residual activations? My best guess from reading the post was that the activations are linearly regressed onto some kind of distribution, and then the distributions are represented in a particular way which makes smooth sets of distributions look fractal. If there’s literally a linear projection of the residual stream into two dimensions which directly produces that fractal, with no further processing/transformation in between “linear projection” and “fractal”, then I would change my mind about the fractal structure being mostly an artifact of the visualization method.
[EDIT: I no longer endorse this response, see thread.]
(This comment is mainly for people other than the authors.)
If your reaction to this post is “hot damn, look at that graph”, then I think you should probably dial back your excitement somewhat. IIUC the fractal structure is largely an artifact of how the data is visualized, which means the results visually look more striking than they really are.
It is still a cool piece of work, and the visuals are beautiful. The correct amount of excitement is greater than zero.
Yup. Also, I’d add that entropy in this formulation increases exactly when more than one macrostate at time $t$ maps to the same actually-realized macrostate at time $t+1$, i.e. when the macrostate evolution is not time-reversible.
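A toy illustration of that claim (hypothetical numbers): when two macrostates at time $t$ both map to the same macrostate at time $t+1$, the successor must contain at least the microstates of both predecessors, so its Boltzmann entropy is strictly larger than either’s.

```python
import math

# Number of microstates in each macrostate (made-up toy values).
micro_counts = {"A": 1, "B": 1, "C": 2}

# Macrostate dynamics: A and B both evolve to C, so the evolution is not
# time-reversible -- C's predecessor is ambiguous.
step = {"A": "C", "B": "C", "C": "C"}

def boltzmann_entropy(m):
    # Boltzmann entropy: log of the number of microstates in the macrostate.
    return math.log(micro_counts[m])

# Entropy strictly increases across the non-reversible step (0.0 -> log 2),
# and stays constant where the map is reversible (C -> C).
print(boltzmann_entropy("A"), boltzmann_entropy(step["A"]))
```

Here `micro_counts["C"] >= micro_counts["A"] + micro_counts["B"]` is forced by deterministic microstate dynamics: every microstate of A and of B must land somewhere inside C.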
This post was very specifically about a Boltzmann-style approach. I’d also generally consider the Gibbs/Shannon formula to be the “real” definition of entropy, and usually think of Boltzmann as the special case where the microstate distribution is constrained uniform. But a big point of this post was to be like “look, we can get a surprising amount (though not all) of thermo/stat mech without bringing in any actual statistics, just restricting ourselves to the Boltzmann notion of entropy”.
Do you know what the drug was which did this?