Way back in the halcyon days of 2005, a company called Cenqua had an April Fools’ Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I’m wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there’s now a clear market leader for that particular product niche, but for real.
johnswentworth
The word for a drug that causes loss of memory is “amnestic”, not “amnesic”. The word “amnesic” is a variant spelling of “amnesiac”, which is the person who takes the drug.
Thanks! Fixed now.
Oh yeah, I guess that could be a learning effect. When reading it I assumed the lack of need for repeating the numbers was just because the drug was wearing off.
Another class of applications which we discussed at the retreat: person 1 takes the amnestic, person 2 shares private information with them, and then person 1 gives their reaction to the private information. This can be used e.g. for complex negotiations: maybe it is in our mutual best interest to make some deal, but in order for me to know that I'd need some information which you don't want to share with me. So I take the drug, you share the information, and I leave a verified record of myself saying "dear future self, you should in fact take this deal".
… which is cool in theory but I would guess not of high immediate value in practice, which is why the post didn’t focus on it.
I would love to hear suggestions for other things I could try. If you have any, let me know in a comment!
Some Experiments I’d Like Someone To Try With An Amnestic
Do you know what the drug was which did this?
Nitpick: you’re talking about the discovery of the structure of DNA; it was already known at that time to be the molecule which mediates inheritance IIRC.
I buy this argument.
I buy this argument.
I don’t buy mathematical equivalence as an argument against, in this case, since the whole point of the path integral formulation is that it’s mathematically equivalent but far simpler conceptually and computationally.
Man, that top one was a mess. Fixed now, thank you!
Here are some candidates from Claude and Gemini (Claude Opus seemed considerably better than Gemini Pro for this task). Unfortunately they are quite unreliable: I’ve already removed many examples from this list which I knew to have multiple independent discoverers (e.g. CRISPR and general relativity). If you’re familiar with the history of any of these enough to say that they clearly were/weren’t very counterfactual, please leave a comment.
Noether’s Theorem
Mendel’s Laws of Inheritance
Gödel’s First Incompleteness Theorem (Claude mentions von Neumann as an independent discoverer for the Second Incompleteness Theorem)
Feynman’s path integral formulation of quantum mechanics
Onnes’ discovery of superconductivity
Pauling’s discovery of the alpha helix structure in proteins
McClintock’s work on transposons
Observation of the cosmic microwave background
Lorenz’s work on deterministic chaos
Prusiner’s discovery of prions
Yamanaka factors for inducing pluripotency
Langmuir’s adsorption isotherm (I have no idea what this is)
[Question] Examples of Highly Counterfactual Discoveries?
I somehow missed that John Wentworth and David Lorell are also in the middle of a sequence on this same topic here.
Yeah, uh… hopefully nobody’s holding their breath waiting for the rest of that sequence. That was the original motivator, but we only wrote the one post and don’t have any more in development yet.
Point is: please do write a good stat mech sequence, David and I are not really “on that ball” at the moment.
(Didn’t read most of the dialogue, sorry if this was covered.)
But the way transformers work is they greedily think about the very next token, and predict that one, even if by conditioning on it you shot yourself in the foot for the task at hand.
That depends on how we sample from the LLM. If, at each “timestep”, we take the most-probable token, then yes that’s right.
But an LLM gives a distribution over tokens at each timestep, i.e. $P[x_t | x_1, \dots, x_{t-1}]$. If we sample from that distribution, rather than take the most-probable at each timestep, then that’s equivalent to sampling non-greedily from the learned distribution over text. It’s the chain rule:

$$P[x_1, \dots, x_n] = \prod_{t=1}^n P[x_t | x_1, \dots, x_{t-1}]$$
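To make the distinction concrete, here is a minimal sketch with a toy next-token model; the conditional probabilities and the two-token vocabulary are purely hypothetical, for illustration only:

```python
import random

# Toy "LLM": a conditional distribution P[x_t | x_{<t}] over a
# two-token vocabulary {0, 1}. Numbers are made up for illustration.
def next_token_dist(prefix):
    if prefix and prefix[-1] == 1:
        return {0: 0.9, 1: 0.1}
    return {0: 0.4, 1: 0.6}

def sample_sequence(length, rng):
    """Ancestral sampling: draw each token from its conditional distribution.
    By the chain rule, the result is a sample from the joint P[x_1..x_n]."""
    seq = []
    for _ in range(length):
        dist = next_token_dist(seq)
        tokens, probs = zip(*dist.items())
        seq.append(rng.choices(tokens, weights=probs)[0])
    return seq

def greedy_sequence(length):
    """Greedy decoding: take the most-probable token at each step.
    This picks one fixed sequence, which need not be a typical (or even
    high-probability) sample from the joint distribution."""
    seq = []
    for _ in range(length):
        dist = next_token_dist(seq)
        seq.append(max(dist, key=dist.get))
    return seq
```

The point of the sketch: `sample_sequence` is exact sampling from the learned joint, token-conditionals or not, while `greedy_sequence` is a different decoding rule entirely.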
Writing collaboratively is definitely something David and I have been trying to figure out how to do productively.
How sure are we that models will keep tracking Bayesian belief states, and so allow this inverse reasoning to be used, when they don’t have enough space and compute to actually track a distribution over latent states?
One obvious guess there would be that the factorization structure is exploited, e.g. independence and especially conditional independence/DAG structure. And then a big question is how distributions of conditionally independent latents in particular end up embedded.
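As a toy illustration of the kind of savings that exploiting conditional independence buys (assuming, hypothetically, a simple Markov-chain DAG over binary latents; the numbers are purely illustrative):

```python
# Hypothetical example: n binary latent variables forming a Markov chain
# X1 -> X2 -> ... -> Xn, the simplest nontrivial DAG structure.
n = 10

# A full joint table over n binary variables needs 2**n - 1 free parameters.
full_joint_params = 2**n - 1

# Exploiting the conditional-independence (DAG) structure, each variable
# only needs a table conditioned on its single parent: 1 parameter for
# P[X1], plus 2 per subsequent variable for P[Xi | X(i-1)].
factored_params = 1 + 2 * (n - 1)

print(full_joint_params, factored_params)  # exponential vs linear in n
```

This is only the parameter-counting side of the question; how such factored distributions actually end up embedded in a model's activations is the open part.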
You are a scholar and a gentleman.