Does “chance relative to null is x%” mean “an observer, given my results, would assign an x% probability to my being calibrated”?
No! P(Test results | Perfect calibration) / P(Test results | Whatever the null is) ≠ P(Perfect calibration | Test results)!
You can also lodge this as a complaint against null hypothesis testing: I would’ve thought that perfect calibration would be the null. Perhaps the null is a model where you just say a uniformly random probability from 0 to 100.
I’m assuming that they really calculated a likelihood ratio P(Data|Perfect) / P(Data|Null) instead of the posterior odds P(Perfect|Data) / P(Null|Data), which is what the words they used would mean if taken literally. But maybe they have some prior odds P(Perfect) / P(Null) that they used. (The thing they should do is just report the likelihood ratio, rather than their posterior.)
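To spell out the connection: by Bayes’ rule in odds form,

P(Perfect|Data) / P(Null|Data) = [P(Data|Perfect) / P(Data|Null)] * [P(Perfect) / P(Null)]

i.e., posterior odds = likelihood ratio × prior odds, so the two quantities only agree when the priors on the two hypotheses are equal.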
If you have your data and want to compute P(Data|Perfect), you take the product over all your predictions: Π_i (p_i if event i happened, 1 − p_i if it didn’t).
So, for example, if I predicted 20%, 70%, 30% and the actual results were No, Yes, Yes, then P(Data|Perfect) = .8 * .7 * .3. If you have some other hypothesis (e.g., whatever their null is), you can compute P(Data|Other hypothesis) using the predictions that hypothesis makes about how your reported probabilities relate to the propensities of events. A hypothesis here should be a function f(reported) = P(Event happens | reported).
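Here’s a minimal sketch of that computation in Python. The coin-flip null at the end is my guess at what a null might look like, not necessarily the one any particular test uses:

```python
import math

def log_likelihood(preds, outcomes, f=lambda p: p):
    """Log of P(Data | Hypothesis), where a hypothesis is a function f
    mapping a reported probability to P(event happens | reported).
    Perfect calibration is the identity: f(p) = p."""
    total = 0.0
    for p, happened in zip(preds, outcomes):
        q = f(p)  # the hypothesis's propensity for this event
        total += math.log(q if happened else 1.0 - q)
    return total

preds = [0.20, 0.70, 0.30]
outcomes = [False, True, True]  # No, Yes, Yes

ll_perfect = log_likelihood(preds, outcomes)
# One possible null: reported numbers carry no information, and every
# event is a coin flip. (An assumption; the test's actual null may differ.)
ll_null = log_likelihood(preds, outcomes, f=lambda p: 0.5)

print(math.exp(ll_perfect))            # .8 * .7 * .3 ≈ 0.168
print(math.exp(ll_perfect - ll_null))  # likelihood ratio vs. this null ≈ 1.344
```

Working in log space is just to avoid underflow when you have many predictions; with three it makes no difference.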
It seems like you could do better with a logit model:

p = logistic(\sum_i w_i c_i), i.e., logit(p) = log-odds(p) = \sum_i w_i c_i
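A sketch of fitting such a model with scikit-learn, assuming the c_i are binary cues; the cue matrix and outcomes below are made-up illustration data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: each row is one case, each column one cue c_i.
X = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])
y = np.array([1, 1, 1, 0, 1, 0])  # 1 = event happened, 0 = it didn't

# fit_intercept=False so the model is exactly logit(p) = sum_i w_i c_i
model = LogisticRegression(fit_intercept=False)
model.fit(X, y)

print(model.coef_)                   # the fitted weights w_i
print(model.predict_proba(X)[:, 1])  # p = logistic(sum_i w_i c_i)
```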
Are these also called SPRs (statistical prediction rules)?