Great explanation! I was linked here by someone after wondering why linear regression is asymmetric. While a quick Google search or ChatGPT could tell me that the two methods minimize different things, the advantages of your post are:

- The pictures
- The explanation of why minimizing different things gets you slopes differing in this specific way (that is, far outliers are punished heavily)
- A connection to PCA that is nice and simply explained
Thanks!
I think you could’ve done better with integration by parts.
In physics, integration by parts is usually applied to a definite integral in which you can neglect the boundary (uv) term. Integration by parts then reads: "the integral of u dv equals the integral of -v du"; that is, you can trade which factor you differentiate in a product, as long as the product uv vanishes (or is negligible) on the boundary.
Common examples arise when you integrate over some big volume, since most physical quantities are very small far away from the stuff.
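Spelled out in symbols (a standard statement of the identity, not something from the post), the one-dimensional version looks like:

```latex
% Full integration by parts over [a, b]:
\int_a^b u \, dv = \bigl[uv\bigr]_a^b - \int_a^b v \, du

% If uv vanishes (or is negligible) at the boundary, the bracket drops out:
\int_a^b u \, dv \approx -\int_a^b v \, du
```

In higher dimensions the bracket becomes a surface integral over the boundary of the volume, which is why fields that decay at infinity make it negligible.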
I also think the intuition behind Bayes' rule is worth including, as it's usually interpreted here on LW: it provides the updating rule posterior odds = prior odds * likelihood ratio, and thereby also formalizes how good a piece of evidence is. As for the derivation from P(A|B), defined as P(A and B)/P(B): I think this is best described by saying that P(A|B) is the probability of A once you know B, so you take the mass of the worlds where both A and B are true and compare it to your new total mass, which is the mass of the worlds where B is true. The former is really just the "mass of A and B", so you are done.
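Written out (standard notation; here ¬A means "not A"), the odds form is:

```latex
% Posterior odds = prior odds * likelihood ratio:
\frac{P(A \mid B)}{P(\neg A \mid B)}
  = \frac{P(A)}{P(\neg A)} \cdot \frac{P(B \mid A)}{P(B \mid \neg A)}
```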
Now, P(A and B) = P(B)P(A|B), which I think of as "first take the probability that B is true, then, given that we are in this set of worlds, take the probability that A is true". Essentially, this translates from locating sets to probabilities.
From here, Bayes' theorem is the simple fact that "A and B" = "B and A". So P(B)P(A|B) = P(A and B) = P(A)P(B|A). If you draw a square divided into 4 rectangles, where the first row is P(A), the second row is P(-A), the first column is P(B), and the second is P(-B), and each rectangle represents a possibility like P(A and -B), then this equation just expresses the rectangle P(A and B) as (rectangle as a fraction of its row) * row = (rectangle as a fraction of its column) * column. Divide by P(B) (that is, the column) to get Bayes' law.
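The chain of equalities, written out:

```latex
% The symmetry of "and" gives two factorizations of the same joint probability:
P(B)\,P(A \mid B) = P(A \text{ and } B) = P(A)\,P(B \mid A)

% Dividing both sides by P(B) gives Bayes' law:
P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}
```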
For the sine rule, I think it also helps to show that the fraction a/sin(A) (a side over the sine of its opposite angle) is the diameter of the circumcircle. Wikipedia has good pictures.
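In symbols, with the usual convention that sides a, b, c are opposite angles A, B, C and R is the circumradius:

```latex
% Law of sines with the circumcircle made explicit:
\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} = 2R
```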
For an extra math fact that totally doesn't need to be in the post: it is interesting that for spherical triangles, the law of sines just needs to be modified so that you take the sine of the side lengths as well. In fact, you can do something similar in hyperbolic space (using sinh), and there is a Taylor-series form, involving the curvature, of a generalized sine that makes the law of sines still true in any constant-curvature space (you can find this on the same Wikipedia page).
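For reference, these are the standard statements from that Wikipedia page (same convention as above, with sides a, b, c opposite angles A, B, C):

```latex
% Spherical law of sines (sides measured as arc angles on the unit sphere):
\frac{\sin a}{\sin A} = \frac{\sin b}{\sin B} = \frac{\sin c}{\sin C}

% Hyperbolic law of sines:
\frac{\sinh a}{\sin A} = \frac{\sinh b}{\sin B} = \frac{\sinh c}{\sin C}

% Generalized sine for curvature K (recovers sin for K = 1, x for K = 0, sinh for K = -1),
% for which sin_K(a)/sin A = sin_K(b)/sin B = sin_K(c)/sin C holds in constant curvature K:
\sin_K(x) = x - \frac{K x^3}{3!} + \frac{K^2 x^5}{5!} - \cdots
```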