James Camacho

Karma: 101

James Camacho Feb 17, 2025, 10:19 PM
3 points
0
on: Celtic Knots on Einstein Lattice
The sidelengths for the Einstein tile are all either $1$ or $\sqrt{3}$ , except for a single side of length $2$ . I think it makes more sense to treat that side as two sides, with a $180^{\circ}$ angle between them. Then you would get fourteen entry/exit points:
The aperiodic tiling from the paper cannot be put onto a hexagonal grid, and some of the tiles are flipped vertically, so you need every edge to have an entry/exit to make a Celtic knot out of it. Also, I would recommend using $T i l e (1, 1)$ rather than $T i l e (1, \sqrt{3})$ so the arcs turn out pretty:

James Camacho Feb 1, 2025, 4:08 AM
1 point
0
on: What’s the Right Way to think about Information Theoretic quantities in Neural Networks?
I’m not entirely sure what you’ve looked at in the literature; have you seen “Direct Validation of the Information Bottleneck Principle for Deep Nets” (Elad et al.)? They use the Fenchel conjugate
\[\mathrm{KL}(P||Q) = \sup_{f} [\mathbb{E}_P[f]-\log (\mathbb{E}_Q[e^f])]\]
This turns finding the KL-divergence into an optimisation problem for $f^*(x) = \log \frac{p(x)}{q(x)}$. Since
\[I(X;Y)=\mathrm{KL}(P_{X,Y}||P_{X\otimes Y}),\]
you can train a neural network to predict the mutual information. For the information bottleneck, you would train two additional networks. Ideal lossy compression maximises the information bottleneck, so this can be used as regularization for autoencoders. I did this for a twenty-bit autoencoder of MNIST (code). Here are the encodings without mutual information regularization, and then with it:
Notice how the digits are more “spread out” with regularization. This is exactly the benefit variational autoencoders give, but actually based in theory! In this case, my randomness comes from the softmax choice for each bit, but you could also use continuous latents like Gaussians. In fact, you could even have a deterministic encoder, since the mutual information predictor network is adversarial, not omniscient. The encoder can fool it during training by having small updates in its weights make large updates in the latent space, which means (1) it’s essentially random in the same way lava lamps are random, (2) the decoder will learn to ignore noise, and (3) the initial distribution of encoder weights will concentrate around “spines” in the loss landscape, which have lots of symmetries in all dimensions except the important features you’re encoding.
The mutual information is cooperative for the decoder network, which means the decoder should be deterministic. Since mutual information is convex, i.e. if we had two decoders $\phi_1, \phi_2: Z\to Y$, then
\[\lambda I(Z; \phi_1(Z)) + (1-\lambda) I(Z; \phi_2(Z)) \ge I(Z; (\lambda\phi_1 + (1-\lambda)\phi_2)(Z)).\]
Every stochastic decoder could be written as a sum of deterministic ones, so you might as well use the best deterministic decoder. Then
\[I(Z; \phi(Z)) = H(\phi(Z)) \cancel{-H(\phi(Z)|Z)}\]
so you’re really just adding in the entropy of the decoded distribution. The paper “The Deterministic Information Bottleneck” (Strauss & Schwab) argues that the encoder $\psi: X\to Z$ should also be deterministic, since maximising
\[-\beta I(X; \psi(X)) = \beta[H(\psi(X)|X) - H(\psi(X))]\]
seems to reward adding noise irrelevant to $X$. But that misses the point; you want to be overwriting unimportant features of the image with noise. It’s more clear this is happening if you use the identity
\[-\beta I(X;\psi(X)) = \beta H(X|\psi(X))\cancel{ - \beta H(X)}\]
(the second term is constant). This is the same idea behind variational autoencoders, but they use $\mathrm{KL}(\psi(X)||\mathcal{N}(0, 1))$ as a cheap proxy to $H(X|\psi(X))$.

James Camacho Jan 20, 2025, 6:42 PM
4 points
0
in reply to: Dalcy’s comment on: What’s the Right Way to think about Information Theoretic quantities in Neural Networks?
Reversible networks (even when trained) for example have the same partition induced even if you keep stacking more layers, so from the perspective of information theory, everything looks the same
I don’t think this is true? The differential entropy changes, even if you use a reversible map:
$H (Y) = H (X) + E_{X} [log | det J |]$
where $J$ is the Jacobian of your map. Features that are “squeezed together” are less usable, and you end up with a smaller entropy. Similarly, “unsqueezing” certain features, or examining them more closely, gives a higher entropy.

James Camacho Jan 19, 2025, 2:50 AM
1 point
0
on: The quantum red pill or: They lied to you, we live in the (density) matrix
A couple things to add:
1. Since every invertible square matrix can be decomposed as $A^{t} = e^{(Z + i H) t}$ , you don’t actually need a unitary assumption. You can just say that after billions of years, all but the largest Z-matrices have died out.
2. There’s another tie between statistics and quantum evolution called the Wick rotation. If you set $t = i β$ , then $E [e^{(Z + i H) t}] = E [e^{- H β}]$ so the inverse-temperature is literally imaginary time! You can recover the Boltzmann distribution by looking at the expected number of particles in each state: $E [⟨ n | e^{(Z + i H) t} | n ⟩] = e^{- β E_{n}}$ where $E_{n}$ is the $n$ th eigenvalue (energy in the $n$ th state).

James Camacho Jan 18, 2025, 1:35 AM
−1 points
−2
on: James Camacho’s Shortform
Why are conservatives for punitive correction while progressives do not think it works? I think this can be explained by the difference between stable equilibria and saddle points.
If you have a system where people make random “mistakes” an $ϵ$ amount of the time, the stable points are known as trembling-hand equilibria. Or, similarly, if they transition to different policies some H of the time, you get some thermodynamic distribution. In both models, your system is exponentially more likely to end up in states it is hard to transition out of (Ellison’s lemma & the Boltzmann distribution respectively). Societies will usually spend a lot of time at a stable equilibrium, and then rapidly transition to a new one when the temperature increaes, in a way akin to simulated annealing. Note that we’re currently in one of those transition periods, so if you want to shape the next couple decades of policy, now is the time to get into politics.
In stable equilibria, punishment works. It essentially decreases $ϵ$ so it’s less likely for too many people to make a mistake at the same time, conserving the equilibrium. But progressives are climbing a narrow mountain pass, not sitting at the top of a local maximum. It’s much easier for disaffected members to shove society off the pass, so a policy of punishing defectors is not stable—the defectors can just defect more and win. This is why punishment doesn’t work; the only way forward is if everyone goes along with the plan.

James Camacho Jan 14, 2025, 10:08 PM
4 points
−2
in reply to: Mikhail Samin’s comment on: No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR
Oh, I did misread your post. I thought these were just people on some mailing list that had no relation to HPMOR/EA and you were planning on sending them books as advertising. This makes a lot more sense, and I’m much more cool with this form of advertising.

EDIT: I will point out, it still does scream “cult tactic” to me, probably because it is targeting specific people who do not know there is a campaign behind the scenes to get them to join the group. I don’t think it is wrong to advertise to people who have given their consent, but I do think it is dangerous to have a culture where you discuss how to best advertise to specific people.

James Camacho Jan 14, 2025, 6:50 PM
−6 points
−7
in reply to: Mikhail Samin’s comment on: No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR
Recruitment is only good because it serves your ideology. However, almost every group could claim the same, and most people don’t want to be spammed by thousands of groups trying to recruit them every year. Thus, when you send that email, you are defecting. Maybe you believe it is okay to defect, because your group is exceptional, but that is the trademark belief of a cult!

James Camacho Jan 13, 2025, 11:34 PM
7 points
−4
on: No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR
This screams “cult tactic” to me. Is the point of EA to identify high-value targets and get them to help the EA community, or to target high-value projects that help the broader community?

James Camacho Jan 13, 2025, 11:31 PM
4 points
2
in reply to: Parker Conley’s comment on: No one has the ball on 1500 Russian olympiad winners who’ve received HPMOR
I’d recommend against that. It’s too similar to Mormonism w/ Marriott.

James Camacho Dec 30, 2024, 7:21 PM
0 points
0
in reply to: Anthony DiGiovanni’s comment on: Computational functionalism probably can’t explain phenomenal consciousness
Given that Euan begins his post with an axiom of materialism, it’s referenced in the quote I’m responding to, and I’m responding to Euan, not talking to a general audience, I think it’s your fault for intepreting it as “most people, full stop”.

James Camacho Dec 28, 2024, 3:12 PM
1 point
0
in reply to: Darmani’s comment on: If all trade is voluntary, then what is “exploitation?”
Dollars are essentially energy from physics, and trades are state transitions. So, in expectation entropy will increase. Suppose person $i$ controls a proportion $p_{i}$ of the dollars. In an efficient market, entropy will be maximal, so we want to find the distribution
$arg max - \sum p_{i} ln p_{i}, subject to \sum w_{i} p_{i} = Total Societal Wealth Generation .$
For a given Total Societal Wealth Generation, this is the Boltzmann distribution
$p_{i} \propto e^{β w_{i}}$
where $β$ is the temperature (frequency of trades). I subsumed $β w_{i}$ as a single constant in my earlier comment to simplify matters. I was incorrect in my earlier statement; if my $β w_{i}$ is two higher than yours (not twice as large), I should control $e^{2} \approx 7$ times as many dollars. I suspect some of the rise in CEO-to-worker compensation comes from $β$ increasing, some from a less conscientious society, and some from exploitation.

James Camacho Dec 28, 2024, 3:28 AM
3 points
0
on: If all trade is voluntary, then what is “exploitation?”
Exploitation is using a superior negotiating position to inflict great costs on someone else, at small benefit to yourself.

If someone is inflicting any cost on me for their own benefit, that is not a mutually beneficial trade, so your definition doesn’t solve the problem. You cannot just look at subtrades either—after all, you can always break up every trade into two transactions where you first only pay a cost, and then only get a benefit at someone else’s expense.

My definition is closer to this:

A trade is exploitative when it decreases a society’s wealth generating ability.

When people are paid less, they are less able to invest in the future. This includes upskilling, finding more promising ventures, starting their own business, or raising children. Some people are better at this than others, and an efficient market would give them control of more money to show this (roughly exponential). For example, if you are twice as good at wealth-creating than me, you should have about seven times as many dollars. If I make a trade with you, I should keep about 12% of the wealth created. Of course, this has to be after costs are taken into account.

The cost of subsistence is pretty negligible—maybe a few thousand dollars per year in the rural United States. Any other costs a company imposes on you should be paid before you distribute the pie you created. So, if they ask you to live in San Francisco and drive a car, that is easily $50,000/yr in before-earnings costs. Now, suppose your work as a developer nets them $500,000/yr. You should be making about $100,000/yr after taxes, which would be around $200,000/yr before taxes. If you are making less, there are three scenarios:
1. Your company is more than twice as good as you at wealth generation.
2. You are creating less than $500,000/yr of value.
3. You are being exploited!

James Camacho Dec 19, 2024, 3:37 AM
1 point
1
on: Lack of Social Grace Is an Epistemic Virtue
For humans from our world, these questions do have answers—complicated answers having to do with things like map–territory confusions that make receiving bad news seem like a bad event (rather than the good event of learning information about how things were already bad, whether or not you knew it), and how it’s advantageous for others to have positive-valence false beliefs about oneself.
If you have bad characteristics (e.g. you steal from your acquaintances), isn’t it in your best interest to make sure this doesn’t become common knowledge? You don’t want to normalize people pointing out your flaws, so you get mad at people for gossiping behind your back, or saying rude things in front of you.

James Camacho Dec 14, 2024, 7:15 PM
2 points
0
on: Creating Interpretable Latent Spaces with Gradient Routing
If you’re not already aware of the information bottleneck, I’d recommend The Information Bottleneck Method, Efficient Compression in Color Naming and its Evolution, and Direct Validation of the Information Bottleneck Principle for Deep Nets. You can use this with routing for forward training.

EDIT: Probably wasn’t super clear why you should look into this. An optimal autoencoder should try to maximize the mutual information between the encoding and the original image. You wouldn’t even need to train a decoder at the same time as the encoder! But, unfortunately, it’s pretty expensive to even approximate the mutual information. Maybe, if you route to different neurons based on image captions, you could significantly decrease this cost.

James Camacho Dec 14, 2024, 7:15 PM
2 points
0
in reply to: Jacob G-W’s comment on: g-w1′s Shortform
And I migrated my comment.

James Camacho Dec 13, 2024, 6:27 PM
2 points
0
in reply to: Jacob G-W’s comment on: g-w1′s Shortform
If you’re not already aware of the information bottleneck, I’d recommend The Information Bottleneck Method, Efficient Compression in Color Naming and its Evolution, and Direct Validation of the Information Bottleneck Principle for Deep Nets. You can use this with routing for forward training.

James Camacho Dec 13, 2024, 5:30 PM
1 point
1
in reply to: amelia’s comment on: Wave function collapse: Is it really that surprising?
Maybe, there’s an evolutionary advantage to thinking of yourself as distinct from the surrounding universe, that way your brain can simulate counterfactual worlds where you might take different actions. Will you actually take different actions? No, but thinking will make the one action you do take better. Since people are hardwired to think their observations are not necessarily interactions, updating in the other direction has significant surprisal.

James Camacho Dec 13, 2024, 3:41 PM
1 point
0
in reply to: amelia’s comment on: Wave function collapse: Is it really that surprising?
I think physicists like to think of the universe through a “natural laws” perspective, where things should work the same whether or not they were there to look at them. So, it seems strange when things do work differently when they look at them.

James Camacho Dec 13, 2024, 3:35 PM
0 points
0
on: Wave function collapse: Is it really that surprising?
The reason wave function collapse is so surprising, is because not collapsing seems to be the norm. In fact, the best gravimeters are made by interfering the wavefunctions of entire molecules (ref: atom interferometer). We only see “wave function collapse” in particular kinds of operations, which we then define as observations. So, it isn’t surprising that we observe wave function collapse—that’s how the word “observe” is defined. What is surprising is that collapse even occurs to be observed, when we know it is not how the universe usually operates.

James Camacho Dec 12, 2024, 11:53 PM
−4 points
0
in reply to: EuanMcLean’s comment on: Computational functionalism probably can’t explain phenomenal consciousness

and that’s because I think you don’t understand them either.

What am I supposed to do with this? The one effect this has is to piss me off and make me less interested in engaging with anything you’ve said.

Why is that the one effect? Jordan Peterson says that the one answer he routinely gives to Christians and atheists that piss them off is, “what do you mean by that?” In an interview with Alex O’Conner he says,

So people will say, well, do you believe that happened literally, historically? It’s like, well, yes, I believe that it’s okay. Okay. What do you mean by that? That you believe that exactly. Yeah. So, so you tell me you’re there in the way that you describe it.

Right, right. What do you see? What are the fish doing exactly? And the answer is you don’t know. You have no notion about it at all. You have no theory about it. Sure. You have no theory about it. So your belief is, what’s your belief exactly?

(25:19–25:36, The Jordan B. Peterson Podcast − 451. Navigating Belief, Skepticism, and the Afterlife w/ Alex O’Connor)

Sure, this pisses off a lot of people, but it also gets some people thinking about what they actually mean. So, there’s your answer: you’re supposed to go back and figure out what you mean. A side benefit is if it pisses you off, maybe I won’t see your writing anymore. I’m pretty annoyed at how the quality of posts has gone down on this website in the past few years.