I was a downvoter. I think the first three sections are mysterious in a way that invalidates the rest of the post. I like how he thinks in the next few sections, but it reminds me of the urban legend of the PhD student who found a lot of interesting results about a new class of mathematical objects, only to show up to their dissertation defense and realize only the trivial case existed. It generally isn’t productive to ask a lot of “what if?” questions before pinning down what you mean when you talk about consciousness or morality. If you can’t pin it down exactly, go with a working definition based on a few examples that should fit the category. At the very least, apply some known examples to the “what if?” that follows. I think kbear’s comment does a pretty good job of this.
Disclaimer: This comment hasn’t really been edited for clarity, cohesion, or politeness. I do think it’s useful, but it’ll definitely be spicy.
Trying to derive all of morality from physics alone – say, if someone is crazy enough to derive an entire ethical philosophy and ideological movement based on maximizing entropy – would strike most people as deeply confused.
I think if most people consider this philosophy to be deeply confused, it is actually the case that most people are deeply confused. When I read this sentence, I was pleasantly surprised that someone else had figured it out, and even more surprised it was the leader of e/acc (unrelated to the previous surprisal).
I believe you are being serious in your post, but there’s this niggling suspicion in the back of my mind that, if I were satirizing how philosophers talk about consciousness/subjective experience/morality, this is how it would come out. Statements like,
“The world of consciousness. Subjective experience. What it feels like to see red.”
that you see exclaimed everywhere with an undertone of wonder and confusion, and no attempt to really pin down what is meant mathematically. Then a section called “pinpointing the ineffable” saying, “this probably sounds too abstract. Let’s try to make it more concrete,” without actually trying to make it more concrete (mathematically), just making the wonder and confusion explicit.
The rest of the post builds off of this in a constructive way, so I believe you are being serious here. I just don’t get the confusion around consciousness. As someone else said, the laws of mathematics are enough to explain the phenomenon (though they qualified their statement more). It isn’t a separate world. Subjective experience? Simply a reference to a compressed copy of the self. Ontologies? They’re a little harder to figure out, but I’m pretty sure they’re the significant bits of autoencoding.
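To gesture at what I mean by “significant bits of autoencoding,” here is a minimal sketch. This is my own illustration, not anything from the post: the `AutoEncoder` class, the dimensions, and the variance heuristic for “significance” are all assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Toy autoencoder: the latent code z is a 'compressed copy' of x."""
    def __init__(self, dim_in=784, dim_latent=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_latent))
        self.decoder = nn.Sequential(
            nn.Linear(dim_latent, 128), nn.ReLU(), nn.Linear(128, dim_in))

    def forward(self, x):
        z = self.encoder(x)  # the compressed copy
        return self.decoder(z), z

model = AutoEncoder()
x = torch.randn(32, 784)  # stand-in data
x_hat, z = model(x)

# One crude reading of "significant bits": the latent dimensions with the
# highest variance across inputs, i.e. the ones the model actually uses to
# distinguish states of the world. Those dimensions play the role of an
# ontology.
significant = z.var(dim=0).argsort(descending=True)
```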
And let’s not forget the central question, what about moral goods? Here’s a question for you: is soft actor-critic maxxing energy under entropy regularization, or entropy under energy regularization? They’re the same thing! But if you dig down into the two terms, entropy definitely exists, while energy always feels like a placeholder for something else. Like, “does this policy get the results I want, so I’m going to let it stick around and further evolve?” But that’s just maxxing entropy when you consider part of the game is for the researcher to keep using the policy.
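For the curious, here is the identity I have in mind, written out as a sketch (my own formalization, with α as the temperature; the symbols h and c below are assumed constraint levels, not anything from the original discussion):

```latex
% The soft actor-critic objective is a single Lagrangian:
\[
  J(\pi) \;=\;
  \underbrace{\mathbb{E}_{\pi}\Big[\textstyle\sum_t r(s_t,a_t)\Big]}_{\text{``energy''}}
  \;+\; \alpha\,
  \underbrace{\mathbb{E}_{\pi}\Big[\textstyle\sum_t \mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big]}_{\text{entropy}}.
\]
% Maximizing reward subject to an entropy floor,
%   max_pi E[r]   s.t.  H(pi) >= h,
% and maximizing entropy subject to a reward floor,
%   max_pi H(pi)  s.t.  E[r] >= c,
% share the same stationary points for matching multipliers, so which term
% you call the objective and which the regularizer is just bookkeeping.
```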
If we continue to pursue the ‘decapitation’ theory of warfare, or the ‘kingpin strategy,’ then I do not believe that goes in good places. So far this hasn’t been flipped around against the leadership of democracies so much, but how long will that last?
Can you explain this further? To me it seems good from a humanitarian, utilitarian, and game-theoretic perspective.
It seems worse to kill millions of rank-and-file soldiers than hundreds of generals/political leaders.
Those leaders are usually coercing the rank-and-file to fight in the first place by threatening their life or liberty. Furthermore, those leaders are usually the ones making the decision to go to war at all.
If you have the capability, you should punish the people imposing negative externalities on you, and sure, that includes the rank-and-file soldiers, but I think it’s better to model them the way you model natural disasters. A lot of military training is spent teaching them to not think and just be a tool the higher-ups can use. The higher-ups are the real source of negative externalities here, so they are the appropriate people to punish.
I get how this kind of warfare changes the decision-making process among generals and political leaders. For example, it is difficult to elect politicians in Mexico who promise to get rid of the drug cartels (at least, difficult to elect them for more than a few days). And maybe this leads to more stupid suffering than WWI, but it seems really hard for it to be worse than 10% of a generation getting conscripted and killed.
I am moderately interested in joining the Discord, at least just to see what has worked for others. I also got Long COVID ~1.5 years ago, and it’s rough.
I was talking with Joseph, and I think I like his SharkBot more because it fails more gracefully. Suppose (1) “proof” has an upper bound in computation cycles, and (2) people occasionally make mistakes in their logic. A good prover might spend more computation cycles in error correction. What happens if they do not have enough time to prove cooperation leads to cooperation, or defection leads to defection?
If Joseph’s bot stalls out on the second half of its computation, it concludes, “I couldn’t prove they would cooperate if I’m caught (provably) defecting,” and cooperates. If your bot stalls out on the second half of its computation, it concludes, “I couldn’t prove being caught (provably) defecting would lead to them also defecting,” and thinks it can get away with defection.
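Here is a minimal sketch of that asymmetry. This is my own code, not anyone’s actual bot: `try_prove`, `sharkbot`, and `other_bot` are illustrative stand-ins, and the prover is stubbed to always stall, which models the not-enough-cycles-for-error-correction case.

```python
def try_prove(statement: str, budget: int):
    """Hypothetical bounded proof search: returns True if a proof is found
    within `budget` cycles, None if it stalls out. Stubbed to always stall
    here, modeling a prover that runs out of cycles."""
    return None

def sharkbot(budget: int) -> str:
    # Defect only with an actual proof that the opponent is a rock.
    if try_prove("opponent cooperates even if I provably defect", budget):
        return "defect"
    return "cooperate"  # stalled out => fail gracefully and cooperate

def other_bot(budget: int) -> str:
    # Cooperate only with a proof that provable defection gets punished.
    if try_prove("opponent defects if I provably defect", budget):
        return "cooperate"
    return "defect"  # stalled out => thinks it can get away with defecting

print(sharkbot(budget=1000))   # cooperate
print(other_bot(budget=1000))  # defect
```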
Another way of putting it: dumber bots are more likely to think they can get away with defection when they really can’t, and defect against bots smarter than them. If a bot is going to try to take advantage of rocks[1], it had better make sure it is actually playing against a rock, and not just making a stupid mistake that hurts everyone.
Also, as an aside, I think making mistakes (logical bit flips some percent of the time) naturally penalizes high-complexity policies. This is why you might expect societies to begin with mostly cooperate/defect bots, then transition to citizen/police bots, and slowly build complexity where each individual’s policy is relatively simple, but society as a whole gets more complex interactions. I think this would be an interesting area of research.
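One way to make the penalty concrete, as a sketch under the stated bit-flip assumption:

```latex
% If each logical operation flips with independent probability p, a policy
% that needs n operations executes as intended with probability
\[
  \Pr[\text{faithful execution}] \;=\; (1-p)^n \;\approx\; e^{-pn},
\]
% an exponential penalty on policy complexity: simple cooperate/defect
% bots pay almost nothing, while elaborate provers pay a lot.
```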
[1] From the phrase, “how do you play a Prisoner’s Dilemma against a rock?” Rocks are bots that cooperate even if it is proven you are going to defect.
“You, not all of you but most of you, should not be working on AI safety.”
So it seems you endorse a utility function that puts more weight on others than your actual preferences do. Wouldn’t you prefer to endorse a different utility function?
Well, do you care about the rest of humanity enough to send yourself to hell? Or to adopt policies where you only get sent to hell in some fraction of universes rather than another? It seems like a smart selfish egoist would send themselves to hell.
How do you determine which beings ought to be in a utilitarian’s utility function? I think generally the utilitarian decides for themselves, and the rest of society beats them over the head until the utilitarian includes them too.
Perhaps here is where the controversy comes in. The utilitarian comes along and says, “I want to maximize utility!” And everyone thinks, “great! she wants to help everyone out!” The selfish egoist comes along and says, “I am just going to fulfill whatever selfish desires I have!” And everyone thinks, “wow, that’s scary! what stops you from murdering people?”
I think, also, there is a sense in which all utilitarians work to maximize the same utility function. This is also true for selfish egoists, but selfish egoists are both better and worse at negotiating: they are more prone to negotiate, yet utilitarians make mistakes that are biased towards reaching a consensus, just because they solve the problem from different directions.
Yes.
I don’t understand what you don’t understand. I heard a remark once about a philosopher who really tried to steelman other people’s arguments, but so that they made sense according to the philosopher, not in the mental frame of the other person. It led to some pretty wacky arguments on the steelman side. I think here, you should assume when I say, “mathematically equivalent,” that’s what I mean. Like, any math you use in utilitarianism is the same as that of selfish egoism. Or, if you tried to put the two philosophies in mathematical terms, you get the exact same equations. So, it extends to logical beings or irrational beings. The words “selfish egoism” and “utilitarianism” are synonyms.
Not just egoism, selfish egoism. Every utility function people choose is a selfish one or they wouldn’t choose it. The claim isn’t, “selfish egoism is a subset of utilitarianism” but “selfish egoism is identically the same as utilitarianism.”
It should be consistent with any decision theory.
“People often submit incredibly epistemically rude and short-sighted comments on forums, but they deceive people into upvoting them by putting on a veneer of politeness. ‘John, I feel like you’ve got a nail in your head.’ they say. ‘Your conclusion is wrong so you must not have thought of this thing you explicitly mentioned in your post.’”
“The rationality scene is a little culty.”
“Utilitarianism and selfish egoism are mathematically the same [EDIT: i.e. they could be used as synonyms except for their different connotations].”
EDIT: Separated into multiple comments.
I think the problem is not that, “we don’t have a good mathematical definition of phenomenological consciousness,” it’s that there isn’t a definition at all! My theory is that this babble is useful for survival, because babble can still be used as a justification. On a species level, you still see religious people today saying, “it’s okay to kill and eat animals, but not humans, because humans have souls.” On an individual level, you’re going to fight harder if your brain insists there is a “me.” On a memetic level, proclaiming something to be realer than real, “the thing that is redness,” will get more people talking about it. That doesn’t mean there’s anything there.