Religious freedoms are a subsidy to keep the temperature low. There’s the myth that societies will slowly but surely get better, kind of like a gradient descent. If we increase the temperature too high, an entropic force would push us out of a narrow valley, so society could become much worse (e.g. nobody wants the Spanish Inquisition). It’s entirely possible that the stable equilibrium we’re being attracted to will still have religion.
Can’t you choose an arbitrary encoding procedure? Choosing a different one only adds a constant number of bits. Also, my comment on discounted entropy was a little too flippant. What I mean is closer to an entropy rate with a discount factor, like in soft actor-critic. Maximizing your ability to have options in the future requires a lot of “agency”.
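Concretely, something like the following (my notation; the policy $\pi$, state $s_t$, and discount $\gamma$ are just the standard RL symbols, not anything from the original discussion):

$$\mathcal{H}_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, H\big(\pi(\cdot \mid s_t)\big)\right],$$

i.e. the expected discounted sum of the entropy of your available choices at each future step, which is the term soft actor-critic adds on top of its reward.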
Maybe consciousness should be more than just agency, e.g. if a chess bot were trained to maximize entropy, games wouldn’t be as strategic as if it wanted a high*-energy payoff. However, I’m not convinced energy even exists. Humans learn strategy because their genes are more likely to survive, thrive, and have choices in the future when they win. You could even say elementary particles are just the ones still around since the Big Bang.
*Note: The physicists should reverse the sign on energy. While they’re at it, switch to inverse-temperature.
Consider all programs encoding isomorphisms from a rock to something else (e.g. my brain, or your brain). If the program takes $n$ bits to encode, we add $2^{-n}$ times the other entity to the rock (times some partition number so all the weights add up to one). Since some programs are automorphisms, we repeatedly do this until convergence.
The rock will now possess a tiny bit of consciousness, or really any other property. However, where do we get the original “sources” of consciousness? If you’re a solipsist, you might say, “I am the source of consciousness.” I think a better definition is your discounted entropy.
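Here’s a toy sketch of the weighting I have in mind (the entities, bit counts, and iteration count are made up purely for illustration):

```python
import numpy as np

# Hypothetical entities, and the bit-length of the shortest program encoding
# an isomorphism from each entity to each other one (made-up numbers).
entities = ["rock", "my_brain", "your_brain"]
bits = np.array([
    [0.0, 40.0, 42.0],   # rock -> rock, my_brain, your_brain
    [40.0, 0.0, 10.0],
    [42.0, 10.0, 0.0],
])

# Weight each isomorphism by 2^(-bits), then divide by the partition number
# so every row of weights sums to one.
weights = 2.0 ** (-bits)
weights /= weights.sum(axis=1, keepdims=True)

# Seed the "sources" of consciousness, then repeatedly mix through the
# isomorphism weights until (approximate) convergence.
consciousness = np.array([0.0, 1.0, 1.0])
for _ in range(100):
    consciousness = weights @ consciousness

print(dict(zip(entities, consciousness)))  # the rock picks up a tiny amount
```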
An isomorphism isn’t enough. Stealing from Robert (Lastname?), you could make an isomorphism from a rock to your brain, but you likely wouldn’t consider it “conscious”. You have to factor out the Kolmogorov complexity of the isomorphism.
Would insider trading work out if everyone knew who was asking to trade with them ahead of time?
It could be a case of a backward-bending curve. Fewer children make the economy worse, so more people choose to work rather than have children.
The computer vision researchers just chose the wrong standard. Even the images they train on come in [pixel_position, color_channels] format.
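For instance (NumPy shapes; the specific sizes are arbitrary):

```python
import numpy as np

# Images as loaded from disk are almost always height x width x channels,
# i.e. indexed by pixel position first, colour channel last.
image_hwc = np.zeros((224, 224, 3), dtype=np.uint8)   # the on-disk layout

# The "channels-first" layout many frameworks standardized on instead:
image_chw = np.transpose(image_hwc, (2, 0, 1))        # shape (3, 224, 224)
```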
Age limits do exist: you have to be at least 35 to run for President, at least 30 for Senator, and 25 for Representative. This automatically adds a decade or two to your candidates.
In earlier times, I spent an incredible amount of my mental capacity trying to accurately model those around me. I can count on zero hands the number of people who reciprocated. Even the number who treated me as being as real as I treated them would fit on one hand. On the other hand, nearly everyone I talk to does not have “me” as even a possibility in their model.
It just takes a very long time in practice, see “Basins of Attraction” by Ellison.
I’ve been thinking about something similar, and might write a longer post about it. However, the solution to both is to anneal on your beliefs. Rather than looking at the direct probabilities, look at the logits. You can then raise the temperature, let the population kind of randomize their beliefs, and cool it back down.
See “Solving Multiobjective Game in Multiconflict Situation Based on Adaptive Differential Evolution Algorithm with Simulated Annealing” by Li et al.
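A minimal sketch of the idea, with made-up logits and an arbitrary cooling schedule:

```python
import numpy as np

def beliefs_at_temperature(logits, T):
    """Convert logits to probabilities at temperature T (temperature-scaled softmax)."""
    z = logits / T
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([4.0, 1.0, 0.5])    # some agent's current beliefs, as logits

# Heat up: beliefs flatten out, letting the population explore other positions.
print("T=10:", beliefs_at_temperature(logits, T=10.0))

# Cool back down: beliefs sharpen again, hopefully in a better basin.
for T in [5.0, 2.0, 1.0, 0.5]:
    print(f"T={T}:", beliefs_at_temperature(logits, T))
```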
Idea: NV⁻ Centers for Brain Interpretability
Perhaps “fit”, from the Latin fio (come about) + English fit (fit). An object must fit, survive, and spread.
To see how much the minimal point contributes to the integral we can integrate it in its vicinity
I think you should be looking at the entire stable island, not just integrating from zero to one. I expect you could get a decent approximation with Lie transform perturbation theory, and this looks similar to the idea of macro-states in condensed matter physics, but I’m not knowledgeable in these areas.
You have a typo, the equation after Free Energy should start with
$$-\sum_{i=1}^{N} \log p(y_i \mid x_i, w)$$
Also, the third line should be a plus, not a minus.
Also, usually people use $\theta$ for model parameters (rather than $w$). I don’t know the etymology, but game theorists use the same letter (for “types” = models of players).
Also sometimes when I explain what a hyperphone is well enough for the other person to get it, and then we have a complex conversation, they agree that it would be good. But very small N, like 3 to 5.
It’s difficult to understand your writing, and I feel like you could improve in general at communication based on this quote. The concept of a hyperphone isn’t that complex—the ability to branch in conversations—so the modifiers “well enough”, “complex”, and “very small N” make me believe it’s only complex because you’re unclear.
For example, the blog post you linked to is titled “Hyperphone”, yet you never define a hyperphone. I can infer from the section on streaming what you imagine, but that’s the second-to-last section!
There’s the automorphism that flips every other observation,
$$x_i \mapsto x_i \oplus (i \bmod 2),$$
which turns a switchy distribution into a sticky one, and vice versa. The two have to be symmetric, so your conclusion cannot be correct.
This means the likelihood distribution over data generated by Steady is closer to the distribution generated by Switchy than to the distribution generated by Sticky.
Their KL divergences are exactly the same. Suppose Baylee’s observations are $x_1, x_2, \dots, x_n$. Let $p_q$ be the probability of those observations if there’s a $q$ chance of switching, and similarly for $p_{q'}$. By the chain rule,
$$D_{\mathrm{KL}}(p_q \,\|\, p_{q'}) = \sum_{i=2}^{n} \mathbb{E}_{p_q}\!\left[D_{\mathrm{KL}}\!\big(p_q(x_i \mid x_{i-1}) \,\|\, p_{q'}(x_i \mid x_{i-1})\big)\right] = (n-1)\, D_{\mathrm{KL}}\!\big(\mathrm{Bern}(q) \,\|\, \mathrm{Bern}(q')\big).$$
In particular, when either $q$ or $q'$ is equal to one half, this divergence is symmetric in the other variable.
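A quick numerical check (my parameter choices: Steady is $q=1/2$, Switchy is $q=0.75$, Sticky is the mirrored $q=0.25$):

```python
import numpy as np

def kl_bern(q, r):
    """KL divergence between Bernoulli(q) and Bernoulli(r)."""
    return q * np.log(q / r) + (1 - q) * np.log((1 - q) / (1 - r))

steady, switchy = 0.5, 0.75
sticky = 1 - switchy   # 0.25

n = 100  # sequence length; the sequence KL is (n - 1) times the per-step KL
print((n - 1) * kl_bern(steady, switchy))   # D_KL(Steady || Switchy)
print((n - 1) * kl_bern(steady, sticky))    # D_KL(Steady || Sticky) -- identical
```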
The problem with etching specific models is scale. It costs around $1M to design a custom chip mask, so it needs to be amortized over tens or hundreds of thousands of chips to become profitable. But no companies need that many.
Assume a model takes 3e9 flops to infer the next token, and these chips run as fast as H100s, i.e. 3e15 flops/s. A single chip can infer 1e6 tokens/s. If you have 10M active users, then 100 chips can provide each user a token every 100ms, around 600 wpm.
Even OpenAI would only need hundreds, maybe thousands of chips. The solution is smaller-scale chip production. There are startups working on electron beam lithography, but I’m unaware of a retailer Etched could buy from right now.
EDIT: 3 trillion flops/token (similar to GPT-4) is 3e12, so that would be 100,000 chips. The scale is actually there.
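The arithmetic for both cases, using the same rough assumptions as above:

```python
chip_flops_per_s = 3e15       # assumed H100-class throughput
users = 10e6                  # active users
tokens_per_user_per_s = 10    # ~600 tokens per minute per user

for flops_per_token in (3e9, 3e12):   # small model vs. GPT-4-scale model
    tokens_per_chip_per_s = chip_flops_per_s / flops_per_token
    chips = users * tokens_per_user_per_s / tokens_per_chip_per_s
    print(f"{flops_per_token:.0e} flops/token -> {chips:,.0f} chips")
```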
No. There isn’t much “secret sauce”, and these companies never had a large amount of AI talent to begin with. Their advantage is being in a position with hype/reputation/size to get to market faster. It takes several months to set up the infrastructure (getting money, data, and compute clusters), but that’s really the only hurdle.
No. “Everyone” in the AI research community knew how to build Llama, multi-modal models, or video diffusion models a year before they came out. They just didn’t have $10M to throw around.
Also, fine-tuning isn’t really the way to go. I can imagine people using it as a teacher during the warm-up phase, but the coding infrastructure doesn’t really exist to fine-tune or integrate another model as part of a larger one. It’s usually easier to just spend the extra time securing money and training.
Yep. Even five years ago you could open a Colab notebook and train a language translation model in a couple of minutes.
No, images are much harder than language. With language models, you can exactly model the output distribution, while the space of images is continuous and much too large for that. Instead, the best models estimate the probability flow (e.g. diffusion/normalizing flows/flow matching) and follow it towards high-probability images. However, parts of images should be discrete: you know humans have five fingers, and text has words in it, but flows assume your probability density is continuous.
Imagine you have a distribution that looks like
__|_|_|__
A flow will round out those spikes into something closer to
_/^\/^\/^\__
which is why gibberish text or four-and-a-half fingers appear. In video models, this leads to dogs spawning and disappearing into the pack.
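A toy way to see the rounding-out, where blurring a spiky histogram with a Gaussian kernel stands in for the smooth density a flow actually learns (the numbers are arbitrary):

```python
import numpy as np

x = np.linspace(0, 1, 200)

# A "discrete" target: nearly all the mass sits on three spikes.
spiky = np.zeros_like(x)
spiky[[50, 100, 150]] = 1.0
spiky /= spiky.sum()

# A continuous model can't represent the spikes exactly; convolving with a
# Gaussian kernel mimics the rounded-out density it ends up assigning.
kernel = np.exp(-0.5 * (np.arange(-30, 31) / 8.0) ** 2)
kernel /= kernel.sum()
smoothed = np.convolve(spiky, kernel, mode="same")

# Mass that used to sit exactly on "five fingers" now leaks into the
# in-between region, so four-and-a-half fingers gets nonzero probability.
print(smoothed[75], smoothed[100])
```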
Partly when it comes to image/video models, but this isn’t a huge factor.
I think it’s because AI is a winner-takes-all competition. It’s extremely easy for customers to switch, so they all go to the best model. Since ClosedAI already has funding, compute, and infrastructure, it’s risky to compete against them unless you have a new kind of model (e.g. LiquidAI), reputation (e.g. Anthropic), or are a billionaire’s pet project (e.g. xAI).