aribrill

Karma: 196

Ari Brill, independent AI safety researcher & PhD astrophysicist

aribrill Apr 2, 2025, 8:19 PM
2 points
0
on: Renormalization Roadmap
This post is great to see; I think renormalization is a very exciting direction for AI safety research!
First, as one possible way to represent the real-world, we can think of representation_0 as a low-energy description of the dataset: IR_0.
If the NN is capable of learning a meaningful generalization of the data, representation_0 flows to representation_1 (now UV_1) via an implicit RG flow to higher energies. Instead of throwing information away, flowing to UV_1 adds structure that allows it to more reliably adapt to unseen information.
Shouldn’t this go the other way, with representation_0 being UV and representation_1 being IR? A NN compresses the input representation (data) to obtain a coarse-grained output representation (label). The ability to throw away information, i.e. the irrelevant noise w.r.t. the target function, is what enables generalization to unseen inputs differing in fine-grained details.
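To make the UV-to-IR direction concrete, here is a minimal toy sketch. It is only an illustration of coarse-graining, not a claim about how an actual NN implements an RG flow. The function name `coarse_grain`, the block size, and the example signals are all assumptions made up for this example. Two inputs that differ only in fine-grained detail map to the same coarse representation once neighbouring entries are block-averaged.

```python
import math


def coarse_grain(values, block_size):
    """Average consecutive blocks of `block_size` entries (a block-spin-style step)."""
    if len(values) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    return [
        sum(values[i:i + block_size]) / block_size
        for i in range(0, len(values), block_size)
    ]


# Hypothetical "true" coarse structure, repeated so each block holds two entries.
base = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0]

# Two fine-grained ("UV") inputs: same coarse structure, opposite small-scale noise.
input_a = [b + n for b, n in zip(base, [+0.1, -0.1] * 3)]
input_b = [b + n for b, n in zip(base, [-0.1, +0.1] * 3)]

# Coarse-grained ("IR") descriptions: the fine detail averages out within each block.
coarse_a = coarse_grain(input_a, 2)
coarse_b = coarse_grain(input_b, 2)

same_coarse = all(math.isclose(x, y) for x, y in zip(coarse_a, coarse_b))
print(input_a != input_b, same_coarse)
```

In this toy, the information thrown away is exactly what distinguishes the two inputs, which is the sense in which discarding irrelevant detail lets the coarse description generalize across them.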

Neural Scaling Laws Rooted in the Data Distribution

aribrill Feb 20, 2025, 9:22 PM

7 points

0 comments 1 min read LW link

(arxiv.org)

aribrill Jan 13, 2025, 9:45 PM
7 points
0
on: Dmitry’s Koan
The notion of a precision scale for interpretability is really interesting, particularly the connection with generalization/memorization. This seems like a fruitful concept to develop further.
How, then, can we operationalize the loss scale of a phenomenon? Well, one way to do this is to imagine that we have some “natural” complexity parameter c that can be varied (this can be a parameter tuning model size, training length, etc.).
It could be interesting to think about the interpretation of different possible complexity parameters here. You might expect these to give rise to distinct but related notions of generalization. Here, I’m drawing on intuitions from my work connecting scaling laws to data distributions, though I hadn’t put it in exactly these terms before. (I’ll write a LW post summarizing/advertising this paper, but haven’t gotten to it yet...)
One interesting scaling regime is scaling in effective model size (given infinite training data & compute). You can also think about scaling in training data size (given infinite model capacity & compute). I think model scaling is basically akin to what you’re talking about here. Data scaling could be useful to study too, as it gets around the need to understand & measure effective model capacity. Of course, these are theoretical limits; in practice one usually scales model size & data together in a compute-optimal way, you’re probably not training to convergence, etc.
If the data distribution consists of clusters of varying size (alternatively, subtasks of varying importance), then taking model size as the complexity parameter could give a notion of generalization as modeling the most important parts of the data distribution. Memorization then consists of modeling rarely observed or unimportant components. On the other hand, taking data size as the complexity parameter would suggest that generalization consists of coarsely modeling the entire data distribution, with it being memorization-like to model fine details and exceptions anywhere.
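A toy sketch of the model-size picture, under assumptions that are purely illustrative and not taken from the paper: cluster weights follow a power law, and a model of capacity n perfectly models the n heaviest clusters and nothing else. The names `zipf_weights` and `uncovered_mass`, and the exponent value, are hypothetical.

```python
def zipf_weights(num_clusters, exponent=1.5):
    """Normalized power-law weights: cluster k has weight proportional to k**(-exponent)."""
    raw = [k ** -exponent for k in range(1, num_clusters + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def uncovered_mass(weights, capacity):
    """Probability mass of the clusters a capacity-limited model leaves unmodeled.

    Assumption: the model covers exactly the `capacity` heaviest clusters.
    """
    ranked = sorted(weights, reverse=True)
    return sum(ranked[capacity:])


weights = zipf_weights(1000)
for capacity in (1, 10, 100, 1000):
    # Residual mass shrinks as capacity grows; the rare clusters are covered last.
    print(capacity, uncovered_mass(weights, capacity))
```

In this toy, the first units of capacity buy most of the probability mass (the "generalization" regime in the sense above), while the last units only pick up rarely observed clusters (the "memorization" regime).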
It also would be interesting to think about other complexity scaling parameters, for example, test-time compute in an AlphaZero-style setting.
If possible, we would like models in this class to be “locally simultaneously interpretable”, i.e. that for two nearby values $c \approx c'$, the models $M_c$ and $M_{c'}$ have similar weights and implement similar circuits.
My impression of what one is supposed to expect from this is that as the complexity parameter increases, the learned circuits quantitatively improve, but never undergo a radical qualitative shift at any particular scale. Would you agree with that? So for a “good circuit”, the explained loss is basically monotonic, slowly decreasing or stable around ~100% as one goes from complexity 0 to the cutoff complexity, and decreasing below that. But if there were a qualitative step change, you would see instead a peak in the explained loss around the cutoff complexity, increasing above and decreasing below that. In that situation, the loss precision scale would seem less natural as a measure of circuit understanding.
Basically, the concern would be something like a model implementing an algorithm with low algorithmic complexity but a large constant factor, so it can only emerge and become dominant at some critical model scale. (Similar to but not exactly the same as grokking.) One realistic possible instance of this might be the emergence of in-context learning in LLMs only at large enough scales.

aribrill Nov 19, 2024, 8:56 PM
3 points
0
in reply to: StefanHex’s comment on: StefanHex’s Shortform
Thanks for the great writeup.
Superposition (“local codes”) require sparsity, i.e. that only few features are active at a time.
Typo: I think you meant to write distributed, not local, codes. A local code is the opposite of superposition.

Computational Complexity as an Intuition Pump for LLM Generality

aribrill Jun 25, 2024, 8:25 PM

18 points

6 comments 3 min read LW link

aribrill Feb 14, 2024, 2:31 AM
1 point
0
on: Natural abstractions are observer-dependent: a conversation with John Wentworth
Short answer: some goals incentivize general intelligence, which incentivizes tracking lots of abstractions and also includes the ability to pick up and use basically-any natural abstractions in the environment at run-time.
Longer answer: one qualitative idea from the Gooder Regulator Theorem is that, for some goals in some environments, the agent won’t find out until later what its proximate goals are. As a somewhat-toy example: imagine playing a board game or video game in which you don’t find out the win conditions until relatively late into the game. There’s still a lot of useful stuff to do earlier on—instrumental convergence means that e.g. accumulating resources and gathering information and building general-purpose tools are all likely to be useful for whatever the win condition turns out to be.
As I understand this argument, even if an agent’s abstractions depend on its goals, it doesn’t matter because disparate agents will develop similar instrumental goals due to instrumental convergence. Those goals involve understanding and manipulating the world, and thus require natural abstractions. (And there’s the further claim that a general intelligence can in fact pick up any needed natural abstraction as required.)
That covers instrumental goals, but what about final goals? These can be arbitrary, per the orthogonality thesis. Even if an agent develops a set of natural abstractions for instrumental purposes, if it has non-natural final goals, it will need to develop a supplementary set of non-natural goal-dependent abstractions to describe them as well.
When it comes to an AI modeling human abstractions, it does seem plausible to me that humans’ lowest-level final goals/values can be described entirely in terms of natural abstractions, because they were produced by natural selection and so had to support survival & reproduction. It’s a bit less obvious to me this still applies to high-level cultural values (would anyone besides a religious Jew naturally develop the abstraction of kosher animal?). In any case, if it’s sufficiently important for the AI to model human behavior, it will develop these abstractions for instrumental purposes.
Going the other direction, can humans understand, in terms of our abstractions, those that an AI develops to fulfill its final goals? I think not necessarily, or at least not easily. An unaligned or deceptively aligned mesa-optimizer could have an arbitrary mesa-objective, with no compact description in terms of human abstractions. This matters if the plan is to retarget an AI’s internal search process. Identifying the original search target seems like a relevant intermediate step. How else can you determine what to overwrite, and that you won’t break things when you do it?
I claim that humans have that sort of “general intelligence”. One implication is that, while there are many natural abstractions which we don’t currently track (because the world is big, and I can’t track every single object in it), there basically aren’t any natural abstractions which we can’t pick up on the fly if we need to. Even if an AI develops a goal involving molecular squiggles, I can still probably understand that abstraction just fine once I pay attention to it.
This conflates two different claims.
1. A general intelligence trying to understand the world can develop any natural abstraction as needed. That is, regularities in observations / sensory data → abstraction / mental representation.
2. A general intelligence trying to understand another agent’s abstraction can model its implications for the world as needed. That is, abstraction → predicted observational regularities.
The second doesn’t follow from the first. In general, if a new abstraction isn’t formulated in terms of lower-level abstractions you already possess, integrating it into your world model (i.e. understanding it) is hard. You first need to understand the entire tower of prerequisite lower-level abstractions it relies on, and that might not be feasible for a bounded agent. This is true whether or not all these abstractions are natural.
In the first case, you have some implicit goal that’s guiding your observations and the summary statistics you’re extracting. The fundamental reason the second case can be much harder relates to this post’s topic: the other agent’s implicit goal is unknown, and the space of possible goals is vast. The “ideal gas” toy example misleads here. In that case, there’s exactly one natural abstraction (P, V, T), no useful intermediate abstraction levels, and the individual particles are literally indistinguishable, making any non-natural abstractions incoherent. Virtually any goal routes through one abstraction. A realistic general situation may have a huge number of equally valid natural abstractions pertaining to different observables, at many levels of granularity (plus an enormous bestiary of mostly useless non-natural abstractions). A bounded agent learns and employs the tiny subset of these that helps achieve its goals. Even if all generally intelligent agents have the same potential instrumental goals that could enable them to learn the same natural abstractions, without the same actual instrumental goals, they won’t.

aribrill Feb 15, 2014, 7:13 PM
0 points
on: Meetup : Yale: Initial Meetup
Unfortunately I am busy from 2-5 on Sundays, but I would certainly like to attend a future Yale meetup at some other time.

aribrill Aug 2, 2013, 6:07 AM
55 points
on: Rationality Quotes August 2013
In 2002, Wizards of the Coast put out Star Wars: The Trading Card Game designed by Richard Garfield.

As Richard modeled the game after a miniatures game, it made use of many six-sided dice. In combat, cards’ damage was designated by how many six-sided dice they rolled. Wizards chose to stop producing the game due to poor sales. One of the contributing factors given through market research was that gamers seem to dislike six-sided dice in their trading card game.

Here’s the kicker. When you dug deeper into the comments they equated dice with “lack of skill.” But the game rolled huge amounts of dice. That greatly increased the consistency. (What I mean by this is that if you rolled a million dice, your chance of averaging 3.5 is much higher than if you rolled ten.) Players, though, equated lots of dice rolling with the game being “more random” even though that contradicts the actual math.
- Mark Rosewater, Kind Acts of Randomness
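
The consistency point in the quote can be checked with a short sketch, assuming fair and independent six-sided dice (the function names here are made up for illustration). A single die has mean 3.5 and variance 35/12, so the average of n dice has standard deviation sqrt(35 / (12 n)): about 0.54 for ten dice and about 0.0017 for a million. Read this way, "averaging 3.5" means landing close to 3.5, not hitting it exactly.

```python
import math
import random

DIE_VARIANCE = 35 / 12  # variance of one fair six-sided die (mean 3.5)


def std_of_average(n_dice):
    """Exact standard deviation of the average of n independent fair dice."""
    return math.sqrt(DIE_VARIANCE / n_dice)


def simulated_spread(n_dice, trials=2000, seed=0):
    """Empirical standard deviation of the average roll over many simulated trials."""
    rng = random.Random(seed)
    averages = [
        sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(trials)
    ]
    mean = sum(averages) / trials
    return math.sqrt(sum((a - mean) ** 2 for a in averages) / trials)


for n in (1, 10, 100):
    # The exact and simulated spreads both shrink as more dice are rolled.
    print(n, std_of_average(n), simulated_spread(n))
```

More dice per roll means a tighter spread around the expected value, which is the sense in which the game was less random than players perceived.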

aribrill Jun 3, 2013, 4:04 AM
60 points
on: Rationality Quotes June 2013
Why is there that knee-jerk rejection of any effort to “overthink” pop culture? Why would you ever be afraid that looking too hard at something will ruin it? If the government built a huge, mysterious device in the middle of your town and immediately surrounded it with a fence that said, “NOTHING TO SEE HERE!” I’m pretty damned sure you wouldn’t rest until you knew what the hell that was—the fact that they don’t want you to know means it can’t be good.

Well, when any idea in your brain defends itself with “Just relax! Don’t look too close!” you should immediately be just as suspicious. It usually means something ugly is hiding there.
- David Wong, The 5 Ugly Lessons Hiding in Every Superhero Movie
What links here?
- Vaniver's comment on Rationality Quotes November 2013 by MalcolmOcean (Nov 12, 2013, 3:19 AM; 3 points)

aribrill Jan 3, 2013, 5:35 AM
19 points
on: Rationality Quotes January 2013
“How is it possible! How is it possible to produce such a thing!” he repeated, increasing the pressure on my skull, until it grew painful, but I didn’t dare object. “These knobs, holes...cauliflowers—” with an iron finger he poked my nose and ears—“and this is supposed to be an intelligent creature? For shame! For shame, I say!! What use is a Nature that after four billion years comes up with THIS?!”

Here he gave my head a shove, so that it wobbled and I saw stars.

“Give me one, just one billion years, and you’ll see what I create!”
- Stanislaw Lem, “The Sanatorium of Dr. Vliperdius” (trans. Michael Kandel)

aribrill Dec 29, 2012, 5:50 AM
0 points
in reply to: David_Gerard’s comment on: Intelligence explosion in organizations, or why I’m not worried about the singularity
That’s certainly true. It seems to me that in this case, sbenthall was describing entities more akin to Google than to the Yankees or to the Townsville High School glee club; “corporations” is over-narrow but accurate, while “organizations” is over-broad and imprecise.

aribrill Dec 29, 2012, 5:38 AM
2 points
in reply to: sbenthall’s comment on: Intelligence explosion in organizations, or why I’m not worried about the singularity
I think that as a general rule, specific examples and precise language always improve an argument.

aribrill Dec 28, 2012, 6:25 AM
1 point
on: Intelligence explosion in organizations, or why I’m not worried about the singularity
I get the sense that “organization” is more or less a euphemism for “corporation” in this post. I understand that the term could have political connotations, but it’s hard (for me at least) to easily evaluate an abstract conclusion like “many organizations are of supra-human intelligence and strive actively to enhance their cognitive powers” without trying to generate concrete examples. Imprecise terminology inhibits this.

When you quote lukeprog saying

It would be a kind of weird corporation that was better than the best human or even the median human at all the things that humans do. [Organizations] aren’t usually the best in music and AI research and theory proving and stock markets and composing novels.

should the word “corporation” in the first sentence be “[organization]”?

aribrill Apr 3, 2012, 7:59 PM
0 points
in reply to: Randaly’s comment on: Rationality Quotes April 2012
The typing quirks actually serve a purpose in the comic. Almost all communication among the characters takes place through chat logs, so the system provides a handy way to visually distinguish who’s speaking. They also reinforce each character’s personality and thematic associations—for example, the character quoted above (Aranea) is associated with spiders, arachnids in general, and the zodiac sign of Scorpio.

Unfortunately, all that is irrelevant in the context of a Rationality Quote.

aribrill Apr 3, 2012, 7:40 PM
6 points
on: Rationality Quotes April 2012

Dear, my soul is grey
With poring over the long sum of ill;
So much for vice, so much for discontent...
Coherent in statistical despairs
With such a total of distracted life,
To see it down in figures on a page,
Plain, silent, clear, as God sees through the earth
The sense of all the graves, - that’s terrible
For one who is not God, and cannot right
The wrong he looks on. May I choose indeed
But vow away my years, my means, my aims,
Among the helpers, if there’s any help
In such a social strait? The common blood
That swings along my veins, is strong enough
To draw me to this duty.

- Elizabeth Barrett Browning, Aurora Leigh, 1856

aribrill Aug 13, 2010, 4:25 PM
17 points
on: Five-minute rationality techniques
A simple technique I used to use was that whenever I started to read or found a link for an article that made me uncomfortable or instinctively want to avoid it, I forced myself to read it. After a few times I got used to it and didn’t have to do this anymore.

aribrill Aug 13, 2010, 1:52 AM
0 points
in reply to: ata’s comment on: Welcome to Less Wrong! (2010-2011)
I’m sorry, I’m not.

aribrill Aug 13, 2010, 12:44 AM
8 points
on: Welcome to Less Wrong! (2010-2011)
Hello! I’ve been a reader of Less Wrong for several months, although I never bothered to actually create an account until now. I originally discovered LW from a link through some site called “The Mentat Wiki.” I consider myself an atheist and a skeptic. I’m entering my senior year of high school, and I plan on majoring in Physics at the best college I can get into!

Actually, I had come across EY’s writings a few months earlier while trying to find out who this “Bayes” was that I had seen mentioned on a couple of different blogs I read. That was a pleasant connection for me.

I had an interesting time testing Tversky and Kahneman’s Anchoring Bias for my end of the year project in my 11th grade Statistics class. On the plus side, we found a strong anchoring effect. On the minus side, it was a group project, and my groupmates were...not exactly rationalists. I had to kind of tiptoe around what LW actually was.

Since I’ve started reading Less Wrong, I think the best sign of my improvement as a rationalist is that a number of concepts here that I used to find penetrating or insightful now seem obvious or trivial. On the other hand, I think a red flag is that I haven’t really made any major revisions to my beliefs or worldview other than those coming directly from LW.

I look forward to learning as much as I can from Less Wrong, and perhaps commenting as well!
