David Udell

Karma: 2,550

David Udell May 1, 2025, 12:12 AM
15 points
2
on: David Udell’s Shortform
Epistemic status: Just a confusion I once had, and how I eventually resolved it to my satisfaction.
In ordinary differential equations, separability is a deductive rule stating that whenever you have a differential equation of the form
$f (x) = g (y) \frac{d y}{d x}$
you can then reason that
$f (x) d x = g (y) d y$
and then that
$\int f (x) d x = \int g (y) d y$
From the very first time I saw that, I was immediately put off by that middle equation. What the hell does an expression like $f (x) d x$ (by itself) even mean? Until I saw this, I had figured that, apart from their weird notation, differentiation and integration were just plain-old multivariate functions. I had made sense of their notation by just ignoring it, basically. And when I held that point of view, the above deduction was just nonsensical.
I also remember not getting good clarificatory answers about this at the time! I mostly recall being told to just ignore the middle equation and take the whole conditional on faith, as something that has been separately proven.
Eventually, I learned that there was this idea in math called differential forms which gave a precise-and-everywhere-valid interpretation to the stand-alone expression $d x$ . But you don’t quite need that machinery to resolve the above thing that bothered me.
Did you know that “calculus” is an abridgment of the original term “the infinitesimal calculus”? “The rules for soundly manipulating infinitesimal quantities,” basically. I did not know this when I first encountered this separability thing. There’s a whole saga, maybe even the main story in mathematics, about why that interpretation and corresponding terminology fell out of favor.
The basic infinitesimal calculus idea (which is only sometimes, not always, a valid interpretation of the symbols) is that
$\int    sum of their products over a range function value      f (x) d x    infintesimal quantity$
(I very vividly remember the moment when I discovered that the integral sign was just a stylized “S”, for “sum”!) Now you cannot everywhere use the above separability reasoning on the strength of the infinitesimal interpretation. Again, it’s not an everywhere-valid interpretation!
Once you’re using any everywhere-valid interpretation, using any way of giving $d x$ and $\int$ their own independent meanings as symbols, though, the separability deduction just falls out! If two things are equal, you can multiply both by any mathematical object and get a true equation. It doesn’t matter what kind of mathematical object $d x$ is. If two things are equal, you can apply the same operation to both and get a true equation. It doesn’t matter what the integration summation operation amounts to, precisely.
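As a concrete sanity check (an illustrative example, not from the original derivation), take $f (x) = x$ and $g (y) = y$. The middle step can also be justified without giving $d x$ any stand-alone meaning: integrate both sides with respect to $x$ and apply the ordinary substitution rule. A worked sketch under those assumptions:

```latex
\begin{align*}
x &= y \frac{dy}{dx} \\
\int x \, dx &= \int y \frac{dy}{dx} \, dx = \int y \, dy && \text{(substitution rule)} \\
\frac{x^2}{2} + C &= \frac{y^2}{2} \\
y^2 &= x^2 + 2C
\end{align*}
```

Differentiating the last line with respect to $x$ gives $2 y \frac{d y}{d x} = 2 x$, which recovers the equation we started from, so the separated form led to a genuine solution.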

David Udell Jul 10, 2024, 11:58 PM
2 points
0
in reply to: RogerDearnaley’s comment on: Causal Graphs of GPT-2-Small’s Residual Stream
I sampled hundreds of short context snippets from openwebtext, and measured ablation effects averaged over those sampled forward-passes. Averaged over those hundreds of passes, I didn’t see any real signal in the logit effects, just a layer of noise due to the ablations.
More could definitely be done on this front. I just tried something relatively quickly that fit inside of GPU memory and wanted to report it here.

David Udell Jul 10, 2024, 11:02 PM
2 points
0
in reply to: RogerDearnaley’s comment on: Causal Graphs of GPT-2-Small’s Residual Stream
Could you hotlink the boxes on the diagrams to that, or add the resulting content as a hover text to areas, in them or something? This might be hard to do on LW: I suspect some Javascript code might be required to do this sort of thing, but perhaps a library exists for this?
My workaround was to have the dimension links laid out below each figure.
3: [14555]
4: [5030]
5: [10603]
6: [3290]
7: [4330]
My current “print to flat .png” approach wouldn’t support hyperlinks, and I don’t think LW supports .svg images.

David Udell Jul 9, 2024, 11:41 PM
5 points
1
in reply to: the gears to ascension’s comment on: Causal Graphs of GPT-2-Small’s Residual Stream
That line was indeed quite poorly phrased. It now reads:
At the bottom of the box, blue or red token boxes show the tokens most promoted (blue) and most suppressed (red) by that dimension.
That is, you’re right. Interpretability data on an autoencoder dimension comes from seeing which token probabilities are most promoted and suppressed when that dimension is ablated, relative to leaving its activation value alone. That’s an ablation effect sign, so the implied, plotted promotion effect signs are flipped.
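The sign convention can be illustrated with a toy example. This is only a sketch: the linear `unembed` readout, the two-dimensional `hidden` vector, the use of logits rather than probabilities, and both function names are hypothetical stand-ins, not the actual pipeline behind the figures.

```python
import numpy as np

def ablation_effect(unembed: np.ndarray, hidden: np.ndarray, dim: int) -> np.ndarray:
    """Change in each token's logit when `dim` is zeroed out: ablated minus clean."""
    ablated = hidden.copy()
    ablated[dim] = 0.0
    return unembed @ ablated - unembed @ hidden

def promotion_effect(unembed: np.ndarray, hidden: np.ndarray, dim: int) -> np.ndarray:
    """What the dimension does when left alone: the ablation effect with its sign flipped."""
    return -ablation_effect(unembed, hidden, dim)

# Hypothetical toy readout: 3 tokens, 2 hidden dimensions.
unembed = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
hidden = np.array([2.0, 1.0])

ablation = ablation_effect(unembed, hidden, dim=0)
promotion = promotion_effect(unembed, hidden, dim=0)
print(ablation)
print(promotion)
```

In this toy setup, ablating dimension 0 lowers token 0's logit and raises token 1's, so the dimension itself promotes token 0 and suppresses token 1, and token 2 is unaffected.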

David Udell Apr 23, 2024, 10:15 PM
21 points
2
on: David Udell’s Shortform
The main thing I got out of reading Bostrom’s Deep Utopia is a better appreciation of this “meaning of life” thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.
The book’s premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you’d never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you’re into learning, just ask! And similarly for any psychological state you’re thinking of working towards.
So, in that regime, it’s effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything’s heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it’s important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil… but this defeats the purpose of those values. It’s not practical benevolence if you had to ask for the danger to be left in place; it’s not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.
Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you’ll have to live with all your “localistic” values satisfied but meaning mostly absent.
It helps to see this meaning thing if you frame it alongside all the other objectivistic “stretch goal” values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.
Considerations that in today’s world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars… We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318-9).

David Udell Mar 29, 2024, 1:00 AM
2 points
0
in reply to: Scott Garrabrant’s comment on: The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
I believe I and others here probably have a lot to learn from Chris, and arguments of the form “Chris confidently believes false thing X,” are not really a crux for me about this.
Would you kindly explain this? Because you think some of his world-models independently throw out great predictions, even if other models of his are dead wrong?

David Udell Nov 1, 2023, 10:17 PM
20 points
on: David Udell’s Shortform
Use your actual morals, not your model of your morals.

David Udell Sep 26, 2023, 12:47 AM
2 points
0
in reply to: Charlie Steiner’s comment on: Sparse Coding, for Mechanistic Interpretability and Activation Engineering
I agree that stronger, more nuanced interpretability techniques should tell you more. But, when you see something like, e.g.,
25132 ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per
25134 ▁I, ▁My, I, ▁personally
isn’t it pretty obvious what those two autoencoder neurons were each doing?

David Udell Sep 23, 2023, 7:20 PM
4 points
0
in reply to: LawrenceC’s comment on: Sparse Coding, for Mechanistic Interpretability and Activation Engineering
No, towards an $L^{0}$ value. $L^{1}$ is the training proxy for that, though.
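To make the distinction concrete, here is a minimal sketch with made-up activation vectors (the values and function names are assumptions for illustration). $L^{0}$ counts the nonzero entries, which is the sparsity actually being targeted; $L^{1}$ sums absolute values, which is easier to optimize but is not the same quantity.

```python
import numpy as np

def l0_norm(activations: np.ndarray, tol: float = 0.0) -> int:
    """Number of entries whose magnitude exceeds `tol` (the sparsity count)."""
    return int(np.sum(np.abs(activations) > tol))

def l1_norm(activations: np.ndarray) -> float:
    """Sum of absolute values (a common differentiable stand-in for sparsity)."""
    return float(np.sum(np.abs(activations)))

# Two hypothetical activation vectors with the same L1 but different L0.
sparse = np.array([0.0, 0.0, 3.0, 0.0])
dense = np.array([0.75, 0.75, 0.75, 0.75])

print(l0_norm(sparse), l1_norm(sparse))
print(l0_norm(dense), l1_norm(dense))
```

Both vectors have the same $L^{1}$ value, but only the first is sparse in the $L^{0}$ sense, which is why a low $L^{1}$ alone does not guarantee a low $L^{0}$.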

David Udell Jul 24, 2023, 10:58 PM
4 points
on: David Udell’s Shortform
Epistemic status: Half-baked thought.

Say you wanted to formalize the concepts of “inside and outside views” to some degree. You might say that your inside view is a Bayes net or joint conditional probability distribution—this mathematical object formalizes your prior.
Unlike your inside view, your outside view consists of forms of deferring to outside experts. The Bayes nets that inform their thinking are sealed away, and you can’t inspect these. You can ask outside experts to explain their arguments, but there’s an interaction cost associated with inspecting the experts’ views. Realistically, you never fully internalize an outside expert’s Bayes net.
Crucially, this means you can’t update their Bayes net after conditioning on a new observation! Model outside experts as observed assertions (claiming whatever). These assertions are potentially correlated with other observations you make. But because you have little of the prior that informs those assertions, you can’t update the prior when it’s right (or wrong).
To the extent that it’s expensive to theorize about outside experts’ reasoning, the above model explains why you want to use and strengthen your inside view (instead of just deferring to outside really smart people). It’s because your inside view will grow stronger with use, but your outside view won’t.

David Udell Jun 7, 2023, 12:43 AM
7 points
1
in reply to: noggin-scratcher’s comment on: The Base Rate Times, news through prediction markets
(Great project!) I strongly second the RSS feed idea, if that’d be possible.

David Udell Jun 1, 2023, 10:06 PM
11 points
4
on: Work dumber not smarter
I think that many (not all) of your above examples boil down to optimizing for legibility rather than optimizing for goodness. People who hobnob instead of working quietly will get along with their bosses better than their quieter counterparts, yes. But a company of brown nosers will be less productive than a competitor company of quiet hardworking employees! So there’s a cooperate/defect-dilemma here.
What that suggests, I think, is that you generally shouldn’t immediately defect as hard as possible, with regard to optimizing for appearances. Play the prevailing local balance between optimizing-for-appearances and optimizing-for-outcomes that everyone around does, and try to not incrementally lower the level of org-wide cooperation. Try to eke that level of cooperation up, and set up incentives accordingly.

David Udell May 26, 2023, 3:43 AM
4 points
in reply to: David Udell’s comment on: David Udell’s Shortform
The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.
This is not a coincidence because nothing is a coincidence.

David Udell May 2, 2023, 2:02 AM
2 points
on: David Udell’s Shortform
Two moments of growing in mathematical maturity I remember vividly:
1. Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols… could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word “is”!
2. Learning about the objects that mathematical claims are about. Going from having to look up “Wait, what’s a real number again?” to knowing how $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$ interrelate told me what we’re making claims about. Of course, there are plenty of other mathematical objects—but getting to know these objects taught me the general pattern.

David Udell Apr 1, 2023, 1:54 AM
31 points
7
on: Exposure to Lizardman is Lethal
I found it distracting that all your examples were topical, anti-red-tribe coded events. That reminded me of
In Artificial Intelligence, and particularly in the domain of nonmonotonic reasoning, there’s a standard problem: “All Quakers are pacifists. All Republicans are not pacifists. Nixon is a Quaker and a Republican. Is Nixon a pacifist?”
What on Earth was the point of choosing this as an example? To rouse the political emotions of the readers and distract them from the main question? To make Republicans feel unwelcome in courses on Artificial Intelligence and discourage them from entering the field? (And no, I am not a Republican. Or a Democrat.)
Why would anyone pick such a distracting example to illustrate nonmonotonic reasoning? Probably because the author just couldn’t resist getting in a good, solid dig at those hated Greens. It feels so good to get in a hearty punch, y’know, it’s like trying to resist a chocolate cookie.
As with chocolate cookies, not everything that feels pleasurable is good for you.
That is, reading this, I felt like there were tribal-status markers mixed in with your claims that didn’t have to be there, and that struck me as defecting on a stay-non-politicized discourse norm.

David Udell Mar 30, 2023, 9:12 PM
2 points
on: David Udell’s Shortform
2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs
…
12. The principal of a private school is a member of Planned Parenthood and, off-duty, speaks out about contraception and the morning after pill. The board of the private school decides this is inappropriate given the school’s commitment to abstinence and moral education and asks the principal to stop these speaking engagements or step down from his position.
a) The school board is acting within its rights; they can insist on a principal who shares their values
b) The school board should back off; it’s none of their business what he does in his free time
…
[Difference] of 0 to 3: You are an Object-Level Thinker. You decide difficult cases by trying to find the solution that makes the side you like win and the side you dislike lose in that particular situation.
[Difference] of 4 to 6: You are a Meta-Level Thinker. You decide difficult cases by trying to find general principles that can be applied evenhandedly regardless of which side you like or dislike.
--Scott Alexander, “The Slate Star Codex Political Spectrum Quiz”
The Character of an Epistemic Prisoner’s Dilemma
Say there are two tribes. The tribes hold fundamentally different values, but they also model the world in different terms. Each thinks members of the other tribe are mistaken, and that some of their apparent value disagreement would be resolved if the others’ mistakes were corrected.
Keeping this in mind, let’s think about inter-tribe cooperation and defection.
Ruling by Reference Classes, Rather Than Particulars
In the worst equilibrium, actors from each tribe evaluate political questions in favor of their own tribe, against the outgroup. In their world model, this is to a great extent for the benefit of the outgroup members as well.
But this is a shitty regime to live under when it’s done back to you too, so rival tribes can sometimes come together to implement an impartial judiciary. The natural way to do this is to have a judiciary classifier rule for reference classes of situations, and to have a separate impartial classifier sort situations into reference classes.
You’re locally worse off this way, but are globally much better off.

David Udell Mar 17, 2023, 10:01 PM
7 points
on: David Udell’s Shortform
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don’t seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with not too much more work. Given that, you won’t want to update dramatically in favor of the claim—the powerful evidence to the contrary could, you infer, be unearthed without much more work. You learn something about the other side of the issue from how quickly or slowly the world yielded evidence in the other direction. If it’s considered a social faux pas to give strong arguments for one side of a claim, then your prior about how hard it is to find strong arguments for that side of the claim will be doing a lot of the heavy lifting in fixing your world model. And so on, for the evidential consequences of other kinds of motivated search and rationalization.
In brief, you can do epistemically better than ignoring how much search power went into finding all the evidence. You can do better than only evaluating the object-level evidential considerations! You can take expended search into account, in order to model what evidence is likely hiding, where, behind how much search debt.

David Udell Mar 3, 2023, 1:37 AM
7 points
on: David Udell’s Shortform
Modest spoilers for planecrash (Book 9 -- null action act II).
Nex and Geb had each INT 30 by the end of their mutual war. They didn’t solve the puzzle of Azlant’s IOUN stones… partially because they did not find and prioritize enough diamonds to also gain Wisdom 27. And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion’s spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly measured by Detect Thoughts nor by tests of legible ability at using existing math. (Keltham has slightly above-average intelligence for dath ilan, reflectivity well below average, and an ordinary amount of that spark.)
But most of all, Nex and Geb didn’t solve IOUN stones because they didn’t come from a culture that had already developed digital computation and analog signal processing. Or on an even deeper level—because those concepts can’t really be that hard at INT 30, even if your WIS is much lower and you are missing some sparks—they didn’t come from a culture which said that inventing things like that is what the Very Smart People are supposed to do with their lives, nor that Very Smart People are supposed to recheck what their society told them were the most important problems to solve.
Nex and Geb came from a culture which said that incredibly smart wizards were supposed to become all-powerful and conquer their rivals; and invent new signature spells that would be named after them forever after; and build mighty wizard-towers, and raise armies, and stabilize impressively large demiplanes; and fight minor gods, and surpass them; and not, particularly, question society’s priorities for wizards. Nobody ever told Nex or Geb that it was their responsibility to be smarter than the society they grew up in, or use their intelligence better than common wisdom said to use it. They were not prompted to look in the direction of analog signal processing; and, more importantly in the end, were not prompted to meta-look around for better directions to look, or taught any eld-honed art of meta-looking.
--Eliezer, planecrash

David Udell Mar 2, 2023, 12:39 AM
2 points
on: David Udell’s Shortform
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world’s AI trajectory. I’m sure the “savior sequence” exists mathematically, but finding it is a whole different ballgame.

David Udell Feb 13, 2023, 2:07 AM
1 point
on: David Udell’s Shortform
In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom’s motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom’s motion that the dimension in that direction vanished. That left only three dimensions of space—all perpendicular to the atom’s direction of motion—and the ghost of the lost fourth dimension, which makes itself felt as the current of time. Now atoms moving in different directions cannot share the same directional flow of time. Each takes on the particular current it perceives as the proper measure of time.
…
You measure only… as projected on your time and space dimensions.
--Lewis Carroll Epstein, Relativity Visualized (1997)
