Kaarel

Karma: 946

kaarelh AT gmail DOT com

Kaarel May 27, 2025, 8:58 PM
7 points
1
in reply to: Martín Soto’s comment on: Martín Soto’s Shortform

Summarizing documents, and exploring topics I’m no expert in: Super good

I think you probably did this, but I figured it’s worth checking: did you check this on documents you understand well (such as your own writing) and topics you are an expert on?

Kaarel May 23, 2025, 7:37 AM
9 points
0
in reply to: Garrett Baker’s comment on: D0TheMath’s Shortform
I think this approach doesn’t make sense. Issues, briefly:
- if you want to be squaring $D$, you need it to be square — you should append another row of $0$s
- this matrix $D$ does not have a logarithm, because it isn’t invertible (https://en.wikipedia.org/wiki/Logarithm_of_a_matrix#Existence)^[1]
- there in fact isn’t any matrix $X$ that could reasonably be considered a $D^{3/2}$, because such an $X$ should satisfy $X^{2} = D^{3}$, but the matrix $D^{3}$ does not have a square root (see e.g. https://math.stackexchange.com/a/66156/540174 for how to think about this)
1. ↩︎
  also, note that it generally doesn’t make sense to speak of the $\log$ of a matrix — a matrix can have (infinitely) many logarithms (https://en.wikipedia.org/wiki/Logarithm_of_a_matrix#Example:_Logarithm_of_rotations_in_the_plane)
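
A small numerical sketch of the two logarithm points above, in plain Python. It does not use the $D$ from the thread; the test matrix, the helper names (`matmul`, `expm`, `rotation_log`), and the truncated Taylor series are all illustrative assumptions. It checks the identity $\det(e^{X}) = e^{\operatorname{tr} X}$, which is why a singular matrix cannot be $e^{X}$ for any $X$, and it shows two different matrices whose exponential is the same plane rotation.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(x, terms=60):
    """Approximate exp(x) for a 2x2 matrix with a truncated Taylor series."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    result, term = identity, identity
    for n in range(1, terms):
        # term becomes x^n / n! by multiplying the previous term by x / n
        term = [[v / n for v in row] for row in matmul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace(m):
    return m[0][0] + m[1][1]

def rotation_log(theta, k):
    """One logarithm of the rotation by theta; each integer k gives another."""
    angle = theta + 2 * math.pi * k
    return [[0.0, -angle], [angle, 0.0]]

# det(exp(X)) equals exp(tr X), which is never zero, so exp(X) is always invertible.
x = [[0.3, 1.2], [-0.7, 0.1]]  # arbitrary test matrix (assumption)
print(det(expm(x)), math.exp(trace(x)))

# Two different matrices with the same exponential: the rotation by 0.5 radians.
print(expm(rotation_log(0.5, 0)))
print(expm(rotation_log(0.5, 1)))
```

The square-root claim about $D^{3}$ is not checked here, since it depends on the specific $D$ in the parent comment.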

Kaarel May 22, 2025, 5:47 PM
10 points
2
in reply to: Davidmanheim’s comment on: Can We Naturalize Moral Epistemology?
I’d rather your “that is” were a “for example”. This is because:
- It’s also possible for the process of updates to not be getting arbitrarily close to any endpoint (with a notion of closeness that is imo appropriate for this context), without there being any sentence on which one keeps changing one’s mind. If we’re thinking of one’s “ethical state of mind” as being given by the probabilities one assigns to some given countable collection of sentences, then here I’m saying that it can be reasonable to use a notion of convergence which is stronger than pointwise convergence. For math, if one just runs a naive proof search and assigns truth value 1 to proven sentences and 0 to disproven sentences, one could try to say this sequence of truth value assignments is converging to the assignment that gives 1 to all provable sentences and 0 to all disprovable sentences (and whatever the initialization assigns to all independent sentences, let’s say), but I think that in our context of imagining some long reflection getting close to something in finite time, it’s more reasonable to say that one isn’t converging to anything in this example — it seems pretty intuitive to say that after any finite number of steps, one hasn’t really made much progress toward this kinda-endpoint (after all, one will have proved only finitely many things, and one still has infinitely many more things left to prove). Bringing this a tad closer to ethical reality: we could perhaps imagine someone repeatedly realizing that projects they hadn’t really considered before are worth working on, infinitely many times, with what they are up to thus changing [by a lot] [infinitely many times]. To spell out the connection of this to the math example a bit more: the common point is that novelty can appear in the sentences/things considered, so one can have novelty even if novelty doesn’t keep showing up in how one relates to any given sentence/thing. I say more about these themes here.
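
A toy sketch of the proof-search example, to make the gap between pointwise convergence and a stronger notion concrete. Everything here is an assumption made for illustration: sentences are indexed by natural numbers, the "search" settles sentence `k` at step `k + 1`, unsettled sentences keep a hypothetical initial credence `INIT`, and uniform (sup-distance) convergence stands in for "a stronger notion", which the comment does not pin down.

```python
INIT = 0.5  # hypothetical credence assigned to sentences not yet settled

def truth(k):
    """Toy ground truth: sentence k counts as provable iff k is even (assumption)."""
    return 1.0 if k % 2 == 0 else 0.0

def assignment(n, k):
    """Credence in sentence k after n steps; sentence k is settled once n > k."""
    return truth(k) if k < n else INIT

def pointwise_gap(n, k):
    """Distance from the limiting assignment on the single sentence k."""
    return abs(assignment(n, k) - truth(k))

def sup_gap(n, window=1000):
    """Largest gap over sentences 0 .. n + window - 1, a finite stand-in for the supremum."""
    return max(pointwise_gap(n, k) for k in range(n + window))

# Any fixed sentence is eventually settled for good ...
print([pointwise_gap(n, 3) for n in (0, 3, 4, 100)])
# ... but the worst-case gap over all sentences never shrinks.
print([sup_gap(n) for n in (0, 10, 1000)])
```

In this toy model there is no sentence on which the assignment keeps changing, yet after any finite number of steps the assignment is no closer, in the sup sense, to the kinda-endpoint than it was at the start.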

Kaarel May 20, 2025, 6:50 PM
9 points
0
on: Mental software updates
I feel like items on your current list have $< 10\%$ of the responsibility for what I’d consider software updates in humans, and that it sorta fails to address almost all the ordinary stuff that goes on when individual humans are learning stuff (from others or independently) or when “humanity is improving its thinking”. But that makes me think that maybe I’m missing what you’re going for with this list?^[1] Continuing with the (possibly different) question I have in mind anyway, here’s a list that imo points toward a decent chunk of what is missing from your list, with a focus on the case of independent and somewhat thoughtful learning/[thinking-improving] (as opposed to the case of copying from others, and as opposed to the case of fairly non-thoughtful thinking-improving)^[2]:
- a mathematician coming up with a good mathematical concept and developing a sense of how to use it (and ditto for a mathematical system/formalism)^[3]
- seeing a need to talk about something and coining a word for it
- a philosopher trying to clarify/re-engineer a concept, eg by seeing which more precise definition could accord with the concept having some desired “inferential role”
- noticing and resolving tensions in one’s views
- discovering/inventing/developing the scientific method; inventing/developing p-values; improving peer review
- discussing what kinds of evidence could help with some particular scientific question
- inventing writing; inventing textbooks
- the varied thought that is upstream of a professional poker player thinking the way they do when playing poker
- asking oneself “was that a reasonable inference?”, “what auxiliary construction would help with this mathematical problem?”, “which techniques could work here?”, “what is the main idea of this proof?”, “is this a good way to model the situation?”, “can I explain that clearly?”, “what caused me to be confused about that?”, “why did I spend so long pursuing this bad idea?”, “how could I have figured that out faster?”, “which question are we asking, more precisely?”, “why are we interested in this question?”, “what is this analogous to?”, “what should I read to understand this better?”, “who would have good thoughts on this?”
1. ↩︎
  I will note that when I say $< 10\%$, this is wrt a measure that cares a lot about understanding how it is that one improves at doing difficult thinking (like, math/philosophy/science/tech research), and I could maybe see your list covering $> 10\%$ if one cared relatively more about software updates affecting one’s emotional/social life or whatever, but I’d need to think more about that.
2. ↩︎
  it has such a focus in large part because such a list was easy for me to provide — the list is copied from here with light edits
3. ↩︎
  two sorta-examples: humanity starting to think probabilistically, humanity starting to think in terms of infinitesimals

Kaarel May 17, 2025, 4:07 PM
1 point
0
in reply to: Richard_Kennaway’s comment on: Direct Realism is probably false
I agree it’s a pretty unfortunate/silly question. Searle’s analysis of it in Chapter 1 of Seeing Things as They Are is imo not too dissimilar to your analysis of it here, except he wouldn’t think that one can reasonably say “the world we see around us is an internal perceptual copy” (and I myself have trouble compiling this into anything true also), though he’d surely agree that various internal things are involved in seeing the world. I think a significant fraction of what’s going on with this “disagreement” is a bunch of “technical wordcels” being annoyed at what they consider to be careless speaking that they take to be somewhat associated with careless thinking.

Kaarel May 17, 2025, 5:56 AM
1 point
0
in reply to: Said Achmiz’s comment on: Direct Realism is probably false
see e.g. Chapter 1 of Searle’s Seeing Things as They Are for an exposition of the view usually called direct realism (i’m pretty sure you guys (including the op) have something pretty different in mind than that view and i think it’s plausible you all would actually just agree with that view)

Kaarel May 11, 2025, 9:42 PM
3 points
0
on: Why “Solving Alignment” Is Likely a Category Mistake
a tract of mine expressing some similar thoughts :)

Kaarel May 6, 2025, 9:08 AM
1 point
0
in reply to: Vladimir_Nesov’s comment on: Vladimir_Nesov’s Shortform
i’d be interested in hearing why you think that cultural/moral/technological/mathematical maturity is even possible or eventually likely (as opposed to one just being immature forever^[1]) (assuming you indeed do think that)
1. ↩︎
  which seems more likely to me

Kaarel May 6, 2025, 6:41 AM
9 points
6
on: The crucible — how I think about the situation with AI
As I write this list, I’ve a nagging feeling I’m missing some things.

my first two thoughts (i’m aware you arguably sorta already say sth like these things, but imo these deserve to be said more clearly):
- ideas, methods, tech for making humans smarter and wiser
- ideas, methods, tech for being able to implement an AGI ban

Kaarel May 1, 2025, 10:16 PM
1 point
0
in reply to: Joe Carlsmith’s comment on: Can we safely automate alignment research?
oki! in this scenario, i guess i’m imagining humans/humanity becoming ever-more-artificial (like, ever-more-[human/mind]-made) and ever-more-intelligent (like, eventually much more capable than anything that might be created by 2100), so this still seems like a somewhat unnatural framing to me

Kaarel May 1, 2025, 9:06 PM
1 point
0
on: Can we safely automate alignment research?
What then? One option is to never build superintelligence. But there’s also another option, namely: trying to get access to enhanced human labor, via the sorts of techniques I discussed in my post on waystations (e.g., whole brain emulation). In particular: unlike creating an alignment MVP, which plausibly requires at least some success in learning how to give AIs human-like values, available techniques for enhancing human labor might give you human-like values by default, while still resulting in better-than-human alignment research capabilities. Call this an “enhanced human labor” path.^[12]
[12]: Though: note that if you thought that even an alignment MVP couldn’t solve the alignment problem, you need some story about why your enhanced human labor would do better.

something that is imo important but discordant with the analysis you give here:
* humans/humanity could also just continue becoming more intelligent/capable (i mean: in some careful, self-conscious, deliberate fashion; so not like: spawning a random alien AI that outfooms humans; of course, what this means is unclear — it would imo need to be figured out ever-better as we proceed), like maybe forever

Kaarel Apr 21, 2025, 6:08 AM
3 points
0
in reply to: Davey Morse’s comment on: Davey Morse’s Shortform

even if you’re mediocre at coming up with ideas, as long as it’s cheap and you can come up with thousands, one of them is bound to be promising

for ideas which are “big enough”, this is just false, right? for example, so far, no LLM has generated a proof of an interesting conjecture in math

Kaarel Apr 21, 2025, 5:08 AM
1 point
0
in reply to: Davey Morse’s comment on: Davey Morse’s Shortform
coming up with good ideas is very difficult as well

(and it requires good judgment, also)

Kaarel Apr 13, 2025, 12:12 PM
2 points
−2
on: How training-gamers might function (and win)
I’ve only skimmed the post so the present comment could be missing the mark (sorry if so), but I think you might find it worthwhile/interesting/fun to think (afresh, in this context) about how come humans often don’t wirehead and probably wouldn’t wirehead even with much much longer lives (in particular, much longer childhoods and research careers), and whether the kind of AI that would do hard math/philosophy/tech-development/science will also be like that.^[1]^[2]
1. ↩︎
  I’m not going to engage further on this here, but if you’d like to have a chat about this, feel free to dm me.
2. ↩︎
  I feel like clarifying that I’d inside-view say P( the future is profoundly non-human (in a bad sense) | AI (which is not pretty much a synthetic human) smarter than humans is created this century ) >0.98 despite this.

Kaarel Apr 9, 2025, 12:20 PM
1 point
−2
in reply to: leogao’s comment on: leogao’s Shortform
i agree that most people doing “technical analysis” are doing nonsense and any particular well-known simple method does not actually work. but also clearly a very good predictor could make a lot of money just looking at the past price time series anyway

Kaarel Apr 8, 2025, 2:59 PM
3 points
0
in reply to: Ivan Vendrov’s comment on: Ivan Vendrov’s Shortform
it feels to me like you are talking of two non-equivalent types of things as if they were the same. like, imo, the following are very common in competent entities: resisting attempts on one’s life, trying to become smarter, wanting to have resources (in particular, in our present context, being interested in eating the Sun), etc.. but then whether some sort of vnm-coherence arises seems like a very different question. and indeed even though i think these drives are legit, i think it’s plausible that such coherence just doesn’t arise or that thinking of the question of what valuing is like such that a tendency toward “vnm-coherence” or “goal stability” could even make sense as an option is pretty bad/confused^[1].

(of course these two positions i’ve briefly stated on these two questions deserve a bunch of elaboration and justification that i have not provided here, but hopefully it is clear even without that that there are two pretty different questions here that are (at least a priori) not equivalent)
1. ↩︎
  briefly and vaguely, i think this could involve mistakenly imagining a growing mind meeting a fixed world, when really we will have a growing mind meeting a growing world — indeed, a world which is approximately equal to the mind itself. slightly more concretely, i think things could be more like: eg humanity has many profound projects now, and we would have many profound but currently basically unimaginable projects later, with like the effective space of options just continuing to become larger, plausibly with no meaningful sense in which there is a uniform direction in which we’re going throughout or whatever

Kaarel Apr 8, 2025, 10:15 AM
3 points
0
on: kh’s Shortform
a chat with Towards_Keeperhood on what it takes for sentences/phrases/words to be meaningful

Towards_Keeperhood:
- you could define “mother(x,y)” as “x gave birth to y”, and then “gave birth” as some more precise cluster of observations, which eventually need to be able to be identified from visual inputs
Kaarel:
- if i should read this as talking about a translation of “x is the mother of y”, then imo this is a bad idea.
- in particular, i think there is the following issue with this: saying which observations “x gave birth to y” corresponds to intuitively itself requires appealing to a bunch of other understanding. it’s like: sure, your understanding can be used to create visual anticipations, but it’s not true that any single sentence alone could be translated into visual anticipations — to get a typical visual anticipation, you need to rely on some larger segment of your understanding. a standard example here is “the speed of light in vacuum is $3 \times 10^{8}$ m/s” creating visual anticipations in some experimental setups, but being able to derive those visual anticipations depends on a lot of further facts about how to create a vacuum and properties of mirrors and interferometers and so on (and this is just for one particular setup — if we really mean to make a universally quantified statement, then getting the observation sentences can easily end up requiring basically all of our understanding). and it seems silly to think that all this crazy stuff was already there in what you meant when you said “the speed of light in vacuum is $3 \times 10^{8}$ m/s”. one concrete reason why you don’t want this sentence to just mean some crazy AND over observation sentences or whatever is because you could be wrong about how some interferometer works and then you’d want it to correspond to different observation sentences
- this is roughly https://en.wikipedia.org/wiki/Confirmation_holism as a counter to https://en.wikipedia.org/wiki/Verificationism
- that said, i think there is also something wrong with some very strong version of holism: it’s not really like our understanding is this unitary thing that only outputs visual anticipations using all the parts together, either — the real correspondence is somewhat more granular than that
TK:
- On reflection, I think my “mother” example was pretty sloppy and perhaps confusing. I agree that often quite a lot of our knowledge is needed to ground a statement in anticipations. And yeah actually it doesn’t always ground out in that, e.g. for parsing the meaning of counterfactuals. (See “Mixed Reference: The great reductionist project”.)
K:
- i wouldn’t say a sentence is grounded in anticipations with a lot of our knowledge, because that makes it sound like in the above example, “the speed of light is $3 \times 10^{8}$ m/s” is somehow privileged compared to our understanding of mirrors and interferometers even though it’s just all used together to create anticipations; i’d instead maybe just say that a bunch of our knowledge together can create a visual anticipation
TK:
- thx. i wanted to reply sth like “a true statement can either be tautological (e.g. math theorems) or empirical, and for it to be an empirical truth there needs to be some entanglement between your belief and reality, and entanglement happens through sensory anticipations. so i feel fine with saying that the sentence ‘the speed of light is $3 \times 10^{8}$ m/s’ still needs to be grounded in sensory anticipations”. but i notice that the way i would use “grounded” here is different from the way I did in my previous comment, so perhaps there are two different concepts that need to be disentangled.
K:
- here’s one thing in this vicinity that i’m sympathetic to: we should have as a criterion on our words, concepts, sentences, thoughts, etc. that they play some role in determining our actions; if some mental element is somehow completely disconnected from our lives, then i’d be suspicious of it. (and things can be connected to action via creating visual anticipations, but also without doing that.)
- that said, i think it can totally be good to be doing some thinking with no clear prior sense about how it could be connected to action (or prediction) — eg doing some crazy higher math can be good, imagining some crazy fictional worlds can be good, games various crazy artists and artistic communities are playing can be good, even crazy stuff religious groups are up to can be good. also, i think (thought-)actions in these crazy domains can themselves be actions one can reasonably be interested in supporting/determining, so this version of entanglement with action is really a very weak criterion
- generally it is useful to be able to “run various crazy programs”, but given this, it seems obvious that not all variables in all useful programs are going to satisfy any such criterion of meaningfulness? like, they can in general just be some arbitrary crazy things (like, imagine some memory bit in my laptop or whatever) playing some arbitrary crazy role in some context, and this is fine
- and similarly for language: we can have some words or sentences playing some useful role without satisfying any strict meaningfulness criterion (beyond maybe just having some relation to actions or anticipations which can be of basically arbitrary form)
- a different point: in human thinking, the way “2+2=4” is related to visual anticipations is very similar to the way “the speed of light is $3 \times 10^{8}$ m/s” is related to visual anticipations
TK:
- Thanks!
- I agree that e.g. imagining fictional worlds like HPMoR can be useful.
- I think I want to expand my notion of “tautological statements” to include statements like “In the HPMoR universe, X happens”. You can also pick any empirical truth “X” and turn it into a tautological one by saying “In our universe, X”. Though I agree it seems a bit weird.
- Basically, mathematics tells you what’s true in all possible worlds, so from mathematics alone you never know which world you may be in. So if you want to say something that’s true about your world specifically (but not across all possible worlds), you need some observations to pin down what world you’re in.
- I think this distinction is what Eliezer means in his highly advanced epistemology sequence when he uses “logical pinpointing” and “physical pinpointing”.
- You can also have a combination of the two. (I’d say as soon as some physical pinpointing is involved I’d call it an empirical fact.)
- Commented about that. (I actually changed my model slightly): https://www.lesswrong.com/posts/bTsiPnFndZeqTnWpu/mixed-reference-the-great-reductionist-project?commentId=HuE78qSkZJ9MxBC8p
K:
- the imo most important thing in my messages above is the argument against [any criterion of meaningfulness which is like what you’re trying to state] being reasonable
- in brief, because it’s just useful to be allowed to have arbitrary “variables” in “one’s mental circuits”
- just like there’s no such meaningfulness criterion on a bit in your laptop’s memory
- if you want to see from the outside the way the bit is “connected to the world”, one thing you could do is to say that the bit is 0 in worlds which are such-and-such and 1 in worlds which are such-and-such, or, if you have a sense of what the laptop is supposed to be doing, you could say in which worlds the bit “should be 0” and in which worlds the bit “should be 1”, but it’s not like anything like this crazy god’s eye view picture is (or even could explicitly be) present inside the laptop
- our sentences and terms don’t have to have meanings “grounded in visual anticipations”, just like the bit in the laptop doesn’t
- except perhaps in the very weak sense that it should be possible for a sentence to be involved in determining actions (or anticipations) in some potentially arbitrarily remote way
- the following is mostly a side point: one problem with seeing from the inside what your bits (words, sentences) are doing (especially in the context of pushing the frontier of science, math, philosophy, tech, or generally doing anything you don’t know how to do yet, but actually also just basically all the time) is that you need to be open to using your bits in new ways; the context in which you are using your bits usually isn’t clear to you
- btw, this is a sort of minor point but i’m stating it because i’m hoping it might contribute to pushing you out of a broader imo incorrect view: even when one is stating formal mathematical statements, one should be allowed to state sentences with no regard for whether they are tautologies/contradictions (that is, provable/disprovable) or not — ie, one should be allowed to state undecidable sentences, right? eg you should be allowed to state a proof that has the structure “if P, then blabla, so Q; but if not-P, then other-blabla, but then also Q; therefore, Q”, without having to pay any attention to whether P itself is tautological/contradictory or undecidable
- so, if what you want to do with your criterion of meaningfulness involves banning saying sentences which are not “meaningful”, then even in formal math, you should consider non-tautological/contradictory sentences meaningful. (if you don’t want to ban the “meaningless” sentences, then idk what we’re even supposed to be doing with this notion of meaningfulness.)
TK:
- Thx. I definitely agree one should be able to state all mathematical statements (including undecidable ones), and that for proofs you shouldn’t need to pay attention to whether a statement is undecidable or not. (I’m having sorta constructivist tendencies though, where “if P, then blabla, so Q; but if not-P, then other-blabla, but then also Q; therefore, Q” wouldn’t be a valid proof because we don’t assume the law of excluded middle.)
- Ok yeah thx I think the way I previously used “meaningfully” was pretty confused. I guess I don’t really want to rule out any sentences people use.
- I think sth is not meaningful if there’s no connection between a belief and your main belief pool. So “a puffy is a flippo” is perhaps not meaningful to you because those concepts don’t relate to anything else you know? (But that’s a different kind of meaningful from what errors people mostly make.)
K:
- yea. tho then we could involve more sentences about puffies and flippos and start playing some game involving saying/thinking those sentences and then that could be fun/useful/whatever
TK:
- maybe. idk.
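
For reference, the proof shape K describes above (“if P, then blabla, so Q; but if not-P, then other-blabla, but then also Q; therefore, Q”) can be written as a short Lean 4 sketch. The theorem name and hypothesis names are made up for illustration. The proof never asks whether `P` is provable, disprovable, or undecidable; its one non-constructive step is the appeal to `Classical.em`, which is the step TK’s constructivist reservation is about.

```lean
-- P and Q are arbitrary propositions; h_pos and h_neg are the two branches of the argument.
theorem q_by_cases (P Q : Prop) (h_pos : P → Q) (h_neg : ¬P → Q) : Q :=
  -- Classical.em P : P ∨ ¬P (the law of excluded middle)
  (Classical.em P).elim h_pos h_neg
```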

Kaarel Mar 31, 2025, 12:36 AM
7 points
0
on: Why do many people who care about AI Safety not clearly endorse PauseAI?
I think it’s plausible that a system which is smarter than humans/humanity (and distinct and separate from humans/humanity) should just never be created, and I’m inside-view almost certain it’d be profoundly bad if such a system were created any time soon. But I think I’ll disagree with like basically anyone on a lot of important stuff around this matter, so it just seems really difficult for anyone to be such that I’d feel like really endorsing them on this matter?^[1] That said, my guess is that PauseAI is net positive, tho I haven’t thought about this that much :)
1. ↩︎
  https://youtu.be/Q3_7HTruMfg

Kaarel Mar 26, 2025, 8:44 PM
5 points
0
in reply to: Steven Byrnes’s comment on: An Advent of Thought
Thank you for the comment!

First, I’d like to clear up a few things:
- I do think that making an “approximate synthetic 2025 human newborn/fetus (mind)” that can be run on a server having 100x usual human thinking speed is almost certainly a finite problem, and one might get there by figuring out what structures are there in a fetus/newborn precisely enough, and it plausibly makes sense to focus particularly on structures which are more relevant to learning. If one were to pull this off, one might then further be able to have these synthetic fetuses grow up quickly into fairly normal humans and have them do stuff which ends the present period of (imo) acute x-risk. (And the development of thought continues after that, I think; I’ll say more that relates to this later.) While I do say in my post that making mind uploads is a finite problem, it might have been good to state also (or more precisely) that this type of thing is finite.
- I certainly think that one can make a finite system such that one can reasonably think that it will start a process that does very much — like, eats the Sun, etc.. Indeed, I think it’s likely that by default humanity would unfortunately start a process that gets the Sun eaten this century. I think it is plausible there will be some people who will be reasonable in predicting pretty strongly that that particular process will get the Sun eaten. I think various claims about humans understanding some stuff about that process are less clear, though there is surely some hypothetical entity that could pretty deeply understand the development of that process up to the point where it eats the Sun.
- Some things in my notes were written mostly with an [agent foundations]y interlocutor in mind, and I’m realizing now that some of these things could also be read as if I had some different interlocutor in mind, and that some points probably seem more incongruous if read this way.
I’ll now proceed to potential disagreements.

But there’s something else, which is a very finite legible learning algorithm that can automatically find all those things—the object-level stuff and the thinking strategies at all levels. The genome builds such an algorithm into the human brain. And it seems to work! I don’t think there’s any math that is forever beyond humans, or if it is, it would be for humdrum reasons like “not enough neurons to hold that much complexity in your head at once”.

Some ways I disagree or think this is/involves a bad framing:
- If we focus on math and try to ask some concrete question, instead of asking stuff like “can the system eventually prove anything?”, I think it is much more appropriate to ask stuff like “how quickly can the system prove stuff?”. Like, brute-force searching all strings for being a proof of a particular statement can eventually prove any provable statement, but we obviously wouldn’t want to say that this brute-force searcher is “generally intelligent”. Very relatedly, I think that “is there any math which is technically beyond a human?” is not a good question to be asking here.
- The blind idiot god that pretty much cannot even invent wheels (ie evolution) obviously did not put anything approaching the Ultimate Formula for getting far in math (or for doing anything complicated, really) inside humans (even after conditioning on specification complexity and computational resources or whatever), and especially not in an “unfolded form”^[1], right? Any rich endeavor is done by present humans in a profoundly stupid way, right?^[2] Humanity sorta manages to do math, but this seems like a very weak reason to think that [humans have]/[humanity has] anything remotely approaching an “ultimate learning algorithm” for doing math?^[3]
- The structures in a newborn [that make it so that in the right context the newborn grows into a person who (say) pushes the frontier of human understanding forward] and [which participate in them pushing the frontier of human understanding forward] are probably already really complicated, right? Like, there’s already a great variety of “ideas” involved in the “learning-relevant structures” of a fetus?
- I think that the framing that there is a given fixed “learning algorithm” in a newborn, such that if one knew it, one would be most of the way there to understanding human learning, is unfortunate. (Well, this comes with the caveat that it depends on what one wants from this “understanding of human learning” — e.g., it is probably fine to think this if one only wants to use this understanding to make a synthetic newborn.) In brief, I’d say “gaining thinking-components is a rich thing, much like gaining technologies more generally; our ability to gain thinking-components is developing, just like our ability to gain technologies”, and then I’d point one to Note 3 and Note 4 for more on this.
- I want to say more in response to this view/framing that some sort of “human learning algorithm” is already there in a newborn, even in the context of just the learning that a single individual human is doing. Like, a human is also importantly gaining components/methods/ideas for learning, right? For example, language is centrally involved in human learning, and language isn’t there in a fetus (though there are things in a newborn which create a capacity for gaining language, yes). I feel like you might want to say “who cares — there is a preserved learning algorithm in the brain of a fetus/newborn anyway”. And while I agree that there are very important things in the brain which are centrally involved in learning and which are fairly unchanged during development, I don’t understand what [the special significance of these over various things gained later] is which makes it reasonable to say that a human has a given fixed “learning algorithm”. An analogy: Someone could try to explain structure-gaining by telling me “take a random init of a universe with such and such laws (and look along a random branch of the wavefunction^[4]) — in there, you will probably eventually see a lot of structures being created” — let’s assume that this is set up such that one in fact probably gets atoms and galaxies and solar systems and life and primitive entities doing math and reflecting (imo etc.). But this is obviously a highly unsatisfying “explanation” of structure-gaining! I wanted to know why/how protons and atoms and molecules form and why/how galaxies and stars and black holes form, etc.. I wanted to know about evolution, and about how primitive entities inventing/discovering mathematical concepts could work, and imo many other things! Really, this didn’t do very much beyond just telling me “just consider all possible universes — somewhere in there, structures occur”! 
Like, yes, I’ve been given a context in which structure-gaining happens, but this does very little to help me make sense of structure-gaining. I’d guess that knowing the “primordial human learning algorithm” which is there in a fetus is significantly more like knowing the laws of physics than your comment makes it out to be. If it’s not like that, I would like to understand why it’s not like that — I’d like to understand why a fetus’s learning-structures really deserve to be considered the “human learning algorithm”, as opposed to being seen as just providing a context in which wild structure-gaining can occur and playing some important role in this wild structure-gaining (for now).
- to conclude: It currently seems unlikely to me that knowing a newborn’s “primordial learning algorithm” would get me close to understanding human learning — in particular, it seems unlikely that it would get me close to understanding how humanity gains scientific/mathematical/philosophical understanding. Also, it seems really unlikely that knowing this “primordial learning algorithm” would get me close to understanding learning/technology-making/mathematical-understanding-gaining in general.^[5]
1. ↩︎
  like, such that it is already there in a fetus/newborn and doesn’t have to be gained/built
2. ↩︎
  I think present humans have much more for doing math than what is “directly given” by evolution to present fetuses, but still.
3. ↩︎
  One attempt to counter this: “but humans could reprogram into basically anything, including whatever better system for doing math there is!”. But conditional on this working out, the appeal of the claim that fetuses already have a load-bearing fixed “learning algorithm” is also defeated, so this counterargument wouldn’t actually work in the present context even if this claim were true.
4. ↩︎
  let’s assume this makes sense
5. ↩︎
  That said, I could see an argument for a good chunk of the learning that most current humans are doing being pretty close to gaining thinking-structures which other people already have, from other people that already have them, and there is definitely something finite in this vicinity — like, some kind of pure copying should be finite (though the things humans are doing in this vicinity are of course more complicated than pure copying, there are complications with making sense of “pure copying” in this context, and also humans suck immensely (compared to what’s possible) even at “pure copying”).

Kaarel Mar 18, 2025, 8:23 AM
1 point
0
in reply to: Jonas Hallgren’s comment on: An Advent of Thought
Thank you for your comment!

What you’re saying seems more galaxy-brained than what I was saying in my notes, and I’m probably not understanding it well. Maybe I’ll try to just briefly (re)state some of my claims that seem most relevant to what you’re saying here (with not much justification for my claims provided in my present comment, but there’s some in the post), and then if it looks to you like I’m missing your point, feel very free to tell me that and I can then put some additional effort into understanding you.
- So, first, math is this richly infinite thing that will never be mostly done.
- If one is a certain kind of guy doing alignment, one might hope that one could understand how e.g. mathematical thinking works (or could work), and then make like an explicit math AI one can understand (one would probably really want this for science or for doing stuff in general^[1], but a fortiori one would need to be able to do this for math).^[2]
- But oops, this is very cursed, because thinking is an infinitely rich thing, like math!
- I think a core idea here is that thinking is a technological thing. Like, one aim of notes 1–6 (and especially 3 and 4) is to “reprogram” the reader into thinking this way about thinking. That is, the point is to reprogram the reader away from sth like “Oh, how does thinking, the definite thing, work? Yea, this is an interesting puzzle that we haven’t quite cracked yet. You probably have to, like, combine logical deduction with some probability stuff or something, and then like also the right decision theory (which still requires some work but we’re getting there), and then maybe a few other components that we’re missing, but bro we will totally get there with a few ideas about how to add search heuristics, or once we’ve figured out a few more details about how abstraction works, or something.”
- Like, a core intuition is to think of thinking like one would think of, like, the totality of humanity’s activities, or about human technology. There’s a great deal going on! It’s a developing sort of thing! It’s the sort of thing where you need/want to have genuinely new inventions! There is a rich variety of useful thinking-structures, just like there is a rich variety of useful technological devices/components, just like there is a rich variety of mathematical things!
- Given this, thinking starts to look a lot like math — in particular, the endeavor to understand thinking will probably always be mostly unfinished. It’s the sort of thing that calls for an infinite library of textbooks to be written.
- In alignment, we’re faced with an infinitely rich domain — of ways to think, or technologies/components/ideas for thinking, or something. This infinitely rich domain again calls for textbooks to keep being written as one proceeds.
- Also, the thing/thinker/thought writing these textbooks will itself need to be rich and developing as well, just like the math AI will need to be rich and developing.
- Generally, you can go meta more times, but on each step, you’ll just be asking “how do I think about this infinitely rich domain?”, answering which will again be an infinite endeavor.
- You could also try to make sense of climbing to higher infinite ordinal levels, I guess?
(* Also, there’s something further to be said about how [[doing math] and [thinking about how one should do math]] are not that separate.)

I’m at like inside-view p=0.93 that the above presents the right vibe to have about thinking (like, maybe genuinely about its potential development forever, but if it’s like technically only the right vibe wrt the next $10^{12}$ years of thinking (at a 2024 rate) or something, then I’m still going to count that as thinking having this infinitary vibe for our purposes).^[3]

However, the question about whether one can in principle make a math AI that is in some sense explicit/understandable anyway (that in fact proves impressive theorems with a non-galactic amount of compute) is less clear. Making progress on this question might require us to clarify what we want to mean by “explicit/understandable”. We could get criteria on this notion from thinking through what we want from it in the context of making an explicit/understandable AI that makes mind uploads (and “does nothing else”). I say some more stuff about this question in 4.4.
1. ↩︎
  if one is an imo complete lunatic :), one is hopeful about getting this so that one can make an AI sovereign with “the right utility function” that “makes there be a good future spacetime block”; if one is an imo less complete lunatic :), one is hopeful about getting this so that one can make mind uploads and have the mind uploads take over the world or something
2. ↩︎
  to clarify: I actually tend to like researchers with this property much more than I like basically any other “researchers doing AI alignment” (even though researchers with this property are imo engaged in a contemporary form of alchemy), and I can feel the pull of this kind of direction pretty strongly myself (also, even if the direction is confused, it still seems like an excellent thing to work on to understand stuff better). I’m criticizing researchers with this property not because I consider them particularly confused/wrong compared to others, but in part because I instead consider them sufficiently reasonable/right to be worth engaging with (and because I wanted to think through these questions for myself)!
3. ↩︎
  I’m saying this because you ask me about my certainty in something vaguely like this — but I’m aware I might be answering the wrong question here. Feel free to try to clarify the question if so.

Kaarel

a chat with Towards_Keeperhood on what it takes for sentences/phrases/words to be meaningful