I wish you would realize that whatever we’re looking at, it isn’t people not realizing this.
TsviBT (Tsvi Benson-Tilsen)
Look… Consider the hypothetically possible situation that in fact everyone is very far from being on the right track, and everything everyone is doing doesn’t help with the right track and isn’t on track to get on the right track or to help with the right track.
Ok, so I’m telling you that this hypothetically possible situation seems to me like the reality. And then you’re, I don’t know, trying to retreat to some sort of agreeable live-and-let-live stance, or something, where we all just agree that due to model uncertainty and the fact that people have vaguely plausible stories for how their thing might possibly be helpful, everyone should do their own thing and it’s not helpful to try to say that some big swath of research is doomed? If this is what’s happening, then I think that what you in particular are doing here is a bad thing to do here.
Maybe we can have a phone call if you’d like to discuss further.
Doomed to irrelevance, or doomed to not being a complete solution in and of itself?
Doomed to not be trying to go to and then climb the mountain.
my brain is a dirty lying liar that lies to me at every opportunity
So then it isn’t easy. But it’s feedback. Also there’s not that much distinction between making a philosophically rigorous argument and “doing introspection” in the sense I mean, so if you think the former is feasible, work from there.
Is there a particular reason you expect there to be exactly one hard part of the problem,
Have you stopped beating your wife? I say “the” here in the sense of like “the problem of climbing that mountain over there”. If you’re far away, it makes sense to talk about “the (thing over there)”, even if, when you’re up close, there’s multiple routes, multiple summits, multiple sorts of needed equipment, multiple sources of risk, etc.
and for the part that ends up being hardest in the end to be the part that looks hardest to us now?
We make an argument like “any solution would have to address X” or “anything with feature Y does not do Z” or “property W is impossible”, and then we can see what a given piece of research is and is not doing / how it is doomed to irrelevance. It’s not like pointing to a little ball in ideaspace and being like “the answer is somewhere in here”. Rather it’s like cutting out a halfspace and saying “everything on this side of this plane is doomed, we’d have to be somewhere in the other half”, or like pointing out a manifold that all research is on and saying “anything on this manifold is doomed, we’d have to figure out how to move somewhat orthogonalward”.
research that stemmed from someone trying something extremely simple and getting an unexpected result
I agree IF we are looking at the objects in question. If LLMs were minds, the research would be much more relevant. (I don’t care if you have an army of people who all agree on taking a stance that seems to imply that there’s not much relevant difference between LLMs and future AGI systems that might kill everyone.)
What is your preferred method for getting feedback from reality on whether your theory describes the world as it is?
I think you (and everyone else) don’t know how to ask this question properly. For example, “on whether your theory describes the world as it is” is a too-narrow idea of what our thoughts about minds are supposed to be. Sub-example: our thoughts about mind are supposed to also produce design ideas.
To answer your question: by looking at and thinking about minds. The only minds that currently exist are humans, and the best access you have to minds is introspection. (I don’t mean meditation, I mean thinking and also thinking about thinking/wanting/acting—aka some kinds of philosophy and math.)
I broadly agree with this. (And David was like .7 out of the 1.5 profs on the list who I guessed might genuinely want to grant the needed freedom.)
I do think that people might do good related work in math (specifically, probability/information theory, logic, etc.--stuff about formalized reasoning), philosophy (of mind), and possibly in other places such as theoretical linguistics. But this would require that the academic context is conducive to good novel work in the field (a lower bar, but probably still far from universally met); and would require the researcher to have good taste. And this is “related” in the sense of “might write a paper which leads to another paper which would be cited by [the alignment textbook from the future] for proofs/analogies/evidence about minds”.
I don’t speak for MIRI, but broadly I think MIRI thinks that roughly no existing research is hopeworthy, and that this isn’t likely to change soon. I think that, anyway.
In discussions like this one, I’m conditioning on something like “it’s worth it, these days, to directly try to solve AGI alignment”. That seems assumed in the post, seems assumed in lots of these discussions, seems assumed by lots of funders, and it’s why above I wrote “the main direct help we can give to AGI alignment” rather than something stronger like “the main help (simpliciter) we can give to AGI alignment” or “the main way we can decrease X-risk”.
If, hypothetically, we were doing MI on minds, then I would predict that MI will pick some low-hanging fruit and then hit walls where its methods stop working, and it will be more difficult to develop new methods that work. The new methods that work will look more and more like reflecting on one’s own thinking, discovering new ways of understanding one’s own thinking, and then going and looking for something like that in the in-vitro mind. IDK how far that could go. But then this will completely grind to a halt when the IVM is coming up with concepts and ways of thinking that are novel to humanity. Some other approach would be needed to learn new ideas from a mind via MI.
However, another dealbreaker problem with current and current-trajectory MI is that it isn’t studying minds.
From the section you linked:
Moreover, the program guarantees at least some mentorship from your supervisor. Your advisor’s incentives are reasonably aligned with yours: they get judged by your success in general, so want to see you publish well-recognized first-author research, land a top research job after graduation and generally make a name for yourself (and by extension, them).
Doing a PhD also pushes you to learn how to communicate with the broader ML research community. The “publish or perish” imperative means you’ll get good at writing conference papers and defending your work.
These would be exactly the “anyone around them” about whose opinion they would have to not give a fuck.
I don’t know a good way to do this, but maybe a pointer would be: funders should explicitly state something to the effect of:
“The purpose of this PhD funding is to find new approaches to core problems in AGI alignment. Success in this goal can’t be judged by an existing academic structure (journals, conferences, peer-review, professors) because there does not exist such a structure aimed at the core problems in AGI alignment. You may if you wish make it a major goal of yours to produce output that is well-received by some group in academia, but be aware that this goal would be non-overlapping with the purpose of this PhD funding.”
The Vitalik fellowship says:
To be eligible, applicants should either be graduate students or be applying to PhD programs. Funding is conditional on being accepted to a PhD program, working on AI existential safety research, and having an advisor who can confirm to us that they will support the student’s work on AI existential safety research.
Despite being an extremely reasonable (even necessary) requirement, this is already a major problem according to me. The problem is that (IIUC—not sure) academics are incentivized to, basically, be dishonest, if it gets them funding for projects / students. Of the ~dozen professors here (https://futureoflife.org/about-us/our-people/ai-existential-safety-community/) who I’m at least a tiny bit familiar with, I think maybe 1.5ish are actually going to happily support actually-exploratory PhD students. I could be wrong about this though—curious for more data either way. And how many will successfully communicate, to the sort of person who would take a real shot at exploratory conceptual research if given the opportunity, that they would in fact support such research? I don’t know. Zero? One? And how would someone sent to the FLI page know of the existence of that professor?
Fellows are expected to participate in annual workshops and other activities that will be organized to help them interact and network with other researchers in the field.
Continued funding is contingent on continued eligibility, demonstrated by submitting a brief (~1 page) progress report by July 1st of each year.
Again, reasonable, but… Needs more clarity on what is expected, and what is not expected.
a technical specification of the proposed research
What does this even mean? This webpage doesn’t get it. We’re trying to buy something that isn’t something someone can already write a technical specification of.
That would only work for people with the capacity to not give a fuck what anyone around them thinks, especially including the person funding and advising them. And that’s arguably unethical depending on context.
the doppelganger problem is a fairly standard criticism of the sparse autoencoder work,
And what’s the response to the criticism, or a/the hoped-for approach?
diasystemic novelty seems the kind of thing you’d encounter when doing developmental interpretability, interp-through-time
Yeah, this makes sense. And hey, maybe it will lead to good stuff. Any results so far, that I might consider approaching some core alignment difficulties?
it seems the kind of thing which would come from the study of in-context learning, a goal that I believe mainstream MI has, even if it doesn’t focus on it now (likely because it believes it’s unable to at this moment), and which I think it will care more about as the power of such in-context learning becomes more and more apparent.
Also makes some sense (though the ex quo, insofar as we even want to attribute this to current systems, is distributed across the training algorithms and the architecture sources, as well as inference-time stuff).
Generally what you’re bringing up sounds like “yes these are problems and MI would like to think about them… later”. Which is understandable, but yeah, that’s what streetlighting looks like.
Maybe an implicit justification of current work is like:
There’s these more important, more difficult problems. We want to deal with them, but they are too hard right now, so we will try in the future. Right now we’ll deal with simpler things. By dealing with simpler things, we’ll build up knowledge, skills, tools, and surrounding/supporting orientation (e.g. explaining weird phenomena that are actually due to already-understandable stuff, so that later we don’t get distracted). This will make it easier to deal with the hard stuff in the future.
This makes a lot of sense—it’s both empathizandable, and seems probably somewhat true. However:
Again, it still isn’t in fact currently addressing the hard parts. We want to keep straight the difference between [currently addressing] vs. [arguably might address in the future].
We gotta think about what sort of thing would possibly ever work. We gotta think about this now, as much as possible.
A core motivating intuition behind the MI program is (I think) “the stuff is all there, perfectly accessible programmatically, we just have to learn to read it”. This intuition is deeply flawed; see “Koan: divining alien datastructures from RAM activations”.
Ah thanks!
In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Mhm. In fact I’d want to apply a bar that’s even lower, or at least different: [the extent to which the participants (as judged by more established alignment thinkers) seem to be well on the way to developing new promising directions—e.g. being relentlessly resourceful including at the meta-level; having both appropriate Babble and appropriate Prune; not shying away from the hard parts].
the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
Agree that this is an issue, but I think it can be addressed—certainly at least well enough that there’d be worthwhile value-of-info in running such a thing.
I’d be happy to contribute a bit of effort, if someone else is taking the lead. I think most of my efforts will be directed elsewhere, but for example I’d be happy to think through what such a program should look like; help write justificatory parts of grant applications; and maybe mentor / similar.
Here’s the convo according to me:
Bloom:
I think there’s scope within a field like interp to focus on things that are closer to the hard part of the problem or at least touch on robust bottlenecks for alignment agendas
BT:
Object level: ontology identification, in the sense that is studied empirically, is pretty useless.
sname:
I haven’t seen anyone do such interpretability research yet but I see no particular reason to think this is the sort of thing that can’t be studied empirically rather than the sort of thing that hasn’t been studied empirically.
BT:
Well, empirically, when people try to study it empirically, instead they do something else
sname:
I don’t know that we have any empirical data on what happens when people try to study that particular empirical question (the specific relationship between the features learned by two models of different modalities) because I don’t know that anyone has set out to study that particular question in any serious way.
BT:
ah, sname is talking about conceptual Doppelgängers specifically, as ze indicated in a previous comment that I now understand
When I said “when people try to study it empirically”, what I meant was “when people try to do interpretability research (presumably, that is relevant to the hard part of the problem?)”.
“prosaic alignment is unlikely to be helpful, look at all of these empirical researchers who have not even answered these basic questions”
Right, I’m not saying exactly this. But I am saying:
Prosaic alignment is unlikely to be helpful, look at how they are starting in an extremely streetlighty way(*) and then, empirically, not pushing out into the dark quickly—and furthermore, AFAIK, not very concerned with how they aren’t pushing out into the dark quickly enough, or successfully addressing this at the meta level, though plausibly they’re doing that and I’m just not aware.
(*): studying LLMs, which are not minds; trying to recognize [stuff we mostly conceptually understand] within systems rather than trying to come to conceptually understand [the stuff we’d need to be able to recognize/design in a mind, in order to determine the mind’s effects].
(I think this would be worth doing if resources were unlimited, not sure as things actually stand).
Well, you’ve agreed with a defanged version of my statements. The toothful version, which I do think: Insofar as this is even possible, we should allocate a lot more resources toward funding any high-caliber smart/creative/interesting/promising/motivated youngsters/newcomers who want to take a crack at independently approaching the core difficulties of AGI alignment, even if that means reallocating a lot of resources away from existing on-paradigm research.
Edit: to be even more explicit, what I’m trying to do in this thread is encourage thinking about ways one might collect empirical observations about non-”streetlit” topics. None of the topics are under the streetlight until someone builds the streetlight. “Build a streetlight” is sometimes an available action, but it only happens if someone makes a specific effort to do so.
This seems like a good thing to do. But there’s multiple ways that existing research is streetlit, and reality doesn’t owe it to you to make it be the case that there are nice (tractionful, feasible, interesting, empirical, familiar, non-weird-seeming, feedbacked, grounded, legible, consensusful) paths toward the important stuff. The absence of nice paths would really suck if it’s the case, and it’s hard to see how anyone could be justifiedly really confident that there are no nice paths. But yes, I’m saying that it looks like there aren’t nice paths, or at least there aren’t enough nice paths that we seem likely to find them by continuing to sample from the same distribution we’ve been sampling from; and I have some arguments and reasons supporting this belief, which seem true; and I would guess that a substantial fraction (though not most) of current alignment researchers would agree with a fairly strong version of “very few or no nice paths”.
Could that be operationalized as a prediction of the form
If you train a model on a bunch of simple tasks involving both functional and object-oriented code (e.g. “predict the next token of the codebase”, “predict missing token”, “identify syntax errors”) and then train it on a complex task on only object-oriented code (e.g. “write a document describing how to use this library”), it will fail to navigate that ontological shift and will be unable to document functional code.
I don’t think that’s a good operationalization, as you predict. I think it’s trying to be an operationalization related to my claim above:
ontology identification, in the sense that is studied empirically, is pretty useless. It [..] AFAIK isn’t trying to [...] at all handle diasystemic novelty [...].
But it sort of sounds like you’re trying to extract a prediction about capability generalization or something? Anyway, an interp-like study trying to handle diasystemic novelty might for example try to predict large-scale explicitization events before they happen—maybe in a way that’s robust to “drop out”. E.g. you have a mind that doesn’t explicitly understand Bayesian reasoning; but it is engaging in lots of activities that would naturally induce small-world probabilistic reasoning, e.g. gambling games or predicting-in-distribution simple physical systems; and then your interpreter’s job is to notice, maybe only given access to restricted parts (in time or space, say) of the mind’s internals, that Bayesian reasoning is (implicitly) on the rise in many places. (This is still easy mode if the interpreter gets to understand Bayesian reasoning explicitly beforehand.) I don’t necessarily recommend this sort of study, though; I favor theory.
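For concreteness, here’s a minimal toy sketch of that “easy mode” setup (hedged: this is my own illustrative construction, not a study anyone here has proposed in detail or run). The biased-coin task, the small GRU standing in for the “mind”, and the linear probe are all assumptions chosen for illustration; the point is only that the interpreter looks for implicit Bayesian posterior-tracking that the training signal never made explicit.

```python
import torch
import torch.nn as nn

def make_batch(n_ep=256, T=40):
    # Each episode: a coin with latent bias p, observed for T flips.
    p = torch.rand(n_ep, 1)
    flips = (torch.rand(n_ep, T) < p).float()
    heads = flips.cumsum(dim=1)
    trials = torch.arange(1, T + 1).float()
    # Bayesian posterior mean over the bias, under a uniform Beta(1,1) prior.
    posterior = (heads + 1) / (trials + 2)
    return flips, posterior

class Mind(nn.Module):
    # A stand-in "mind": a small GRU trained only to predict the next flip.
    def __init__(self, h=32):
        super().__init__()
        self.gru = nn.GRU(1, h, batch_first=True)
        self.out = nn.Linear(h, 1)
    def forward(self, flips):
        hs, _ = self.gru(flips.unsqueeze(-1))
        return torch.sigmoid(self.out(hs)).squeeze(-1), hs

def probe_r2(hs, posterior):
    # Linear probe: how much of the Bayesian posterior is (implicitly) encoded
    # in the hidden states, even though it was never supervised?
    X = hs.reshape(-1, hs.shape[-1]).detach()
    y = posterior.reshape(-1, 1)
    w = torch.linalg.lstsq(X, y).solution
    resid = ((X @ w - y) ** 2).mean()
    return (1 - resid / y.var()).item()

mind = Mind()
opt = torch.optim.Adam(mind.parameters(), lr=1e-2)
for step in range(301):
    flips, posterior = make_batch()
    pred, hs = mind(flips)
    # Training signal is only "predict the next flip"; nothing Bayesian is explicit.
    loss = nn.functional.binary_cross_entropy(pred[:, :-1], flips[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 100 == 0:
        print(f"step {step}: posterior-probe R^2 = {probe_r2(hs, posterior):.3f}")
```

If the probe’s fit climbs across checkpoints while the loss only ever mentions next-flip prediction, that’s the kind of signal the interpreter would be hunting for; the hard mode starts when the analogue of “Bayesian reasoning” is a concept the interpreter doesn’t already have.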
I don’t know what you’re trying to do in this thread (e.g. what question you’re trying to answer).
Yeah that looks good, except that it takes an order of magnitude longer to get going on conceptual alignment directions. I’ll message Adam to hear what happened with that.
Ok I want to just lay out what I’m trying to do here, and why, because it could be based on false assumptions.
A main assumption I’m making, which totally could be false, is that your paragraph
Funders of independent researchers we’ve interviewed think that there are plenty of talented applicants, but would prefer more research proposals focused on relatively few existing promising research directions (e.g., Open Phil RFPs, MATS mentors’ agendas), rather than a profusion of speculative new agendas.
is generally representative of the entire landscape, with a few small-ish exceptions. In other words, I’m assuming that it’s pretty difficult for a young smart person to show up and say “hey, I want to spend 3 whole years thinking about this problem de novo, can I have one year’s salary and a reevaluation after 1 year for a renewal”.
A main assumption that motivates what I’m doing here, and that could be false, is:
Funders make decisions mostly by some combination of recommendations from people they trust. The trust might be personal, or might be based on accomplishments, or might be based on some arguments made by the trusted person to the funder—and, centrally, the trust is actually derived from a loose diffuse array of impressions coming from the community, broadly.
To make the assumption slightly more clear: The assumption says that it’s actually quite common, maybe even the single dominant way funders make decisions, for the causality of a decision to flow through literally thousands of little interactions, where the little interactions communicate “I think XYZ is Important/Unimportant”. And these aggregate up into a general sense of importance/unimportance, or something. And then funding decisions work with two filters:
The explicit reasoning about the details—is this person qualified, how much funding, what’s the feedback, who endorses it, etc etc.
The implicit filter of Un/Importance. This doesn’t get raised to attention usually. It’s just in the background.
And “fund a smart motivated youngster without a plan for 3 years with little evaluation” is “unimportant”. And this unimportance is implicitly but strongly reinforced by everyone talking about in-paradigm stuff. And the situation is self-reinforcing because youngsters mostly don’t try to do the thing, because there’s no narrative and no funding, and so it is actually true that there aren’t many smart motivated youngsters just waiting for some funding to do trailblazing.
If my assumptions are true, then IDK what to do about this but would say that at least
people should be aware of this situation, and
people should keep talking about this situation, especially in contexts where they are contributing to the loose diffuse array of impressions by contributing to framing about what AGI alignment needs.
Well, empirically, when people try to study it empirically, instead they do something else. Surely that’s empirical evidence that it can’t be studied empirically? (I’m a little bit trolling but also not.)
Ok I think you’re right. I didn’t know (at least, not well enough) that “talent needs” quasi-idiomatically means “sorts of people that an organization wants to hire”, and interpreted it to mean literally “needs (by anyone / the world) for skills / knowledge”.
I don’t buy the unwieldiness excuse; you could say “Hiring needs in on-paradigm technical AI safety”, for example. But me criticizing minutiae of the framing in this post doesn’t seem helpful. The main thing I want to communicate is that
the main direct help we can give to AGI alignment would go via novel ideas that would be considered off-paradigm; and therefore
high-caliber newcomers to the field should be strongly encouraged to try to do that; and
there’s strong emergent effects in the resource allocation (money, narrative attention, collaboration) of the field that strongly discourage newcomers from doing so and/or don’t attract newcomers who would do so.
IDK if there’s political support that would be helpful and that could be affected by people saying things to their representatives. But if so, then it would be helpful to have a short, clear, on-point letter that people can adapt to send to their representatives. Things I’d want to see in such a letter:
AGI, if created, would destroy all or nearly all human value.
We aren’t remotely on track to solving the technical problems that would need to be solved in order to build AGI without destroying all or nearly all human value.
Many researchers say they are trying to build AGI and/or doing research that materially contributes toward building AGI. None of those researchers has a plausible plan for making AGI that doesn’t destroy all or nearly all human value.
As your constituent, I don’t want all or nearly all human value to be destroyed.
Please start learning about this so that you can lend your political weight to proposals that would address existential risk from AGI.
This is more important to me than all other risks about AI combined.
Or something.