Hm. I think my statement does firmly include the linked paper (at least the first half of it, insofar as I skimmed it).
It’s becoming clear that a lot of my statements have background mindsets that would take more substantial focused work to exposit. I’ll make some gestural comments.
When I say “not a good way...” I mean something like “is not among the top X elements of a portfolio aimed at solving this in 30 years (but may very well be among the top X elements of a portfolio aimed at solving this in 300 years)”.
Streetlighting, in a very broad sense that encompasses most or maybe all of foregoing science, is a very good strategy for making scientific progress—maybe the only strategy known to work. But it seems to be too slow. So I’m not assuming that “good” is about comparisons between different streetlights; if I were, then I’d consider lots of linguistic investigations to be “good”.
In fairly wide generality, I’m suspicious of legible phenomena.
(This may sound like an extreme statement; yes, I’m making a pretty extreme version of the statement.)
The reason is like this: “legible” means something like “readily relates to many things, and to standard/common things”. If there’s a core thing which is alien to your understanding, the legible emanations from that core are almost necessarily somewhat remote from the core. The emanations can be on a path from here to there, but they also contain a lot of irrelevant stuff, and can maybe in principle be circumvented (by doing math-like reasoning), so to speak.
So looking at the bytecode of a compiled Python program does give you some access to the concepts involved in the Python program itself, but those concepts are refracted through the compiler: what you’re seeing in the bytecode has a lot of structure that’s interesting and useful and relevant to thinking about programs more generally, but that isn’t really specifically relevant to the concepts involved in this specific Python program.
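(Tangential illustration, not from the original exchange: Python’s standard-library `dis` module makes the analogy concrete. The function below is a made-up example; the point is just that most of what the disassembler shows is compiler/VM plumbing rather than anything specific to what the function is “about”.)

```python
# A minimal sketch, assuming nothing beyond the Python standard library.
# The function `tip` and its name are invented purely for illustration.
import dis

def tip(total, rate=0.2):
    """The 'concept' this little program is about: computing a tip."""
    return total * rate

# Disassemble it. Most of the output (LOAD_FAST, BINARY_OP or
# BINARY_MULTIPLY depending on Python version, RETURN_VALUE, ...) is
# structure imposed by the compiler and the stack machine: interesting
# and general, but not specifically about the idea of "computing a tip".
dis.dis(tip)
```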
Concretely in the case of linguistics, there’s an upstream core which is something like “internal automatic conceptual engineering to serve life tasks and play tasks”.
((This pointer is not supposed to, by itself, distinguish the referent from other things that sound like they fit the pointer taken as a description; e.g., fine, you can squint and reasonably say that some computer RL thing is doing “internal automatic...” but I claim the human thing is different and more powerful, and I’m just trying to point at that as distinct from speech.))
That upstream core has emanations / compilations / manifestations in speech, writing, internal monologue. The emanations have lots of structure. Some of that structure is actually relevant to the core. A lot of that structure is not very relevant, but is instead mostly about the collision of the core dynamics with other constraints.
Phonotactics is interesting, but even though it can be applied to describe how morphemes interact in the arena of speech, I don’t think we should expect it to tell us much about morphemes; the additional complexity is about sounds and ears and mouths, and not about morphemes.
A general theory about how the cognitive representations of “assassin” and “assassinate” overlap and disoverlap is interesting, but even though it can be applied to describe how ideas interact in the arena of word-production, I don’t think we should expect it to tell us much about ideas; the additional complexity is about fast parallel datastructures, and not about ideas.
In other words, all the “core of how minds work” is hidden somewhere deep inside whatever [CAT] refers to.
Thanks!

One thing I would say is: if you have a (correct) theoretical framework, it should straightforwardly illuminate tons of diverse phenomena, but it’s very much harder to go backwards from the “tons of diverse phenomena” to the theoretical framework. E.g. any competent scientist who understands Evolution can apply it to explain patterns in finch beaks, but it took Charles Darwin to look at patterns in finch beaks and come up with the idea of Evolution.
Or in my own case, for example, I spent a day in 2021 looking into schizophrenia, but I didn’t know what to make of it, so I gave up. Then I tried again for a day in 2022, with a better theoretical framework under my belt, and this time I found that it slotted right into my then-current theoretical framework. And at the end of that day, I not only felt like I understood schizophrenia much better, but also my theoretical framework itself came out more enriched and detailed. And I iterated again in 2023, again simultaneously improving my understanding of schizophrenia and enriching my theoretical framework.
Anyway, if the “tons of diverse phenomena” are datapoints, and we’re in the middle of trying to come up with a theoretical framework that can hopefully illuminate all those datapoints, then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework), at any particular point in this process. The “schizophrenia” datapoint was totally unhelpful to me in 2021, but helpful to me in 2022. The “precession of Mercury” datapoint would not have helped Einstein when he was first brainstorming general relativity in 1907, but was presumably moderately helpful when he was thinking through the consequences of his prototype theory a few years later.
The particular phenomena / datapoints that are most useful for brainstorming the underlying theory (privileging the hypothesis), at any given point in the process, need not be the most famous and well-studied phenomena / datapoints. Einstein wrung much more insight out of the random-seeming datapoint “a uniform gravity field seems an awful lot like uniform acceleration” than out of any of the datapoints that would have been salient to a lesser gravity physicist, e.g. Newton’s laws or the shape of the galaxy or the Mercury precession. In my own case, there are random experimental neuroscience results (or everyday observations) that I see as profoundly revealing of deep truths, but which would not be particularly central or important from the perspective of other theoretical neuroscientists.
But, I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints. (Unless of course you’re also reading and internalizing crappy literature theorizing about those phenomena, and it’s filling your mind with garbage ideas that get in the way of constructing a better theory.) For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn’t call that example “legible”?)
Thanks, this is helpful to me.

An example of something: do LLMs have real understanding, in the way humans do? There’s a bunch of legible stuff that people would naturally pay attention to as datapoints associated with whatever humans do that’s called “real understanding”. E.g. being able to produce grammatical sentences, being able to answer a wide range of related questions correctly, writing a poem with s-initial words, etc. People might have even considered those datapoints dispositive for real understanding. And now LLMs can do those. … Now, according to me LLMs don’t have much real understanding, in the relevant sense or in the sense humans do. But it’s much harder to point at clear, legible benchmarks that show that LLMs don’t really understand much, compared to previous ML systems.
then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework),
The “as brainstorming aids for developing the underlying theoretical framework” is doing a lot of work there. I’m noticing here that when someone says “we can try to understand XYZ by looking at legible thing ABC”, I often jump to conclusions (usually correctly actually) about the extent to which they are or aren’t trying to push past ABC to get to XYZ with their thinking. A key point of the OP is that some datapoints may be helpful, but they aren’t the main thing determining whether you get to [the understanding you want] quickly or slowly. The main thing is, vaguely, how you’re doing the brainstorming for developing the underlying theoretical framework.
I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints.
I’m not saying all legible data is bad or irrelevant. I like thinking about human behavior, about evolution, about animal behavior; and my own thoughts are my primary data, which isn’t like maximally illegible or something. I’m just saying I’m suspicious of all legible data. Why?
Because there’s more coreward data available. That’s the argument of the OP: you actually do know how to relevantly theorize (e.g., go off and build a computer—which in the background involves theorizing about datastructures).
Because people streetlight, so they’re selecting points for being legible, which cuts against being close to the core of the thing you want to understand.
Because theorizing isn’t only, or even always mainly, about data. It’s also about constructing new ideas. That’s a distinct task; data can be helpful, but there’s no guarantee that reading the book of nature will lead you along such that in the background you construct the ideas you needed.
For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn’t call that example “legible”?)
It’s legible, yeah. They should have it in mind, yeah. But after they’ve thought about it for a while they should notice that the real movers and shakers of the world are weird illegible things like religious belief, governments, progressivism, curiosity, invention, companies, child-rearing, math, resentment, …, which aren’t very relevantly described by the sort of theories people usually come up with when just staring at stuff like cold->sweater, AFAIK.
I don’t think we disagree much if at all.

I think constructing a good theoretical framework is very hard, so people often do other things instead, and I think you’re using the word “legible” to point to some of those other things.
I’m emphasizing that those other things are less than completely useless as semi-processed ingredients that can go into the activity of “constructing a good theoretical framework”.
You’re emphasizing that those other things are not themselves the activity of “constructing a good theoretical framework”, and thus can distract from that activity, or give people a false sense of how much progress they’re making.
I think those are both true.
The pre-Darwin ecologists were not constructing a good theoretical framework. But they still made Darwin’s job easier, by extracting slightly-deeper patterns for him to explain with his much-deeper theory—concepts like “species” and “tree of life” and “life cycles” and “reproduction” etc. Those concepts were generally described by the wrong underlying gears before Darwin, but they were still contributions, in the sense that they compressed a lot of surface-level observations (Bird A is mating with Bird B, and then Bird B lays eggs, etc.) into a smaller number of things-to-be-explained. I think Darwin would have had a much tougher time if he was starting without the concepts of “finch”, “species”, “parents”, and so on.
By the same token, if we’re gonna use language as a datapoint for building a good underlying theoretical framework for the deep structure of knowledge and ideas, it’s hard to do that if we start from slightly-deep linguistic patterns (e.g. “morphosyntax”, “sister schemas”)… But it’s very much harder still to do that if we start with a mass of unstructured surface-level observations, like particular utterances.
I guess your perspective (based on here) is that, for the kinds of things you’re thinking about, people have not been successful even at the easy task of compressing a lot of surface-level observations into a smaller number of slightly-deeper patterns, let alone successful at the much harder task of coming up with a theoretical framework that can deeply explain those slightly-deeper patterns? And thus you want to wholesale jettison all the previous theorizing? On priors, I think that would be kinda odd. But maybe I’m overstating your radicalism. :)
I mean the main thing I’d say here is that we just are going way too slowly / are not close enough. I’m not sure what counts as “jettisoning”; no reason to totally ignore anything, but in terms of reallocating effort, I guess what I advocate for looks like jettisoning everything. If you go from 0% or 2% of your efforts put toward questioning basic assumptions and theorizing based on introspective inspection and manipulation of thinking, to 50% or 80%, then in some sense you’ve jettisoned everything? Or half-jettisoned it?