Koan: divining alien datastructures from RAM activations
[Metadata: crossposted from https://tsvibt.blogspot.com/2024/04/koan-divining-alien-datastructures-from.html.]
Exploring the ruins of an alien civilization, you find what appears to be a working computer——it’s made of plastic and metal, wires connect it to various devices, and you see arrays of capacitors that maintain charged or uncharged states and that sometimes rapidly toggle in response to voltages from connected wires. You can tell that the presumptive RAM is activating in complex but structured patterns, but you don’t know their meanings. What strategies can you use to come to understand what the underlying order is, what algorithm the computer is running, that explains the pattern of RAM activations?
Thanks to Joss Oliver (SPAR) for entertaining a version of this koan. Many of B’s ideas come from Joss.
Real data about minds
Red: If we want to understand how minds work, the only source of real data is our own thinking, starting from the language in which we think.
Blue: That doesn’t seem right. A big alternative source of data is neuroscience. We can directly observe the brain——electrical activations of neurons, the flow of blood, the anatomical structure, the distribution of chemicals——and we can correlate that with behavior. Surely that also tells us about how minds work?
Red: I mostly deny this. To clarify: I deny that neuroscience is a good way to gain a deep understanding of the core structure of mind and thought. It’s not a good way to gain the concepts that we lack.
Blue: Why do you think this? It seems straightforward to expect that science should work on brains, just like it works on anything else. If we study the visible effects of the phenomenon, think of hypotheses to explain those visible effects, and test those hypotheses to find the ones that are true, then we’ll find our way towards more and more predictive hypotheses.
R: That process of investigation would of course work in the very long run. My claim is: that process of investigation is basically trying to solve a problem that’s different from the problem of understanding the core structure of mind. That investigation would eventually work anyway, but mostly as a side-effect. As a rough analogy: if you study soccer in great detail, with a very high standard for predictive accuracy, you’ll eventually be forced to understand quantum mechanics; but that’s really a side-effect, and it doesn’t mean quantum mechanics is very related to soccer or vice versa, and there are much faster ways of investigating quantum mechanics. As another, closer analogy: if you study a calculator in great detail, and you ask the right sort of questions, then you’ll eventually be led to understand addition, because addition is in some sense a good explanation for why the calculator is the way that it is; but you really have to be asking the right questions, and you could build a highly detailed physics simulation that accurately predicts the intervention-observation behavior of the calculator as a physical object without understanding addition conceptually (well, aside from needing to understand addition for purposes of coding a simulator).
B: What I’m hearing is: There are different domains, like QM and soccer, or electrons in wires vs. the concepts of addition. And if you want to understand one domain, you should try to study it directly. Is that about right?
R: Yeah, that’s an ok summary.
B: Just ok?
R: Your summary talks about where we start our investigation. I’d also want to emphasize the directional pull on our investigation that comes from the questions we’re asking.
B: I see. I don’t think this really applies to neuroscience and minds, though. Like, ok, what we really want is to understand, how did you put it, the core structure of mind. This seems like a good goal and I agree we should keep it in mind, but how does that imply that neuroscience is bad? The brain is really complicated, so we start with what we can observe and we try to explain the lower-level algorithms that the brain is using. Then we try to work “upwards” from there towards the bigger, more complicated algorithms, and eventually we build up a picture of how the whole mind works. Jumping all the way up to the top level of abstraction and trying to explain the whole thing seems a lot harder than starting with the simpler, more concrete, more tractable questions about the data we can actually get. Anyway, if we can’t understand the simpler, smaller things, how could we hope to understand the larger, more complicated things?
R: To elliptically address your last point: if you come to understand general relativity, you can then explain Mercury’s Newtonianly-anomalous orbit, and you can use your explanation to convince others that general relativity is correct. But that doesn’t mean studying Mercury’s orbit is a good way to arrive at general relativity in the first place. There are many directions in ideaspace that could be called “simpler” or “more concrete”, and they’re not all the same.
Now, to your main point: I agree that if you’re doing neuroscience with good taste and holding fast to the goal of understanding mind, then that’s worthwhile. If there weren’t a Steve Byrnes, the world would be wanting for a Steve Byrnes. But I’m saying that trying to read the structure of mind off electrical or anatomical data from a brain is not a very hopeworthy path, and there’s an alternative path that’s more hopeworthy.
B: I still don’t get what you mean by a more direct path that gives more “real data” about minds. Isn’t brain data pretty much the most direct data we have?
RAM divination
R: Here’s another analogy. Suppose that I give you the task of understanding how computer operating systems work. I give you a computer to study, but you’re not allowed to use it normally; you can’t look at the screen, you can’t type into it, nothing, so you can’t read the files or the code or interact with the operating system “at the level of abstraction” that it natively operates in. I don’t even let you read the hard drive. The only thing I give you is noisy, intermittent read access to a random, changing subset of locations in RAM. I think that in this situation, if your approach is to stare at the RAM activations and try to discern patterns, you’ll have a really really hard time coming to grasp high-level concepts about how operating systems work, like multiplexing the execution of multiple programs, laying out the memory and granting appropriate access, and whatever. If I want this to be a little more like an allegory for the situation with mechanistic interpretability, I can even say, you get access to all of RAM, and also to the CPU. How do you learn the core ideas of OSes, given that situation?
B: {thinks for a while} It does seem hard.
R: Let me simplify it a little bit. OSes are big and complicated, but the point I want to make should stand for any phenomenon that you’d need a substantial blob of novel concepts to properly understand. So let’s say that you’re looking at the computer, and secretly, the computer is running a program that uses a hash table. But you don’t know what a hash table is, and all you see is the RAM. The goal is to run an investigation that will reasonably efficiently lead you to understand what a hash table is.
B: {thinks a bit more} I’m not sure this is what you’re looking for, but one idea is to look for correlations between activations and use those to derive bits of causality between latent variables.
R: That’s a solid idea for doing science, and that sort of thing works in a lot of situations. But this situation seems especially ill-suited to that sort of investigation. If we think of Faraday exploring electromagnetism, he’s experimenting with stuff that’s “pretty close to the guts” of the phenomenon he’s trying to get his hands on. He’s mapping out the rough contours and timing of the major lines of force in lots of situations with magnets and with electrical charges and currents. There’s a sense in which the simple equations describing electromagnetism in generality are “encrypted” in the interactions of metal and electricity, but it’s a lesser order of encryption compared to the case of the hash table in RAM. Faraday gets to hold the phenomenon in his hands and turn it over and over until he’s very familiar with its contours.
The RAM activations, on the other hand, present a quite garbled and partial picture of the hash table. Hash tables built with different data will have different patterns of hash collisions; hash tables built at different times on the same computer will be allocated different regions in RAM; and if there’s a hash seed that varies between processes, like in Python, the hashes and collision patterns will be totally different even with the same data. And imagine trying to deduce how the hashing algorithm works! An expert with a strong command of hashing and hash tables and other surrounding context might in some cases be able to recover what’s being done inside the computer. But in this hypothetical, you don’t even know what a hash table is! So you have no idea what to look for.
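To make the garbling concrete, here is a minimal sketch of how the same data can land in different memory slots under different hash seeds. Everything in it is an illustrative assumption: `seeded_hash` is a stand-in built on `hashlib.blake2b`, not the hash any real interpreter uses, and `build_table` is a toy linear-probing table whose slot list plays the role of the observed RAM.

```python
import hashlib

def seeded_hash(key: str, seed: int) -> int:
    """Hypothetical stand-in for a per-process seeded string hash."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")

def build_table(keys, seed, n_slots=16):
    """Insert keys with linear probing and return the raw slot array."""
    slots = [None] * n_slots
    for key in keys:
        i = seeded_hash(key, seed) % n_slots
        while slots[i] is not None and slots[i] != key:
            i = (i + 1) % n_slots  # collision: try the next slot
        slots[i] = key
    return slots

keys = ["apple", "banana", "cherry", "date", "elder", "fig"]
layout_a = build_table(keys, seed=1)  # one "process"
layout_b = build_table(keys, seed=2)  # same data, different seed

# The two slot arrays hold the same keys, but generally in different places.
print(layout_a)
print(layout_b)
```

An observer who sees only the slot arrays has to notice that both layouts encode the same set of keys, which is exactly the sort of invariant that is easy to state once you know what a hash table is and hard to guess beforehand.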
B: Are you basically saying that it’s a really hard science problem?
R: The hope here, with this koan, is to get a better view of possible strategies available to us, when we’re trying to understand something that requires us to come up with lots of new concepts. When I’m talking about how the RAM activations from a hash table are garbled, I’m responding to your instinct to look at the data, the RAM activations, and do science to that data by trying to find patterns which explain incrementally more of the distribution of the data. That sort of science is one sort of possible strategy, but I think there are others.
B: I’m not sure what you could do, other than look at the data. I guess one thing that might help is that I could go through ideas that I already understand, and see if any of those ideas help me explain the data. I don’t know if that counts as cheating, since you said that in the hypothetical I don’t know about hash tables.
R: I do want to specify that you don’t already understand anything that’s very helpfully close to hash tables. But I like your comment. Although this new strategy is in some sense obvious, and I’ll want to rule out that it helps very much, it starts to illustrate that there’s more to say than just “do science to the data”. Science, as we practice it, of course involves searching among ideas you already have, but “stare at the data and see what it makes you think of” and “search through your library of ideas” are relevantly different pieces of advice you can give to yourself. And I think there are more pieces of advice that are relevantly different, and that would help.
Green’s problem
R: Anyway, let’s tweak the hypothetical. Since you do in fact know about hash tables, let’s instead consider that your friend Green is faced with a computer running a mystery algorithm. You can imagine that Green doesn’t know about hash tables at all, and doesn’t know about anything very similar or related. What advice can you give Green to make it more likely that ze will figure out how hash tables work?
B: Hm… What if I tell zer to run simulations of different algorithms to see if they produce data that looks like the RAM activations… But that’s pretty similar to just doing science. I suppose I could tell zer to talk about the problem with other people, and hopefully ze talks to someone who knows about hash tables.
R: That would work in this exact case, but I want this situation to be analogous to the original case: understanding the core structure of minds. No humans and no internet pages currently contain a clear understanding of the core structure of minds. Sort of like with Archimedes’s Chronophone, we want our advice to Green to translate to our own situation, in a way that only makes use of what we have available in our situation——so our advice to Green should in some sense only make use of what Green has in zer situation.
To make this a bit more concrete, here’s a hypothetical that runs parallel to the one with Green: You come across an alien computer. You can tell it’s a computer, because it’s got wires and stuff, but you don’t know anything about what it’s doing. Unknown to you, it’s running a program that uses a certain mystery datastructure. Neither you, nor any other human, knows about that datastructure. It’s not so incredibly complex and subtle that you can’t understand it, but nevertheless you don’t already understand it or anything like it. You can view the RAM, and you’re supposed to come to understand what this computer is doing.
B: {thinks more} I could look for correlations between RAM activations and external events, like current on wires coming in or out of the computer… I could try comparing the RAM data to the data produced by processes I do understand, and think about the differences… I dunno, this all just seems like science. It’s hard to see what would be helpful, but wouldn’t count as science.
R: “Science” is vague. I’m not saying that the answer is something that is definitively not science. But a good answer should be more specific advice than just “do science”. Note that science, specifically as a social protocol, is about legibilization and verification, and doesn’t deal so much with hypothesis-generation; here we’re asking for advice on hypothesis generation, for directing the part of the search process that generates the space to search in. The advice should be specific enough to substantially redirect Green’s search in a way that speeds up Green’s success, and at the same time the advice should be general enough that it translates to your situation with the mystery alien datastructure. I mean, you could just explain the hash table to Green, but that obviously is cheating, and doesn’t translate to the situation with the alien computer or trying to understand minds. You could tell Green to try harder, which is translatable, but isn’t very helpful.
B: I think that makes sense, but I’m still just thinking of general heuristics that seem good, and they’re all basically bits of my understanding of how to do science. I don’t know what other sort of thing there is.
Themes of investigation
R: I think there’s a theme to your recommendations that’s more specific than just science. That theme constitutes paying attention to some areas of the search space and neglecting others. The theme is something like, all your recommendations are in the form of:
(tools for) getting the Idea from the Data.
In other words, the way they direct the search holds tight to the recordings of RAM activations. E.g., “try to explain a subset of the data” or “try to find correlations” or “run simulations”. These are trying to set up a tight feedback loop between [tweaks to your ideas] and [comparison of predictive power against the data]. The way these recommendations shape Green’s search process is by pruning away new conjectural ideas ze creates if they don’t identifiably produce some increase in ability to predict RAM activations. Here’s a different theme, which we’ve discussed:
(tools for) getting the Idea from other Ideas I already have.
The previous recommendation was about Prune; this recommendation is about Babble. It recommends that instead of generating hypothesis-parts using only what my eyes make me think of when I stare at the data, I should also use my preexisting ideas. It modifies the search that Green is running by changing zer distribution over programs, compared to if Green were not using zer preexisting ideas as short wieldy basic elements in composing novel ideas to investigate. It’s like how generating a random program in Python is different from generating a random program in Python after importing some module. (In theory these differences wash out, but not in practice.) Another piece of advice:
Generate new ideas by brute-force searching over combinations of ideas I already have.
This is almost trivial advice, but we could very plausibly peek in on Green trying to learn about hash tables, and find that ze is actually stuck, and actually isn’t doing anything at all. That process will never succeed, whereas brute force search would eventually succeed. So, like, if an investigator is completely ignoring an avenue or dimension of investigation, pointing it out can have a large effect. Another theme:
(tools for) getting the Idea from other Ideas that other people already have.
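The point about brute-force search, and about how importing a module changes the distribution over programs, can be sketched in a few lines. This is a toy under stated assumptions: the primitives `inc`, `double`, `square` and the "imported" idea `h` are invented for illustration, and the search simply enumerates compositions of unary functions, shortest first.

```python
from itertools import product

def apply_all(combo, x):
    """Apply a sequence of (name, function) pairs to x, left to right."""
    for _, fn in combo:
        x = fn(x)
    return x

def search(primitives, examples, max_depth=4):
    """Brute-force search over compositions, shortest first.

    Returns (names, candidates_tried) for the first composition that
    reproduces every input/output example, or (None, candidates_tried).
    """
    tried = 0
    for depth in range(1, max_depth + 1):
        for combo in product(primitives.items(), repeat=depth):
            tried += 1
            if all(apply_all(combo, x) == y for x, y in examples):
                return [name for name, _ in combo], tried
    return None, tried

# Observed behaviour to explain: x -> (2 * (x + 1)) ** 2 + 1
examples = [(x, (2 * (x + 1)) ** 2 + 1) for x in range(4)]

base = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}
# The same library plus one preexisting idea, as if a module were imported.
richer = {**base, "h": lambda x: (2 * (x + 1)) ** 2}

base_program, base_tried = search(base, examples)
rich_program, rich_tried = search(richer, examples)
print(base_program, base_tried)  # ['inc', 'double', 'square', 'inc'] 55
print(rich_program, rich_tried)  # ['h', 'inc'] 17
```

The brute-force search succeeds either way, which is the "almost trivial" point; the extra primitive only changes how far down the search order the answer sits. A primitive that did not happen to be relevant would instead widen the search at every depth, which is one way to read "in theory these differences wash out, but not in practice".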
Now that we’ve seen a couple regions of the space of possible answers to the hypothetical, I wonder if you can think of strategies on another theme. Another prompt: Suppose you leave Green alone for six months, and when you come back, it turns out ze’s figured out what hash tables are. What do you suppose might have happened that led to zer figuring out hash tables?
B: Let’s see… How about telling Green to take performance-enhancing drugs? That could speed things up.
R: Ha! That’s not what I had in mind, but now we’re, uh, seeing the breadth of the space. Suppose ze takes your advice, and now ze’s thinking 20% more hours per week, but still isn’t really getting anywhere. What now?
To reframe things a bit, I want to point out that in some situations, you know how to make progress on difficult questions in a way that doesn’t rely on either empirical data or on copying from preexisting ideas. Namely, you know how to make conceptual progress on mathematical questions by doing mathematical investigations and coming up with mathematical ideas. How is it that you do that? Why does it work? Why can you have a math question, then do a bunch of activity that is not obviously related to the question, and then somehow that activity has produced understanding which solves the question (like finding a faster graph isomorphism tester using ideas from group theory)?
B: Part of what makes learning math be like that is just practice? I get experience solving easier problems, and that makes me better at math in general, and then I can tackle the harder problems.
R: Nice. So a theme here is:
(tools for) practicing getting Ideas (of a similar or easier caliber/complexity, that you also don’t already know about), e.g. experimenting with different cognitive strategies to see which ones work.
For example, we could practice, in preparation for the task of understanding minds, by trying to understand other phenomena that are complex, require concepts we don’t already have, and only present sensory data that’s garbled and partial.
J-trees
B: Another thing this makes me think of: If you have a tough problem, instead of directly working on it, you can go and build up the quality and quantity of theory in the realm of math generally or in areas related to the tough problem. Then you can maybe spot connections between what you’ve gotten and the original problem. E.g. Poincaré’s conjecture that spheres are characterized by their fundamental groups was solved partly by ideas that come from studying vector fields. Or, you could translate the original problem into the language of your new discoveries, which seems to sometimes help.
R: Excellent. How would that strategy look, translated to the situation with the alien computer and the unknown algorithm?
B: I suppose we could look around for other alien computers, and investigate those, and that way we get more different data?
R: That might help, but let’s specify that there aren’t any other alien computers around.
B: Ok, hm… Then I’m not sure how to translate it. If this is the only alien computer we have, I don’t see how to get another source of data… or how to investigate a “nearby” area that would feed in to understanding the alien computer. Maybe we can look at other alien artifacts that aren’t the computer?
R: That could in theory help, but if the algorithm in question uses substantial novel concepts, you probably won’t find those concepts clearly embodied in a non-computer context. For the hash table example, the closest things I can think of, outside of computers, would be an alphabetical indexing system for the lookup aspect, and randomly sorting objects into groups by drawing lots for the even-distribution aspect; but this doesn’t really explain hash tables. The distance from any non-computer thing seems even further for, say, b-trees, let alone many of the really fancy algorithms we use.
Let me give a slight hint. The hint is this: Although you never do figure out what algorithm is running on the alien computer, it happens to be the case that in the year 3000, the algorithm will be called “J-trees”.
B: …What the hell kind of a hint is that?
R: {smiles trollily}
B: Let’s think step by step. In the year 3000, the algorithm is called “J-trees”. What does that mean? What information can I get from the name of the algorithm? It tells me that the algorithm has something to do with trees… But I’m not right now trying to figure out what the algorithm is, I’m trying to figure out a general strategy for figuring out what the algorithm is. But I already knew the algorithm would be named something, so I haven’t really learned anything. Or, well, it would be called something by the aliens… But if it’s being called “J-trees”, that’s a human name, which means that humans know about the algorithm. Wait, did someone else figure out the alien computer?
R: No.
B: So then how was this algorithm called J-trees? Ok, I think that people just eventually happen to invent the same algorithm. How can I use that fact...
Ah. Here’s my strategy: I sit around and fly kites, and if I wait long enough, someone else will discover the algorithm.
R: … {facepalms}
B: {smiles trollily}
R: Mountain mountain, very clever. Yes that technically works, but it doesn’t really count. I mean, you didn’t speed up the search process.
Parochial abstractions
B: How can I speed it up? I can try to improve humanity’s ability to do science in general, or try to do science myself. But now we’re just where we started.
R: We can ask ourselves: “Why did humanity end up ever discovering J-trees?”
B: Presumably because J-trees are interesting or useful or something.
R: Right.
B: Maybe they’re natural abstractions?
R: To some extent, we can assume.
B: I wonder how I can use the fact that they’re natural abstractions to speed up the search. Actually, it seems like a lot of things should speed up finding natural abstractions. If you’re doing something difficult in a domain, you’ll probably discover the natural abstractions about that domain. So really, doing anything difficult should help?
R: We’re getting pretty close to what I have in mind, but what I have in mind is a bit more specific. You could tell anyone doing anything “do lots of difficult things” to make them find natural abstractions, just like you could tell anyone doing anything “take performance enhancing drugs” or “try harder” or “practice solving hard problems” or “consult humanity’s knowledge” or “here’s some tips for doing science better”. To be fair, I did ask for some strategy that would translate between Green’s hash table situation, your alien computer situation, and our situation with understanding minds. But the thing I have in mind is a little bit “cheating”, in the sense that it involves something a bit more specific to the problem. Obviously it can’t be “here’s an explanation of what hash tables are: blah blah hash collisions blah blah amortized”, because that’s cheating too much and doesn’t translate. But it’s allowed to be a little more specific than fully general. If you can use a few bits of cheating to nudge Green to speed up zer search a lot, that’s what we’re looking for.
B: Why did humanity end up ever discovering J-trees… If J-trees aren’t used in any non-computer human thingies, then humanity discovered J-trees due to working with computers specifically.
R: Yes.
B: So humans were working with computers, and because J-trees are natural abstractions about computers, humans discovered J-trees.
R: Right.
B: And I can tell Green to do the same thing. I can tell zer to work with computers.
R: Yes, exactly! What if you’re allowed to cheat more?
B: I could tell zer to try to do something that would be much easier using hash tables. Like, I don’t know, applying an expensive function to a large list of items with a lot of duplicates, where you want to check if you’ve already computed the function on a previous duplicate of an item.
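Blue’s task can be sketched as follows, assuming Python’s built-in `dict` as the hash table; `apply_with_cache` and `slow_square` are hypothetical names chosen for illustration, and `slow_square` merely stands in for any expensive function.

```python
def apply_with_cache(fn, items):
    """Apply fn to each item, computing each distinct item only once."""
    cache = {}  # dict is Python's built-in hash table
    calls = 0
    results = []
    for item in items:
        if item not in cache:  # average-case constant-time lookup
            cache[item] = fn(item)
            calls += 1
        results.append(cache[item])
    return results, calls

def slow_square(n):
    """Deliberately wasteful stand-in for an expensive function."""
    return sum(n for _ in range(n))

items = [3, 5, 3, 3, 8, 5, 8, 3]
results, calls = apply_with_cache(slow_square, items)
print(results)  # [9, 25, 9, 9, 64, 25, 64, 9]
print(calls)    # 3
```

Someone without `dict` who has to make the "have I seen this item before?" check fast on a large list is under pressure to invent something with the same shape, which is what makes the task a fairly strong hint.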
R: Great. So, that’s clearly pretty cheaty. You’re transferring a lot of bits of information about what the algorithm is. On the other extreme, you could tell Green something very general, like “do hard things”. In between, you could say something like: “program a computer so that it’s very useful for some substantial enterprise, such as a large business or a frontier scientific investigation”. This doesn’t give many bits about hash tables specifically, but it presents a task which, if pursued successfully, would likely as a byproduct produce, in the pursuer, understanding of hash tables. There’s a spectrum of how specifically your advice is tailored to the specific task——how much information-theoretic entanglement they have.
The first few bits are in some sense the most important, like how the first 10x speedup saves 90% of the time you could ever save. But they have to be the right sort of bits, the sort of bit that constrains the space by a bit’s worth but also leaves the space “full bodied”, algorithmically general. When you slave all your conceptual explorations to “does this, on a relatively short time scale, in a relatively credit-assignable way, improve my predictions about the RAM?”, you’re constraining the space by a lot of bits, for sure, and those bits have something to do with your goal; but you’re constraining the search in a way that makes the truer / more successful ideas much farther away in the search order. And a good way to keep Green’s search algorithmically general is to call on zer agency, by pointing zer at a task that calls on zer agency to perform successfully.
Grown in a lab
B: If we tell Green to “program a computer so that it’s very useful for some substantial enterprise”, there’s an edge case where that doesn’t result in zer understanding hash tables. (Besides it just happening to be the case that hash tables weren’t necessary.) Suppose Green runs an automated search, like Solomonoff induction or gradient descent or something, and finds a program that visibly succeeds at some important task. Then ze has achieved the goal without necessarily understanding anything about hash tables, even if the found program does use hash tables internally.
R: Good point. If we’re imagining Green going off and understanding how to program computers, rather than just trying to achieve some specific goal, there’s some additional machinery, some additional taste that we’re assuming Green will apply to test whether zer concepts are satisfactory.
Morals of the story
B: How does this all transfer back to the top stack frame of trying to understand how minds work?
R: I’ll answer that in a little bit, but I want to first say some general morals I draw from this discussion. Though maybe the examples in the discussion should be left to stand on their own, for you to draw whatever morals seem salient to you.
The first moral that I’d draw is simple but crucial: If you’re trying to understand some phenomenon by interpreting some data, the kind of data you’re interpreting is key. It’s not enough for the data to be tightly related to the phenomenon, or to be downstream of the phenomenon, or enough to pin it down in the eyes of Solomonoff induction, or only predictable by understanding it. If you want to understand how a computer operating system works by interacting with one, it’s far far better to interact with the operating system at or near the conceptual/structural regime at which the operating system is constituted.
What’s operating-system-y about an operating system is that it manages memory and caching, it manages CPU sharing between processes, it manages access to hardware devices, and so on. If you can read and interact with the code that talks about those things, that’s much better than trying to understand operating systems by watching capacitors in RAM flickering, even if the sum of RAM+CPU+buses+storage gives you a reflection, an image, a projection of the operating system, which in some sense “doesn’t leave anything out”. What’s mind-ish about a human mind is reflected in neural firing and rewiring, in that a difference in mental state implies a difference in neurons. But if you want to come to understand minds, you should look at the operations of the mind in descriptive and manipulative terms that center around, and fan out from, the distinctions that the mind makes internally for its own benefit. In trying to interpret a mind, you’re trying to get the theory of the program.
In the koan, I disallowed you from interacting with the computer in the normal way. That’s an artificial limitation, and if you could interact with the computer by actually using it via keyboard and screen and high-level language, then of course you should. Likewise with minds.
The second moral I’d draw: Boundaries that seem to control access to a phenomenon are usually largely not really boundaries, because understanding is in large part logico-mathematical, and so transcends all boundaries. It may seem intuitively that the skull is a container for the mind, or that the neurons are a sort of container for more ephemeral mental elements. In other words, if I think about a stream of water, the water-thoughts have a core body, which is a representation of the path of the water that I’m imagining, and how the water sounds and moves and reflects light, and how it splashes against the rocks. The water-thoughts also have a skin or outer appearance, which is neurons and axons and electrons. The body is inside the skin; the structural core of the water-thoughts sits inside/atop/behind the visible/accessible presentation, the neurons. In this picture, to access the water-thoughts, you have to see or reach through the skin to get to the body. Your access to what’s going on inside an AI is only through the AI’s external behavior and through its parameters and activations; that’s the only way in.
But this is incorrect. You can see a calculator, and then go off and think about stuff without interacting with the calculator, and thereby, in the course of doing stuff and being required to learn addition, come to understand as if by magic much of the structure of what happens in the calculator when it does addition. Just knowing “this thing is an alien computer” is enough to tell you what sort of investigation to do, if you want to bring into your understanding much of the structure inside the computer: namely, make a computer that’s very useful to you. Like scientists who find they speak the same language in much detail because they’ve been investigating the same domain, even though they’ve so far had sparse or no causal communication. A boundary is not a boundary.
Third moral: The criterion of understanding is general and open-ended. As you pointed out, if you just find, e.g. by graddescending differentiable circuits, a program that predicts the next RAM state from the current RAM state, you might still not really understand anything and still not know what a hash table is. But in what sense don’t you understand it, if you can predict it well? Here are other things that understanding involves:
Rebuilding the thing from scratch (e.g. the hash table, say without access to the RAM-sequence data).
Manipulating the ongoing operation of the thing, e.g. causing a bunch of hash collisions on purpose, or redirecting an agent’s behavior wieldily.
Applying the ideas in another context, e.g. inventing a datastructure that’s good for some other purpose that’s loosely inspired by your understanding of hash tables, or becoming good at recognizing when there’s a hash table running (even if it’s different from the original one, different enough that your trained model doesn’t predict it well).
Talking to another mind about the thing, e.g. quickly imparting to them the ability to predict the phenomenon by saying a few paragraphs.
Thinking along with the thing, as in gemini modeling.
Having opinions about the thing; having the thing as part of the world that your values can say “good” or “bad” about.
Generally, integrating the structure of the thing into your mind; making its structure available to relate to other mental elements when suitable; making your thoughts about the thing be useful, play some role in your mind.
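As a toy version of the “causing a bunch of hash collisions on purpose” item, here is a minimal Python sketch. It assumes Python's built-in dict stands in for the hash table; the `Key` class and the `comparisons_to_fill` helper are hypothetical names made up for illustration. Forcing every key to share one hash value makes the table fall back on equality checks to tell the keys apart:

```python
class Key:
    """Hypothetical key type whose hash we control; counts equality checks."""
    eq_calls = 0

    def __init__(self, name, forced_hash):
        self.name = name
        self.forced_hash = forced_hash

    def __hash__(self):
        # Assumption for illustration: we get to dictate the hash value.
        return self.forced_hash

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.name == other.name


def comparisons_to_fill(keys):
    # Insert every key into a fresh dict and report how many equality
    # checks were made while doing so.
    Key.eq_calls = 0
    table = {}
    for k in keys:
        table[k] = True
    return Key.eq_calls


spread = comparisons_to_fill([Key(i, i) for i in range(200)])     # distinct hashes
collided = comparisons_to_fill([Key(i, 42) for i in range(200)])  # one shared hash
print(spread, collided)
```

Comparing the two counts shows the intervention biting: the colliding keys should cost far more comparisons than the spread-out ones, though the exact numbers depend on the dict implementation.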
Self-given
B: I’m still wondering how “go off and program a computer to be useful” transfers to the case of understanding minds.
R: Go off and think well——morally, effectively, funly, cooperatively, creatively, agentically, truth-trackingly, understandingly——and observe this thinking——and investigate/modify/design this thinking——and derive principles of mind that explain the core workhorses of the impressive things we do including self-reprogramming, and that explain what determines our values and how we continue caring across ontology shifts, and that continue to apply across mental change and across the human-AGI gap; where those principles of mind are made of ideas that are revealed by the counterfactual structure of possible ways of thinking revealed by our interventions on our thinking, like how car parts make more sense after you take them out and replace them with other analogous parts.
B: It sounds nice, but it kind of just sounds like you’re recommending mindfulness or something.
R: I’m recommending an investigation, which involves a lot of intervention. There’s a lot that’s fixed about human minds, by default. It’s much harder to see things that are fixed, because you don’t ever see them vary, so you don’t know to assign them credit or blame. But it seems to happen to be the case that human minds have a lot of key things mostly fixed. So you have to work hard to see them.
B: So are you recommending introspection, then?
R: Yeah, sort of. With the goal of using ourselves as a model organism. Phenomenology didn’t seem to meet with all that much success in the past century, and didn’t seem to employ a scientific attitude. Buddhism is an engineering project that dismantles the motivation to do the sort of investigation I propose. Psychologists chickened out because introspection——taking one’s own thoughts as data——“isn’t objective”. Lakoff, Hofstadter, and Yudkowsky are examples. Who else?
I think philosophy in general——or rather, metaphysics——can be read as this sort of investigation, but usually not reflectively aware. For example:
Space is necessarily infinite. What it means to be spatial is to have a field of local relatedness that extends homogeneously in every direction.
From some modern perspective, this is obviously useless nonsense. We know about 3-manifolds other than $\mathbb{R}^3$, our physical theories don’t require that the universe is topologically like $\mathbb{R}^3$, and sitting in your armchair saying things that sound “necessary” or whatever can’t tell you whether or not we live in a 3-sphere or a 3-torus or what. However, we can put the metaphysicist’s ramblings in special quotes:
«Space is necessarily infinite. What it means to be spatial is to have a field of local relatedness that extends homogeneously in every direction.»
If we put these quotes around a proposition P, we get the statement «P», which means something like “When I engage in thinking about the things that P talks about, and I allow my thoughts to form their local context without being moderated by or translated into my broader, more equilibrated, more coherent mental context, then this local thought-structure believes P——with the terms in P interpreted as this local context interprets them.”. So although space, the thing our bodies move around in, is obviously not “necessarily infinite” if that means anything, it’s also the case that if you think of space in a certain natural way it feels like there’s something that has to be infinite. Even if you imagine that the universe is really a 3-sphere (so that, e.g., if you shot off in a rocket straight in one direction you’d eventually go around and hit the Earth antipodally to your launchpad), it’s intuitive to think of the 3-sphere hanging in space——hanging in some surrounding infinite Euclidean 3-manifold. Or, living in a 3-torus, I look out from the balcony in the morning and wave to myself off in the distance, though he’s always busy waving to the next guy, and so on, stretching out to infinity.
We could theorize: Our intuitive sense of space comes from extrapolating/generalizing out from our understanding of local space around our bodies, plus experience with moving from one place to another. Extrapolating this way is like extrapolating to “the totality of natural numbers” from the process of counting. Encapsulating counting into $\mathbb{N}$ is like encapsulating personal space and motion into $\mathbb{R}^3$. We can then abstract this pattern of abstraction, and ask about this pattern of abstraction “what’s it made of?” and “how did it get here?” and “how does it interface with metavalues?” and so on. Whether or not this analysis of space is correct, I hope it gestures at the general idea of philosophy as computation traces of our thinking, which can be taken as data and investigated.
Eventually we’re trying to understand the load-bearing parts of our minds——studying self-reprogramming from the inside.
Thinking about it more, I want to poke at the foundations of the koan. Why are we so sure that this is a computer at all? What permits us this certainty, that this is a computer, and that it is also running actual computation rather than glitching out?
From a different and more conceit-cooperative angle: it’s not just that this is a really hard science problem, it might be a maximally hard science problem. Maybe too hard for existing science to science at! After all, hash functions are meant to be maximally difficult, computationally speaking, to invert (in fact impossible to invert in the general case, with hash collisions being merely very hard to generate).
That Green has figured out how to probe the RAM properly, and how to assign meaning to the computations, and that zer Alien Computer is doing the same-ish thing that mine is?
It would follow, to me, that I should be looking for treelike patterns of activation, and in particular that maybe this is some application of the principles inherent to hash sort or radix sort to binary self-balancing trees, likely in memory address assignment, as might be necessary/worthwhile in a computer of a colossal scale such as we won’t even get until Y3k?
I’d disagree with Blue here! Better to clean and oil a machine and then run a quick test of function than setting it running to carefully watch it do its thing!
Doing so still never gets you to the idea of a homology sphere, and it isn’t enough to point towards the mathematically precise definition of an infinite 3-manifold without boundary.
Why do you need to be certain? Say there’s a screen showing a nice “high-level” interface that provides substantial functionality (without directly revealing the inner workings, e.g. there’s no shell). Something like that should be practically convincing.
I think the overall pattern of RAM activations should still tip you off, if you know what you’re looking for. E.g. you can see the pattern of collisions, and see the pattern of when the table gets resized. Not sure the point is that relevant though, we could also talk about an algorithm that doesn’t use especially-obscured components.
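To make “the pattern of collisions” and “the pattern of when the table gets resized” concrete, here is a minimal Python sketch. It is not a claim about what the alien computer runs: `TracedTable` is a hypothetical toy that assumes linear probing and a grow-by-doubling rule at two-thirds load, and it simply logs the events an outside observer might hope to spot.

```python
class TracedTable:
    """Toy open-addressing hash table that logs collisions and resizes."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0
        self.events = []  # ("collision", slot) or ("resize", new_capacity)

    def _probe(self, key):
        # Linear probing: walk forward from the home slot until we find
        # the key or an empty slot, logging each occupied slot we skip.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            self.events.append(("collision", i))
            i = (i + 1) % len(self.slots)
        return i

    def put(self, key, value):
        # Grow before the table would become more than 2/3 full.
        if (self.count + 1) * 3 > len(self.slots) * 2:
            self._resize(len(self.slots) * 2)
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key):
        entry = self.slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self, capacity):
        # Allocate a bigger slot array and re-insert every live entry.
        self.events.append(("resize", capacity))
        old = [e for e in self.slots if e is not None]
        self.slots = [None] * capacity
        self.count = 0
        for k, v in old:
            self.put(k, v)


table = TracedTable()
for n in range(100):
    table.put(n * 8, n)  # multiples of 8 crowd into few home slots early on

resizes = [cap for kind, cap in table.events if kind == "resize"]
collisions = sum(1 for kind, _ in table.events if kind == "collision")
print(resizes, collisions)
```

In this toy the resizes land at doubling capacities, so they show up as rarer and rarer bursts of wholesale rewriting, while collisions show up as short runs of adjacent accesses. A real implementation could differ in every one of these choices.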
I’m unsure about that, but the more pertinent questions are along the lines of “is doing so the first (in understanding-time) available, or fastest, way to make the first few steps along the way that leads to these mathematically precise definitions?” The conjecture here is “yes”.
Then whatever that’s doing is a constraint in itself, and I can start off by going looking for patterns of activation that correspond to e.g. simple-but-specific mathematical operations that I can actuate in the computer.
Maybe? But I’m definitely not convinced. Maybe for idealized humanesque minds, yes, but for actual humans, if your hypothesis were correct, Euler would not have had to invent topology in the 1700s, for instance.
It’s an interesting different strategy, but I think it’s a bad strategy. I think in the analogy this corresponds to doing something like psychophysics, or studying the algorithms involved in grammatically parsing a sentence; which is useful and interesting in a general sense, but isn’t a good way to get at the core of how minds work.
(I don’t understand the basic logic here—probably easier to chat about it later, if it’s a live question later.)
Can you elaborate on why you think “studying the algorithms involved in grammatically parsing a sentence” is not “a good way to get at the core of how minds work”?
For my part, I’ve read a decent amount of pure linguistics (in addition to neuro-linguistics) over the past few years, and find it to be a fruitful source of intuitions and hypotheses that generalize way beyond language. (But I’m probably asking different questions than you.)
I wonder if you’re thinking of, like, the nuts-and-bolts of syntax of specific languages, whereas I’m thinking of broader / deeper theorizing (random example), maybe?
Hm. I think my statement does firmly include the linked paper (at least the first half of it, insofar as I skimmed it).
It’s becoming clear that a lot of my statements have background mindsets that would take more substantial focused work to exposit. I’ll make some gestural comments.
When I say “not a good way...” I mean something like “is not among the top X elements of a portfolio aimed at solving this in 30 years (but may very well be among the top X elements of a portfolio aimed at solving this in 300 years)”.
Streetlighting, in a very broad sense that encompasses most or maybe all of foregoing science, is a very good strategy for making scientific progress—maybe the only strategy known to work. But it seems to be too slow. So I’m not assuming that “good” is about comparisons between different streetlights; if I were, then I’d consider lots of linguistic investigations to be “good”.
In fairly wide generality, I’m suspicious of legible phenomena.
(This may sound like an extreme statement; yes, I’m making a pretty extreme version of the statement.)
The reason is like this: “legible” means something like “readily relates to many things, and to standard/common things”. If there’s a core thing which is alien to your understanding, the legible emanations from that core are almost necessarily somewhat remote from the core. The emanations can be on a path from here to there, but they also contain a lot of irrelevant stuff, and can maybe in principle be circumvented (by doing math-like reasoning), so to speak.
So looking at the bytecode of a compiled python program does give you some access to the concepts involved in the python program itself, but those concepts are refracted through the compiler, so what you’re seeing in the bytecode has a lot of structure that’s interesting and useful and relevant to thinking about programs more generally, but is not really specifically relevant to the concepts involved in this specific python program.
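A small illustration of that refraction, as a sketch assuming CPython and its standard `dis` module (the function `area` is a hypothetical example, and the exact opcode names vary between Python versions):

```python
import dis


def area(width, height):
    # The source-level concept: the area of a rectangle.
    return width * height


# Collect the names of the bytecode instructions compiled for `area`.
opnames = [ins.opname for ins in dis.get_instructions(area)]
print(opnames)
```

The listing is mostly about loading values onto a stack, applying a binary operation, and returning, which is to say it is mostly about the virtual machine. The concept “area of a rectangle” survives only faintly, in the multiplication and the variable names.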
Concretely in the case of linguistics, there’s an upstream core which is something like “internal automatic conceptual engineering to serve life tasks and play tasks”.
((This pointer is not supposed to, by itself, distinguish the referent from other things that sound like they fit the pointer taken as a description; e.g., fine, you can squint and reasonably say that some computer RL thing is doing “internal automatic...” but I claim the human thing is different and more powerful, and I’m just trying to point at that as distinct from speech.))
That upstream core has emanations / compilations / manifestations in speech, writing, internal monologue. The emanations have lots of structure. Some of that structure is actually relevant to the core. A lot of that structure is not very relevant, but is instead mostly about the collision of the core dynamics with other constraints.
Phonotactics is interesting, but even though it can be applied to describe how morphemes interact in the arena of speech, I don’t think we should expect it to tell us much about morphemes; the additional complexity is about sounds and ears and mouths, and not about morphemes.
A general theory about how the cognitive representations of “assassin” and “assassinate” overlap and disoverlap is interesting, but even though it can be applied to describe how ideas interact in the arena of word-production, I don’t think we should expect it to tell us much about ideas; the additional complexity is about fast parallel datastructures, and not about ideas.
In other words, all the “core of how minds work” is hidden somewhere deep inside whatever [CAT] refers to.
Thanks!
One thing I would say is: if you have a (correct) theoretical framework, it should straightforwardly illuminate tons of diverse phenomena, but it’s very much harder to go backwards from the “tons of diverse phenomena” to the theoretical framework. E.g. any competent scientist who understands Evolution can apply it to explain patterns in finch beaks, but it took Charles Darwin to look at patterns in finch beaks and come up with the idea of Evolution.
Or in my own case, for example, I spent a day in 2021 looking into schizophrenia, but I didn’t know what to make of it, so I gave up. Then I tried again for a day in 2022, with a better theoretical framework under my belt, and this time I found that it slotted right into my then-current theoretical framework. And at the end of that day, I not only felt like I understood schizophrenia much better, but also my theoretical framework itself came out more enriched and detailed. And I iterated again in 2023, again simultaneously improving my understanding of schizophrenia and enriching my theoretical framework.
Anyway, if the “tons of diverse phenomena” are datapoints, and we’re in the middle of trying to come up with a theoretical framework that can hopefully illuminate all those datapoints, then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework), at any particular point in this process. The “schizophrenia” datapoint was totally unhelpful to me in 2021, but helpful to me in 2022. The “precession of Mercury” datapoint would not have helped Einstein when he was first brainstorming general relativity in 1907, but was presumably moderately helpful when he was thinking through the consequences of his prototype theory a few years later.
The particular phenomena / datapoints that are most useful for brainstorming the underlying theory (privileging the hypothesis), at any given point in the process, need not be the most famous and well-studied phenomena / datapoints. Einstein wrung much more insight out of the random-seeming datapoint “a uniform gravity field seems an awful lot like uniform acceleration” than out of any of the datapoints that would have been salient to a lesser gravity physicist, e.g. Newton’s laws or the shape of the galaxy or the Mercury precession. In my own case, there are random experimental neuroscience results (or everyday observations) that I see as profoundly revealing of deep truths, but which would not be particularly central or important from the perspective of other theoretical neuroscientists.
But, I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints. (Unless of course you’re also reading and internalizing crappy literature theorizing about those phenomena, and it’s filling your mind with garbage ideas that get in the way of constructing a better theory.) For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn’t call that example “legible”?)
Thanks, this is helpful to me.
An example of something: do LLMs have real understanding, in the way humans do? There’s a bunch of legible stuff that people would naturally pay attention to as datapoints associated with whatever humans do that’s called “real understanding”. E.g. being able to produce grammatical sentences, being able to answer a wide range of related questions correctly, writing a poem with s-initial words, etc. People might have even considered those datapoints dispositive for real understanding. And now LLMs can do those. … Now, according to me LLMs don’t have much real understanding, in the relevant sense or in the sense humans do. But it’s much harder to point at clear, legible benchmarks that show that LLMs don’t really understand much, compared to previous ML systems.
The “as brainstorming aids for developing the underlying theoretical framework” is doing a lot of work there. I’m noticing here that when someone says “we can try to understand XYZ by looking at legible thing ABC”, I often jump to conclusions (usually correctly actually) about the extent to which they are or aren’t trying to push past ABC to get to XYZ with their thinking. A key point of the OP is that some datapoints may be helpful, but they aren’t the main thing determining whether you get to [the understanding you want] quickly or slowly. The main thing is, vaguely, how you’re doing the brainstorming for developing the underlying theoretical framework.
I’m not saying all legible data is bad or irrelevant. I like thinking about human behavior, about evolution, about animal behavior; and my own thoughts are my primary data, which isn’t like maximally illegible or something. I’m just saying I’m suspicious of all legible data. Why?
Because there’s more coreward data available. That’s the argument of the OP: you actually do know how to relevantly theorize (e.g., go off and build a computer—which in the background involves theorizing about datastructures).
Because people streetlight, so they’re selecting points for being legible, which cuts against being close to the core of the thing you want to understand.
Because theorizing isn’t only, or even always mainly, about data. It’s also about constructing new ideas. That’s a distinct task; data can be helpful, but there’s no guarantee that reading the book of nature will lead you along such that in the background you construct the ideas you needed.
It’s legible, yeah. They should have it in mind, yeah. But after they’ve thought about it for a while they should notice that the real movers and shakers of the world are weird illegible things like religious belief, governments, progressivism, curiosity, invention, companies, child-rearing, math, resentment, …, which aren’t very relevantly described by the sort of theories people usually come up with when just staring at stuff like cold->sweater, AFAIK.
I don’t think we disagree much if at all.
I think constructing a good theoretical framework is very hard, so people often do other things instead, and I think you’re using the word “legible” to point to some of those other things.
I’m emphasizing that those other things are less than completely useless as semi-processed ingredients that can go into the activity of “constructing a good theoretical framework”.
You’re emphasizing that those other things are not themselves the activity of “constructing a good theoretical framework”, and thus can distract from that activity, or give people a false sense of how much progress they’re making.
I think those are both true.
The pre-Darwin ecologists were not constructing a good theoretical framework. But they still made Darwin’s job easier, by extracting slightly-deeper patterns for him to explain with his much-deeper theory—concepts like “species” and “tree of life” and “life cycles” and “reproduction” etc. Those concepts were generally described by the wrong underlying gears before Darwin, but they were still contributions, in the sense that they compressed a lot of surface-level observations (Bird A is mating with Bird B, and then Bird B lays eggs, etc.) into a smaller number of things-to-be-explained. I think Darwin would have had a much tougher time if he was starting without the concepts of “finch”, “species”, “parents”, and so on.
By the same token, if we’re gonna use language as a datapoint for building a good underlying theoretical framework for the deep structure of knowledge and ideas, it’s hard to do that if we start from slightly-deep linguistic patterns (e.g. “morphosyntax”, “sister schemas”)… But it’s very much harder still to do that if we start with a mass of unstructured surface-level observations, like particular utterances.
I guess your perspective (based on here) is that, for the kinds of things you’re thinking about, people have not been successful even at the easy task of compressing a lot of surface-level observations into a smaller number of slightly-deeper patterns, let alone successful at the much harder task of coming up with a theoretical framework that can deeply explain those slightly-deeper patterns? And thus you want to wholesale jettison all the previous theorizing? On priors, I think that would be kinda odd. But maybe I’m overstating your radicalism. :)
I mean the main thing I’d say here is that we just are going way too slowly / are not close enough. I’m not sure what counts as “jettisoning”; no reason to totally ignore anything, but in terms of reallocating effort, I guess what I advocate for looks like jettisoning everything. If you go from 0% or 2% of your efforts put toward questioning basic assumptions and theorizing based on introspective inspection and manipulation of thinking, to 50% or 80%, then in some sense you’ve jettisoned everything? Or half-jettisoned it?