Thanks!
One thing I would say is: if you have a (correct) theoretical framework, it should straightforwardly illuminate tons of diverse phenomena, but it’s very much harder to go backwards from the “tons of diverse phenomena” to the theoretical framework. E.g. any competent scientist who understands Evolution can apply it to explain patterns in finch beaks, but it took Charles Darwin to look at patterns in finch beaks and come up with the idea of Evolution.
Or in my own case, for example, I spent a day in 2021 looking into schizophrenia, but I didn’t know what to make of it, so I gave up. Then I tried again for a day in 2022, with a better theoretical framework under my belt, and this time I found that it slotted right into my then-current theoretical framework. And at the end of that day, I not only felt like I understood schizophrenia much better, but also my theoretical framework itself came out more enriched and detailed. And I iterated again in 2023, again simultaneously improving my understanding of schizophrenia and enriching my theoretical framework.
Anyway, if the “tons of diverse phenomena” are datapoints, and we’re in the middle of trying to come up with a theoretical framework that can hopefully illuminate all those datapoints, then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework), at any particular point in this process. The “schizophrenia” datapoint was totally unhelpful to me in 2021, but helpful to me in 2022. The “precession of Mercury” datapoint would not have helped Einstein when he was first brainstorming general relativity in 1907, but was presumably moderately helpful when he was thinking through the consequences of his prototype theory a few years later.
The particular phenomena / datapoints that are most useful for brainstorming the underlying theory (privileging the hypothesis), at any given point in the process, need not be the most famous and well-studied phenomena / datapoints. Einstein wrung much more insight out of the random-seeming datapoint “a uniform gravity field seems an awful lot like uniform acceleration” than out of any of the datapoints that would have been salient to a lesser gravity physicist, e.g. Newton’s laws or the shape of the galaxy or the Mercury precession. In my own case, there are random experimental neuroscience results (or everyday observations) that I see as profoundly revealing of deep truths, but which would not be particularly central or important from the perspective of other theoretical neuroscientists.
But, I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints. (Unless of course you’re also reading and internalizing crappy literature theorizing about those phenomena, and it’s filling your mind with garbage ideas that get in the way of constructing a better theory.) For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn’t call that example “legible”?)
Thanks, this is helpful to me.
An example of something: do LLMs have real understanding, in the way humans do? There’s a bunch of legible stuff that people would naturally pay attention to as datapoints associated with whatever humans do that’s called “real understanding”, e.g. being able to produce grammatical sentences, being able to answer a wide range of related questions correctly, writing a poem with s-initial words, etc. People might even have considered those datapoints dispositive of real understanding. And now LLMs can do those. … Now, according to me, LLMs don’t have much real understanding, in the relevant sense or in the sense humans do. But it’s much harder to point at clear, legible benchmarks showing that LLMs don’t really understand much than it was for previous ML systems.
The “as brainstorming aids for developing the underlying theoretical framework” is doing a lot of work there. I’m noticing here that when someone says “we can try to understand XYZ by looking at legible thing ABC”, I often jump to conclusions (usually correctly actually) about the extent to which they are or aren’t trying to push past ABC to get to XYZ with their thinking. A key point of the OP is that some datapoints may be helpful, but they aren’t the main thing determining whether you get to [the understanding you want] quickly or slowly. The main thing is, vaguely, how you’re doing the brainstorming for developing the underlying theoretical framework.
I’m not saying all legible data is bad or irrelevant. I like thinking about human behavior, about evolution, about animal behavior; and my own thoughts are my primary data, which isn’t like maximally illegible or something. I’m just saying I’m suspicious of all legible data. Why?
Because there’s more coreward data available. That’s the argument of the OP: you actually do know how to relevantly theorize (e.g., go off and build a computer—which in the background involves theorizing about datastructures).
Because people streetlight, so they’re selecting points for being legible, which cuts against being close to the core of the thing you want to understand.
Because theorizing isn’t only, or even always mainly, about data. It’s also about constructing new ideas. That’s a distinct task; data can be helpful, but there’s no guarantee that reading the book of nature will lead you along such that in the background you construct the ideas you needed.
The cold->sweater example is legible, yeah. They should have it in mind, yeah. But after they’ve thought about it for a while they should notice that the real movers and shakers of the world are weird illegible things like religious belief, governments, progressivism, curiosity, invention, companies, child-rearing, math, resentment, …, which aren’t very relevantly described by the sort of theories people usually come up with when just staring at stuff like cold->sweater, AFAIK.
I don’t think we disagree much if at all.
I think constructing a good theoretical framework is very hard, so people often do other things instead, and I think you’re using the word “legible” to point to some of those other things.
I’m emphasizing that those other things are less than completely useless as semi-processed ingredients that can go into the activity of “constructing a good theoretical framework”.
You’re emphasizing that those other things are not themselves the activity of “constructing a good theoretical framework”, and thus can distract from that activity, or give people a false sense of how much progress they’re making.
I think those are both true.
The pre-Darwin ecologists were not constructing a good theoretical framework. But they still made Darwin’s job easier, by extracting slightly-deeper patterns for him to explain with his much-deeper theory—concepts like “species” and “tree of life” and “life cycles” and “reproduction” etc. Those concepts were generally described by the wrong underlying gears before Darwin, but they were still contributions, in the sense that they compressed a lot of surface-level observations (Bird A is mating with Bird B, and then Bird B lays eggs, etc.) into a smaller number of things-to-be-explained. I think Darwin would have had a much tougher time if he was starting without the concepts of “finch”, “species”, “parents”, and so on.
By the same token, if we’re gonna use language as a datapoint for building a good underlying theoretical framework for the deep structure of knowledge and ideas, it’s hard to do that if we start from slightly-deep linguistic patterns (e.g. “morphosyntax”, “sister schemas”)… But it’s very much harder still to do that if we start with a mass of unstructured surface-level observations, like particular utterances.
I guess your perspective (based on here) is that, for the kinds of things you’re thinking about, people have not been successful even at the easy task of compressing a lot of surface-level observations into a smaller number of slightly-deeper patterns, let alone successful at the much harder task of coming up with a theoretical framework that can deeply explain those slightly-deeper patterns? And thus you want to wholesale jettison all the previous theorizing? On priors, I think that would be kinda odd. But maybe I’m overstating your radicalism. :)
I mean, the main thing I’d say here is that we’re just going way too slowly / are not close enough. I’m not sure what counts as “jettisoning”; no reason to totally ignore anything, but in terms of reallocating effort, I guess what I advocate for looks like jettisoning everything. If you go from 0% or 2% of your efforts put toward questioning basic assumptions and theorizing based on introspective inspection and manipulation of thinking, to 50% or 80%, then in some sense you’ve jettisoned everything? Or half-jettisoned it?