If I was going to steelman Mr Tailcalled, I’d imagine that he was trying to “point at the reason” that transfer learning is far and away the exception.
Mostly, learning (whether in humans, beasts, or software) happens relative to a highly specific domain of focus, and getting 99.8% accuracy in the domain, and making a profit therein… doesn’t really generalize. I can’t run a hedge fund after mastering the hula hoop, and I can’t win a boxing match from learning to recognize real and forged paintings. NONE of these skills would be much help in climbing a 200-foot-tall redwood tree with my bare hands and bare feet… and mastering the Navajo language is yet again “mostly unrelated” to any of them. The challenges we agents seem to face in the world are “one damn thing after another”.
(Arguing against this steelman, the exception here might be “next token prediction”. Mastering next token prediction seems to grant the power to play Minecraft through APIs, win art contests, prove math theorems, and drive theologically confused people into psychosis. However, consistent with the steelman, next token prediction hasn’t seemed to offer any help at fabbing smaller and faster and more efficient computer chips. If next token prediction somehow starts to make chip fabbing go much faster, then hold onto your butts.)
This is exactly why I asked the initial question. There is a reading of tailcalled’s statement which makes it correct, and a reading which makes it wrong. I was curious which meaning was intended, and whether the difference between the two is even understood.
When talking about top performance in highly specific domains, one should indeed use lots of domain-specific tricks. But in the grand scheme of things, the rule of “coherence + contact with the world” is extremely helpful; among other things, it allows one to derive all the specific tricks for all the different domains.
Likewise, there is a sense in which the rationalist-empiricist project didn’t deliver to the fullest of our expectations when solving multiple specific technical problems. On the other hand, it has definitely succeeded in the sense that philosophies based on this approach were so triumphant and delivered so many fruits that we put them in a league of their own called “science”.
When talking about top performance in highly specific domains, one should indeed use lots of domain-specific tricks. But in the grand scheme of things, the rule of “coherence + contact with the world” is extremely helpful; among other things, it allows one to derive all the specific tricks for all the different domains.
This assumes you have contact with all the different domains, which you don’t, rather than just some of them.
Can you give an example of a domain which I have no contact with, such that the “coherence + contact with the world” methodology won’t help me figure out the corresponding domain-specific tricks for succeeding in it, even though such tricks exist in principle?
Farming, law enforcement, war, legislation, chip fabbing, space colonization, cargo trucking, …
Space colonization obviously includes cargo trucking, farming, legislation, chip fabbing, law enforcement, and, for appreciators, war.
But I don’t think you are doing space colonization. I’d guess you are doing reading/writing on social media, programming, grocery shopping, cooking, …. And I think recursive self-improvement is supposed to work with no experience in space colonization.
The meaning of my comment was: “your examples are very weak at proving the absence of cross-domain generalization”.
I can buy that there’s a sort of “trajectory of history” that makes use of all domains at once, I just think this is the opposite of what rationalist-empiricists are likely to focus on.
And if we are talking about me, right now I’m doing statistics, physics, and signal processing, which seem to be awfully generalizable.
This is precisely the position that I am referring to when I say “the assumption was that the world is mostly homogeneous”. Like physics is generalizable if you think the nature of the world is matter. And you can use energy from the sun to decompose anything into matter, allowing you to command universal assent that everything is matter. But does that mean matter is everything? Does your physics knowledge tell you how to run a company? If not, why say it is “awfully generalizable”?
I don’t see how it makes sense in the context we are talking about.
Let’s take farming. Clearly, it’s not some separate magisterium which I have no connection to. Farming happens in the same reality. I can see how people farm things, do it myself, learn about different methods, run experiments myself, and so on. “Coherence + contact with the world” seems to be very helpful here.
I think of “the rationalist project” as “having succeeded” in a very limited and relative sense that is still quite valuable.
For example, back when the US and Chinese governments managed to accidentally make a half-cocked bioweapon and let it escape from a lab and then not do any adequate public health at all, or hold the people who caused the megadeath event even slightly accountable, and all of the institutions of basically every civilization on Earth failed to do their fucking jobs, the “rationalists” (ie the people on LW and so on) were neck and neck with anonymous anime catgirls on twitter (who overlap a lot with rationalists in practice) in terms of being actually sane and reasonable voices in the chaos… and it turns out that having some sane and reasonable voices is useful!
Eliezer says “Rationalists should win” but Yvain said “it’s really not that great”, and Yvain got more upvotes (90 vs 247 currently), so Yvain is prolly right, right? But either way it means rationality is probably at least a little bit great <3
Rank the tasks by size as measured by e.g. energy content. Playing Minecraft, proving math theorems, and driving a theologically confused person to psychosis are all small in terms of energy, especially when considering that the models are not consistently driving anyone to psychosis and thus the theologically confused person who was driven to psychosis was probably highly predisposed to it.
Art competitions are more significant, but AFAIK, in the cases where it has won art competitions, it relied on human guidance. I tried asking ChatGPT to make a picture that could win an art competition without giving it any more guidance, and it made this, which, yes, is extremely beautiful, but also seems deeply Gnostic and so probably unlikely to win great art competitions. AI art thus seems more suited for Gnostic decoration than for greatness. (Maybe healthy people will eventually develop an aversion to it? That already seems on the way; e.g. art competitions tend to forbid AI art.)
So, next token prediction can succeed in a lot of pathetic tasks. It has also gotten a lot of data with examples of completions of pathetic tasks. Thus the success doesn’t rely on homogeneity (extrapolation), it relies on heterogeneity of data (interpolation).
It’s not an accident that it has data on weak tasks. There are more instances of small forms than large forms, so there is more data available on the smaller forms. In order to get the data on the larger forms, it will take work to integrate it with the world, and let the data drill into the AI.
I read your gnostic/pagan stuff and chuckled over the “degeneracy [ranking where] Paganism < … < Gnosticism < Atheism < Buddhism”.
I think I’ll be better able to steelman you in the future and I’m sorry if I caused you to feel misrepresented with my previous attempt. I hadn’t realized that the vibe you’re trying to serve is so Nietzschean.
Just to clarify: when you say “pathetic”, it is not intended to evoke “pathos” and function as an even hypothetically possible compliment regarding a wise and pleasant deployment of feelings (even subtle feelings) in accord with reason, that could be unified and balanced to easily and pleasantly guide persons into actions in accord with The Good after thoughtful cultivation...
...but rather I suspect you intended it as a near semantic neighbor (but with opposite moral valence) of something like “precious” (as an insult (as it is in some idiolects)) in that both “precious and pathetic things” are similarly weak and small and in need of help.
Like the central thing you’re trying to communicate with the word “pathetic” (I think, but am not sure, and hence I’m seeking clarification) is to notice that entities labeled with that adjective could hypothetically be beloved and cared for… but you want to highlight how such things are also sort of worthy of contempt and might deserve abandonment.
We could argue: Such things are puny. They will not be good allies. They are not good role models. They won’t autonomously grow. They lack the power to even access whole regimes of coherently possible data gathering loops. They “will not win” and so, if you’re seeking “systematized winning”, such “pathetic” things are not where you should look. Is this something like what you’re trying to point to by invoking “patheticness” so centrally in a discussion of “solving philosophy formally”?
I must say, you did a very poor job at answering my question.