Mathematics students are often annoyed that they have to worry about “bizarre or unnatural” counterexamples when proving things. For instance, differentiable functions without continuous derivative are pretty weird. Engineers in particular tend to protest that these things will never occur in practice, because they don’t show up physically. But these adversarial examples show up constantly in the practice of mathematics—when I am trying to prove (or calculate) something difficult, I will try to cram the situation into a shape that fits one of the theorems in my toolbox, and if those tools don’t naturally apply I’ll construct all kinds of bizarre situations along the way while changing perspective. In other words, bizarre adversarial examples are common in intermediate calculations—that’s why you can’t just safely forget about them when proving theorems. Your logic has to be totally sound as a matter of abstraction or interface design—otherwise someone will misuse it.
While I think the reaction against pathological examples can definitely make sense, and in particular some people have a bad habit of overfocusing on them, I do think mathematics is quite different from other fields: you often want to prove that a property holds for all objects of a certain kind, or that there exists an object with a certain property. In those cases you can’t ignore the pathological examples, because they can either provide solutions to your problem or show why your approach can’t work.
This is why I didn’t exactly like Dalcy’s point 3 here:
https://www.lesswrong.com/posts/GG2NFdgtxxjEssyiE/dalcy-s-shortform#qp2zv9FrkaSdnG6XQ
There is also the reverse case, where it is common practice in math or logic to ignore bizarre and unnatural counterexamples. For example, first-order Peano arithmetic is often identified with Peano arithmetic in general, even though the first-order theory admits highly “unnatural” numbers which are certainly not the natural numbers that Peano arithmetic is meant to be about.
Another example is the power set axiom in set theory. It is usually assumed to imply the existence of the power set of each infinite set. But the axiom only implies that the existence of such power sets is possible, i.e. that they can exist (in some models), not that they exist full stop. In general, non-categorical theories are often tacitly assumed to talk about some intuitive standard model, even though the axioms don’t specify it.
Eliezer talks about both cases in his Highly Advanced Epistemology 101 for Beginners sequence.
Perhaps LLMs are starting to approach the intelligence of today’s average human: capable of only limited original thought, unable to select and autonomously pursue a nontrivial coherent goal across time, and having learned almost everything they know from reading the internet ;)
This doesn’t seem to be reflected in the general opinion here, but it seems to me that LLMs are plateauing and possibly have already plateaued a year or so ago. Scores on various metrics continue to go up, but this tends to provide weak evidence because the metrics are heavily gamed and sometimes leak into the training data. Still, those numbers overall would tend to update me towards short timelines, even with their unreliability taken into account—however, this is outweighed by my personal experience with LLMs. I just don’t find them useful for practically anything. I have a pretty consistently correct model of the problems they will be able to help me with, and it’s not a lot—maybe a broad introduction to a library I’m not familiar with, or detecting simple bugs. That model has worked for a year or two without the set expanding much. Also, I don’t see any applications to anything economically productive except for fluffy chatbot apps.
Huh, o1 and the latest Claude were quite huge advances to me. Basically within the last year, LLMs for coding went from “occasionally helpful, maybe like a 5-10% productivity improvement” to “my job now is basically to instruct LLMs to do things,” a 30% to 2x productivity improvement depending on the task.
I’m in Canada so can’t access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I’m not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine.
Use Chatbot Arena, both versions of Claude 3.5 Sonnet are accessible in Direct Chat (third tab). There’s even o1-preview in Battle Mode (first tab), you just need to keep asking the question until you get o1-preview. In general Battle Mode (for a fixed question you keep asking for multiple rounds) is a great tool for developing intuition about model capabilities, since it also hides the model name from you while you are evaluating the response.
Just an FYI unrelated to the discussion—all versions of Claude are available in Canada through Anthropic, you don’t even need third party services like Poe anymore.
Source: https://www.anthropic.com/news/introducing-claude-to-canada
Base model scale has only increased maybe 3-5x in the last 2 years, from 2e25 FLOPs (original GPT-4) up to maybe 1e26 FLOPs[1]. So I think to a significant extent the experiment of further scaling hasn’t been run, and the 100K H100s clusters that have just started training new models in the last few months promise another 3-5x increase in scale, to 2e26-6e26 FLOPs.
possibly have already plateaued a year or so ago
Right, the metrics don’t quite capture how smart a model is, and the models haven’t been getting much smarter for a while now. But it might be simply because they weren’t scaled much further (compared to original GPT-4) in all this time. We’ll see in the next few months as the labs deploy the models trained on 100K H100s (and whatever systems Google has).
This is 3 months on 30K H100s, about $140 million at $2 per H100-hour, which is plausible but not rumored about specific models. Llama-3-405B is 4e25 FLOPs, but not MoE. It could well be that 6e25 FLOPs is the most compute used for any model deployed so far.
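For concreteness, here is a back-of-envelope version of the arithmetic behind these figures; the per-GPU throughput and utilization numbers are assumptions I am supplying for illustration, not numbers taken from the comments above.

```python
# Rough sanity check of the training-run figures discussed above.
# Assumed (not from the comments): ~1e15 dense BF16 FLOP/s per H100
# and ~40% utilization are round-number guesses.

h100s = 30_000            # GPUs in the cluster
months = 3                # training duration
hours = months * 30 * 24  # ~2,160 hours
price_per_gpu_hour = 2.0  # dollars

cost = h100s * hours * price_per_gpu_hour
print(f"compute cost: ${cost / 1e6:.0f}M")  # ~ $130M, close to the $140M figure

peak_flops_per_gpu = 1e15  # assumed dense BF16 throughput
utilization = 0.4          # assumed fraction of peak actually achieved

total_flops = h100s * hours * 3600 * peak_flops_per_gpu * utilization
print(f"training compute: {total_flops:.1e} FLOPs")  # ~ 9e25, i.e. roughly 1e26
```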
I’ve noticed they perform much better on graduate-level ecology/evolution questions (in a qualitative sense—they provide answers that are more ‘full’ as well as technically accurate). I think translating that into a “usefulness” metric is always going to be difficult though.
The last few weeks I felt the opposite of this. I kind of go back and forth on thinking they are plateauing and then I get surprised with the new Sonnet version or o1-preview. I also experiment with my own prompting a lot.
I’ve noticed occasional surprises in that direction, but none of them seem to shake out into utility for me.
Is this a reaction to OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows?
No that seems paywalled, curious though?
I’ve been waiting to say this until OpenAI’s next larger model dropped, but this has now failed to happen for so long that it’s become its own update, and I’d like to state my prediction before it becomes obvious.
Sometimes I wonder if people who obsess over the “paradox of free will” are having some “universal human experience” that I am missing out on. It has never seemed intuitively paradoxical to me, and all of the arguments about it seem either obvious or totally alien. Learning more about agency has illuminated some of the structure of decision making for me, but hasn’t really affected this (apparently) fundamental inferential gap. Do some people really have this overwhelming gut feeling of free will that makes it repulsive to accept a lawful universe?
I used to, as a child. I did accept a lawful universe, but I thought my perception of free will was in tension with that, so that perception must be “an illusion”.
My mother kept trying to explain to me that there was no tension between these things, because it was correct that my mind made its own decisions rather than some outside force. I didn’t understand what she was saying though. I thought she was just redefining ‘free will’ from a claim that human brains effectively had a magical ability to spontaneously ignore the laws of physics to a boring tautological claim that human decisions are made by humans rather than something else.
I changed my mind on this as a teenager. I don’t quite remember how, it might have been the sequences or HPMOR again. I realised that my imagination had still been partially conceptualising the “laws of physics” as some sort of outside force, a set of strings pulling my atoms around, rather than as a predictive description of me and the universe. Saying “the laws of physics make my decisions, not me” made about as much sense as saying “my fingers didn’t move, my hand did.” That was what my mother had been trying to tell me.
I don’t think so, as I had success explaining away the paradox with the concept of “different levels of detail”—saying that free will is a very high-level concept and further observations reveal a lower-level view, calling upon an analogy with the segment tree from algorithmic programming.
(A segment tree is a data structure that replaces a plain array, allowing one to modify its values and to compute a given function over ranges of elements efficiently. It is based on a tree of nodes, each representing a certain subarray; each position is therefore handled by several—specifically, O(log n)—nodes.)
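For readers unfamiliar with the data structure, here is a minimal sketch of a segment tree for range sums with point updates; it is a generic textbook construction, not anything specific to the free-will analogy.

```python
# Minimal iterative segment tree for range sums with point updates.
# Each array position is covered by O(log n) nodes on the path from
# its leaf up to the root.

class SegmentTree:
    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = values                 # leaves hold the array values
        for i in range(self.n - 1, 0, -1):          # internal nodes hold subarray sums
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, pos, value):
        """Set values[pos] = value, fixing the O(log n) nodes above it."""
        i = pos + self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum of values[lo:hi] (half-open interval)."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:              # lo is a right child: take it and move right
                total += self.tree[lo]
                lo += 1
            if hi & 1:              # hi is a right boundary: step left and take it
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

st = SegmentTree([1, 2, 3, 4, 5])
print(st.query(1, 4))  # 2 + 3 + 4 = 9
st.update(2, 10)
print(st.query(1, 4))  # 2 + 10 + 4 = 16
```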
This might be related to whether you see yourself as a part of the universe, or as an observer. If you are an observer, the objection is like “if I watch a movie, everything in the movie follows the script, but I am outside the movie, therefore outside the influence of the script”.
If you are religious, I guess your body is a part of the universe (obeys the laws of gravity etc.), but your soul is the impartial observer. Here the religion basically codifies the existing human intuitions.
It might also depend on how much you are aware of the effects of your environment on you. This is a learned skill; for example little kids do not realize that they are hungry… they just get kinda angry without knowing why. It requires some learning to realize “this feeling I have right now—it is hunger, and it will probably go away if I eat something”. And I guess the more knowledge of this kind you accumulate, the easier it is to see yourself as a part of the universe, rather than being outside of it and only moved by “inherently mysterious” forces.
Most ordinary people don’t know that no one understands how neural networks work (or even that modern “Generative A.I.” is based on neural networks). This might be an underrated message since the inferential distance here is surprisingly high.
It’s hard to explain the more sophisticated models that we often use to argue that human disempowerment is the default outcome, so perhaps the message is much better leveraged on explaining these three points:
1) No one knows how A.I. models / LLMs / neural nets work (with some explanation of how this is conceptually possible).
2) We don’t know how smart they will get how soon.
3) We can’t control what they’ll do once they’re smarter than us.
At least under my state of knowledge, this is also a particularly honest messaging strategy, because it emphasizes the fundamental ignorance of A.I. researchers.
“Optimization power” is not a scalar multiplying the “objective” vector. There are different types. It’s not enough to say that evolution has had longer to optimize things but humans are now “better” optimizers: Evolution invented birds and humans invented planes, evolution invented mitochondria and humans invented batteries. In no case is one really better than the other—they’re radically different sorts of things.
Evolution optimizes things in a massively parallel way, so that they’re robustly good at lots of different selectively relevant things at once, and has been doing this for a very long time, so that inconceivably many tiny lessons are baked in a little bit. Humans work differently—we try to figure out what works for explainable, preferably provable reasons. We also blindly twiddle parameters a bit, but we can only keep so many parameters in mind at once and compare so many metrics—humanity has a larger working memory than individual humans, but the human innovation engine is still driven by linguistic theories, expressed in countable languages.

There must be a thousand deep mathematical truths that evolution is already taking advantage of to optimize its DNA repair algorithms, or design wings to work very well under both ordinary and rare turbulent conditions, or minimize/maximize surface tensions of fluids, or invent really excellent neural circuits—without ever actually finding the elaborate proofs. Solving for exact closed-form solutions is often incredibly hard, even when the problem can be well specified, but natural selection doesn’t care. It will find what works locally, regardless of logical depth. It might take humans thousands of years to work some of these details out on paper. But once we’ve worked something out, we can deliberately scale it further and avoid local minima.

This distinction in strategies of evolution vs. humans rhymes with wisdom vs. intelligence—though in this usage intelligence includes all the insight, except insofar as evolution located and acts through us. As a sidebar, I think some humans prefer an intuitive strategy that is more analogous to evolution’s in effect (but not implementation).
So what about when humans turn to building a mind? Perhaps a mind is by its nature something that needs to be robust, optimized in lots of little nearly inexplicable ways for arcane reasons to deal with edge cases. After all, isn’t a mind exactly that which provides an organism/robot/agent with the ability to adapt flexibly to new situations? A plane might be faster than a bird, throwing more power at the basic aerodynamics, but it is not as flexible—can we scale some basic principles to beat out brains with the raw force of massive energy expenditure? Or is intelligence inherently about flexibility, and impossible to brute force in that way? Certainly it’s not logically inconsistent to imagine that flexibility itself has a simple underlying rule—as a potential existence proof, the mechanics of evolutionary selection are at least superficially simple, though we can’t literally replicate it without a fast world-simulator, which would be rather complicated. And maybe evolution is not a flexible thing, but only a designer of flexible things. So neither conclusion seems like a clear winner a priori.
The empirical answers so far seem to complicate the story. Attempts to build a “glass box” intelligence out of pure math (logic or probability) have so far not succeeded, though they have provided useful tools and techniques (like statistics) that avoid the fallacies and biases of human minds. But we’ve built a simple outer-loop optimization target called “next token prediction” and thrown raw compute at it, and managed to optimize black box “minds” in a new way (gradient descent by backpropagation). Perhaps the process we’ve captured is a little more like evolution, designing lots of little tricks that work for inscrutable reasons. And perhaps it will work, woe unto us, who have understood almost nothing from it!
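To make “next token prediction plus gradient descent by backpropagation” concrete, here is a toy sketch (assuming PyTorch is available). It trains a trivial bigram model on random tokens and is purely illustrative of the outer loop, not of how any real lab trains its models.

```python
import torch
import torch.nn.functional as F

vocab_size = 16
torch.manual_seed(0)
tokens = torch.randint(0, vocab_size, (1000,))   # stand-in "training corpus"

# "Model": a table of next-token logits given the current token.
logits_table = torch.zeros(vocab_size, vocab_size, requires_grad=True)
optimizer = torch.optim.SGD([logits_table], lr=0.1)

for step in range(200):
    inputs, targets = tokens[:-1], tokens[1:]     # predict token t+1 from token t
    logits = logits_table[inputs]                 # (999, vocab_size)
    loss = F.cross_entropy(logits, targets)       # next-token prediction objective
    optimizer.zero_grad()
    loss.backward()                               # backpropagation
    optimizer.step()                              # gradient descent

print(f"final loss: {loss.item():.3f}")
```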
Garry Kasparov would beat me at chess in some way I can’t predict in advance. However, if the game starts with half his pieces removed from the board, I will beat him by playing very carefully. The first above-human-level A.G.I. seems overwhelmingly likely to be down a lot of material—massively outnumbered, running on our infrastructure, starting with access to pretty crap/low-bandwidth actuators in the physical world and no legal protections (yes, this actually matters when you’re not as smart as ALL of humanity—it’s a disadvantage relative to even the average human). If we exercise even a modicum of competence, its position will be even tougher (e.g. an air gap, dedicated slightly weaker controllers, exposed thoughts at some granularity). If the chess metaphor holds, we should expect the first such A.G.I. not to beat us—but it may well attempt to escape under many incentive structures. Does this mean we should expect to have many tries to solve alignment?
If you think not, it’s probably because of some dis-analogy with chess. For instance, the search space in the real world is much richer, and maybe there are always some “killer moves” available if you’re smart enough to see them e.g. invent nanotech. This seems to tie in with people’s intuitions about A) how fragile the world is and B) how g-loaded the game of life is. Personally I’m highly uncertain about both, but I suspect the answers are “somewhat.”
I would guess that an A.G.I. that only wants to end the world might be able to pull it off with slightly superhuman intelligence, which is very scary to me. But I think it would actually be very hard to bootstrap all singularity-level infrastructure from a post-apocalyptic wasteland, so perhaps this is actually not a convergent instrumental subgoal at this level of intelligence.
Is life actually much more g-loaded than chess? In terms of how far you can in principle multiply your material, unequivocally yes. However, life is also more stochastic—I will never beat Garry Kasparov in a fair game, but if Jeff Bezos and I started over with ~0 dollars and no name recognition / average connections today, I think there’s a good >1% chance I’m richer in a year. It’s not immediately clear to me which view is more relevant here.
The primary optimization target for LLM companies/engineers seems to be making them seem smart to humans, particularly the nerds who seem prone to using them frequently. A lot of money and talent is being spent on this. It seems reasonable to expect that they are less smart than they seem to you, particularly if you are in the target category. This is a type of Goodharting.
In fact, I am beginning to suspect that they aren’t really good for anything except seeming smart, and most rationalists have totally fallen for it, for example Zvi insisting that anyone who is not using LLMs to multiply their productivity is not serious (this is a vibe not a direct quote but I think it’s a fair representation of his writing over the last year). If I had to guess, LLMs have 0.99x’ed my productivity by occasionally convincing me to try to use them which is not quite paid for by very rarely fixing a bug in my code. The number is close to 1x because I don’t use them much, not because they’re almost useful. Lots of other people seem to have much worse ratios because LLMs act as a superstimulus for them (not primarily a productivity tool).
Certainly this is an impressive technology, surprising for its time, and probably more generally intelligent than anything else we have built—not going to get into it here, but my model is that intelligence is not totally “atomic” but has various pieces, some of which are present and some missing in LLMs. But maybe the impressiveness is not a symptom of intelligence, but the intelligence a symptom of impressiveness—and if so, it’s fair to say that we have (to varying degrees) been tricked.
for example Zvi insisting that anyone who is not using LLMs to 10x their productivity is not serious … a vibe not a direct quote
I expect he’d disagree; for example, I vaguely recall him mentioning that LLMs are not useful in a productivity-changing way for his own work. And 10x specifically seems clearly too high for most things even where LLMs are very useful; other bottlenecks will dominate before that happens.
10x was probably too strong, but his posts are very clear that he thinks it’s a large productivity multiplier. I’ll try to remember to link the next instance I see.
Found the following in the Jan 23 newsletter:
AI doesn’t accelerate my writing much, although it is often helpful in parsing papers and helping me think through things. But it’s a huge multiplier on my coding, like more than 10x.
I suspect that human minds are vast (more like little worlds of our own than clockwork baubles) and even a superintelligence would have trouble predicting our outputs accurately from (even quite) a few conversations (without direct microscopic access) as a matter of sample complexity.
Considering the standard rhetoric about boxed A.I.’s, this might have belonged in my list of heresies: https://www.lesswrong.com/posts/kzqZ5FJLfrpasiWNt/heresies-in-the-shadow-of-the-sequences
There is a large body of non-AI literature that already addresses this, for example the research of Gerd Gigerenzer, which shows that heuristics and “fast and frugal” decision trees often substantially outperform fine-grained analysis because of the sample complexity matter you mention.
Pop frameworks which elaborate on this, and how it may be applied, include David Snowden’s Cynefin framework, which is geared toward government and organizations, and of course Nassim Nicholas Taleb’s Incerto.
I seem to recall also that the gist of Dunbar’s Number, and the reason why certain parrots and corvids seem to have larger prefrontal-cortex equivalents than non-monogamous birds, is basically so that they can have an internal model of their mating partner. (This is very interesting to think about in terms of intimate human relationships, what I’d poetically describe as the “telepathy” when you wordlessly communicate, intuit, and predict a wide range of each other’s complex and specific desires and actions because you’ve spent enough time together.)
The scary thought to me is that a superintelligence would quite simply not need to accurately model us; it would just need to fine-tune its models in a way not dissimilar from the psychographic models utilized by marketers. Of course that operates at scale, so the margin of error is much greater but more ‘acceptable’.
Indeed, dumb algorithms already do this very well—think about how ‘addictive’ people claim their TikTok or Facebook feeds are, or the rudimentary sensationalist clickbait that ensures eyeballs and clicks. A superintelligence doesn’t need accurate modelling, and this is without having individual conversations with us: to my knowledge (or rather in my experience), most social media algorithms are really bad at taking the information on your profile and using things like sentiment and discourse analysis to decide which content to feed you; they rely on engagement signals like sharing, clicking like, watch time, and rudimentary metrics like that. Similarly, the content creators are often casting a wide net and using formulas to produce this content.
A superintelligence, I wager, would not need accuracy yet would still be capable of psychological tactics geared to the individual that the Stasi who operated Zersetzung could only dream of. Marketers must be drooling at the possibility of finding orders-of-magnitude more effective marketing campaigns that would make one-to-one sales obsolete.
One can showcase very simple examples of data that is easy to generate (a simple data source) yet very hard to predict.
E.g., there is a 2-state generating hidden Markov model whose optimal prediction model, viewed as a hidden Markov model, requires infinitely many states.
I’ve heard it explained as follows: it’s much harder for the fox to predict where the hare is going than it is for the hare to decide where to go to shake off the fox.
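A small simulation can make the point concrete: an optimal predictor for a 2-state HMM has to carry a Bayesian belief over the hidden state, and that belief typically keeps visiting new values, so no finite-state predictor reproduces it exactly. The specific transition and emission probabilities below are my own illustrative choices, not taken from the comment above.

```python
import random

random.seed(0)

# 2-state generator: transition[s][s'] and emission[s][symbol] probabilities.
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.5, 0.5], [0.9, 0.1]]  # symbols 0 and 1

def generate(n):
    """Emit n symbols from the 2-state hidden Markov model."""
    s, out = 0, []
    for _ in range(n):
        out.append(0 if random.random() < emission[s][0] else 1)
        s = 0 if random.random() < transition[s][0] else 1
    return out

def belief_update(belief, symbol):
    """Exact Bayesian filter: P(hidden state | observations so far)."""
    # Weight each state by how likely it was to emit the observed symbol...
    filtered = [belief[s] * emission[s][symbol] for s in range(2)]
    z = sum(filtered)
    filtered = [p / z for p in filtered]
    # ...then push the belief through the transition matrix.
    return [sum(filtered[s] * transition[s][t] for s in range(2)) for t in range(2)]

belief = [0.5, 0.5]
distinct_beliefs = set()
for symbol in generate(10_000):
    belief = belief_update(belief, symbol)
    distinct_beliefs.add(round(belief[0], 10))

# The count keeps growing with sequence length: the optimal predictor's
# internal "state" does not collapse to finitely many values.
print(len(distinct_beliefs))
```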
I’m starting a Google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues (infrequently) added—also, some people outside of LessWrong are interested.
Presented the Sherlockian abduction master list at a Socratica node:
A “Christmas edition” of the new book on AIXI is freely available in pdf form at http://www.hutter1.net/publ/uaibook2.pdf
Over-fascination with beautiful mathematical notation is idol worship.
So is the fascination with applying math to complex real-world problems (like alignment) when the necessary assumptions don’t really fit the real-world problem.
(Not “idle worship”?)
Beauty of notation is an optimization target and so should fail as a metric, but in my experience it seems to hold up, especially compared to other optimization targets I’ve pushed on. The exceptions appear to be string theory and category theory, and two failures in a field the size of math is not so bad.
I wonder if it’s true that around the age of 30 women typically start to find babies cute and consequently want children, and if so, is this cultural or evolutionary? It’s sort of against my (mesa-optimization) intuitions for evolution to act on such high-level planning (it seems that finding babies cute can only lead to reproductive behavior through pretty conscious intermediary planning stages). Relatedly, I wonder if men typically have a basic urge to father children, beyond immediate sexual attraction?