moridinamael
Thanks, I’ll keep that in mind!
A couple of things:
TPUs are already effectively leaping above the GPU trend in price-performance. It is difficult to find an exact cost for a TPU because they are not sold retail, but my own low-confidence estimates for the price of a TPU v5e place its price-performance significantly above the GPU given in the plot. I would expect the front-runner in price-performance to cease to be what we think of as GPUs, and thus the intrinsic architectural limitations of GPUs to cease to be the critical bottleneck.
Expecting price-performance to improve doesn’t mean we necessarily expect hardware to improve, just that we become more efficient at making hardware. Economies of scale and refinements in manufacturing technology can dramatically improve price-performance by reducing manufacturing costs, without any improvement in the underlying hardware. Of course, in reality we expect both the hardware to become faster and the price of manufacturing it to fall. This is even more true as the sheer quantity of money being poured into compute manufacturing goes parabolic.
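To put that in arithmetic terms (the numbers here are invented, purely for illustration): price-performance is just performance divided by price, so the ratio can move a lot even when the numerator doesn't move at all.

```python
# Illustrative only: invented numbers, not actual chip data.
# Price-performance = performance / price, so it can rise even
# when the hardware itself does not get any faster.

flops_per_chip = 1e14      # hypothetical chip, held constant
price_today = 10_000       # dollars
price_later = 2_500        # same chip, cheaper to manufacture at scale

pp_today = flops_per_chip / price_today   # 1e10 FLOPS per dollar
pp_later = flops_per_chip / price_later   # 4e10 FLOPS per dollar

print(pp_later / pp_today)  # 4.0x price-performance, with zero hardware improvement
```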
The graph was showing up fine before, but seems to be missing now. Perhaps it will come back. The equation is simply an eyeballed curve fit to Kurzweil’s own curve. I tried pretty hard to convey that the 1000x number is approximate:
> Using the super-exponential extrapolation projects something closer to 1000x improvement in price-performance. Take these numbers as rough, since the extrapolations depend very much on the minutiae of how you do your curve fit. Regardless of the details, it is a difference of orders of magnitude.

The justification for putting the 1000x number in the post instead of precisely calculating a number from the curve fit is that the actual trend is pretty wobbly over the years, and my aim here is not to pretend at precision. If you just look at the plot, it looks like we should expect “about 3 orders of magnitude,” which is really the limit of the precision I would be comfortable stating. I would guess not lower than two orders of magnitude. Certainly not as low as one order of magnitude, as would be implied by the exponential extrapolation, which would require that we don’t have any breakthroughs or new paradigms at all.
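If you want a feel for how sensitive the decade-ahead projection is to the choice of functional form, here is a rough sketch. The data is synthetic (standing in for the historical FLOPS/$ points, which I am not reproducing here), so the exact multipliers it prints are meaningless; the point is the orders-of-magnitude gap between the two fits.

```python
# Sketch of how much the decade-ahead projection depends on the functional
# form of the curve fit. The data below is synthetic (roughly ten orders of
# magnitude of FLOPS/$ growth since 1970); real data is much noisier.
import numpy as np

t = np.arange(0, 55)                  # years since 1970
log_pp = 0.05 * t + 0.0025 * t**2     # synthetic log10(price-performance)

exp_fit = np.polyfit(t, log_pp, 1)    # exponential: log(pp) linear in time
sup_fit = np.polyfit(t, log_pp, 2)    # super-exponential: log(pp) quadratic in time

now, horizon = 54, 64                 # i.e., 2024 and 2034
exp_gain = 10 ** (np.polyval(exp_fit, horizon) - np.polyval(exp_fit, now))
sup_gain = 10 ** (np.polyval(sup_fit, horizon) - np.polyval(sup_fit, now))

print(f"exponential extrapolation:       ~{exp_gain:,.0f}x over the decade")
print(f"super-exponential extrapolation: ~{sup_gain:,.0f}x over the decade")
```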
Super-Exponential versus Exponential Growth in Compute Price-Performance
GPT-4 confirms for me that the Meissner effect does not require flux pinning: “Yes, indeed, you’re correct. Flux pinning, also known as quantum locking or quantum levitation, is a slightly different phenomenon from the pure Meissner effect and can play a crucial role in the interaction between a magnet and a superconductor.
In the Meissner effect, a superconductor will expel all magnetic fields, creating a repulsive effect. However, in type-II superconductors, there are exceptions where some magnetic flux can penetrate the material in the form of tiny magnetic vortices. These vortices can become “pinned” in place due to imperfections in the superconductor’s structure.
This flux pinning is the basis of quantum locking, where the superconductor is ‘locked’ in space relative to the magnetic field. This can create the illusion of levitation in any orientation, depending on how the flux was pinned. For instance, a superconductor could be pinned in place above a magnet, below a magnet, or at an angle.
So, yes, it is indeed important to consider flux pinning when discussing the behavior of superconductors in a magnetic field. Thanks for pointing out this nuance!”
I think Sabine is just not used to seeing small pieces of superconductor floating over large magnets. Every Meissner effect video that I can find shows the reverse: small magnets floating on top of pieces of cooled superconductor. This makes sense because it is hard to cool something that is floating in the air.
I suspect that if somebody had given me this advice when I was a student I would have disregarded it, but, well, this is why wisdom is notoriously impossible to communicate. Wisdom always sounds either glib, banal, or irrelevant. Oh well:
Anxiety, aversion and stress diminish with exposure and repetition.
The older I get, the more I wish I had had this tattooed onto my body as a teenager. It is true not only of doing the dishes and laundry, but also of vigorous exercise, talking to strangers, changing baby diapers, public speaking in front of crowds, having difficult conversations, and tackling unfamiliar subject matter. All of these are things that always suck, for everyone, the first time, or the first several times. I used to distinctly hate doing all of these things, to experience a strong aversion to doing them, and to avoid doing them until circumstances forced me. Now they are all things I don’t mind doing at all.
There may be “tricks” for metabolizing the anxiety of something like public speaking, but you ultimately don’t need tricks. You just need to keep doing the thing until you get used to it. One day you wake up and realize that it’s no longer a big deal.
What you really wanted from this answer was something that you could do today to help with your anxiety. The answer, then, is that if you really believe the (true) claim that simply doing the reps will make the anxiety go away, then the meta-anxiety you’re feeling now (which is in some sense anxiety about future anxiety) will go away.
The Party Problem is a classic example taught as an introductory case in decision theory classes; that was the main reason I chose it.
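For anyone who hasn’t seen it, the skeleton is tiny: a venue decision crossed with uncertain weather. A minimal sketch, with made-up probabilities and utilities rather than the canonical textbook numbers:

```python
# Minimal sketch of the Party Problem decision tree.
# All numbers are illustrative, not the textbook values.
p_rain = 0.4

# Utility of each (venue, weather) outcome, on an arbitrary scale.
utility = {
    ("outdoors", "sun"): 100, ("outdoors", "rain"): 0,
    ("porch",    "sun"): 90,  ("porch",    "rain"): 20,
    ("indoors",  "sun"): 40,  ("indoors",  "rain"): 50,
}

def expected_utility(venue):
    return (1 - p_rain) * utility[(venue, "sun")] + p_rain * utility[(venue, "rain")]

for venue in ("outdoors", "porch", "indoors"):
    print(venue, expected_utility(venue))

print("choose:", max(("outdoors", "porch", "indoors"), key=expected_utility))
```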
Here are a couple of examples of our decision theory workshops:
https://guildoftherose.org/workshops/decision-making
https://guildoftherose.org/workshops/applied-decision-theory-1
There are about 10 of them so far covering a variety of topics related to decision theory and probability theory.
Great points. I would only add that I’m not sure the “atomic” propositions even exist. The act of breaking a real-world scenario into its “atomic” bits requires magic, meaning in this case a precise truncation of intuited-to-be-irrelevant elements.
Good point. You could also say that even having the intuition for which problems are worth the effort and opportunity cost of building decision trees, versus just “going with what feels best”, is another bit of magic.
I probably should have listened to the initial feedback on this post, which was along the lines that it wasn’t entirely clear what I actually meant by “magic” and that the framing was possibly more confusing than illuminating, but, oh well. I think that GPT-4 is magic in the same way that the human decision-making process is magic: both processes are opaque, we don’t really understand how they work at a granular level, and we can’t replicate them except in the most narrow circumstances.
One weakness of GPT-4 is it can’t really explain why it made the choices it did. It can give plausible reasons why those choices were made, but it doesn’t have the kind of insight into its motives that we do.
Short answer, yes, it means deferring to a black-box.
Longer answer, we don’t really understand what we’re doing when we do the magic steps, and nobody has succeeded in creating an algorithm to do the magic steps reliably. They are all open problems, yet humans do them so easily that it’s difficult for us to believe that they’re hard. The situation reminds me of when people thought that object recognition from images ought to be easy to do algorithmically, because we do it so quickly and effortlessly.
Maybe I’m misunderstanding your specific point, but the operations of “listing possible worlds” and “assigning utility to each possible world” are simultaneously “standard” in the sense that they are basic primitives of decision theory and “magic” in the sense that we haven’t had any kind of algorithmic system that was remotely capable of doing these tasks until GPT-3 or GPT-4.
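One way to see what I mean: the “standard” part of the machinery is a few lines of arithmetic, while the primitives it depends on are, algorithmically speaking, blanks. A sketch (the function names are mine, purely for illustration):

```python
# The expected-utility arithmetic is trivial and fully specified.
# The two functions it relies on are the "magic": until very recently,
# nothing algorithmic could implement them for real-world scenarios.

def enumerate_possible_worlds(scenario: str) -> list[str]:
    """Magic step 1: carve the scenario into the relevant possible outcomes."""
    raise NotImplementedError("a human (or now, maybe, an LLM) does this")

def assign_probability_and_utility(world: str) -> tuple[float, float]:
    """Magic step 2: judge how likely each world is and how much we value it."""
    raise NotImplementedError("a human (or now, maybe, an LLM) does this")

def expected_utility(scenario: str) -> float:
    # The "standard" part: purely mechanical once the magic steps are done.
    return sum(
        p * u
        for p, u in map(assign_probability_and_utility, enumerate_possible_worlds(scenario))
    )
```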
I spent way too many years metaphorically glancing around the room, certain that I must be missing something that is obvious to everyone else. I wish somebody had told me that I wasn’t missing anything, and these conceptual blank spots are very real and very important.
As for the latter bit, I am not really an Alignment Guy. The taxonomy I offer is very incomplete. I do think that the idea of framing the Alignment landscape in terms of “how does it help build a good decision tree? what part of that process does it address or solve?” has some potential.
Decision Theory with the Magic Parts Highlighted
So do we call it in favor of porby, or wait a bit longer for the ambiguity over whether we’ve truly crossed the AGI threshold to resolve?
That is probably close to what they would suggest if this weren’t mainly just a metaphor for the weird ways that I’ve seen people thinking about AI timelines.
It might be a bit more complex than a simple weighted average because of discounting, but that would be the basic shape of the proper hedge.
These would be good ideas. I would remark that many people definitely do not understand what is happening when they naively aggregate, or average together, disparate distributions. Consider the simple example of the several Metaculus predictions for the date of AGI, or any other future event, and the way that people tend to speak of the aggregated median dates. I would hazard that most people using Metaculus, or referencing the bio-anchors paper, think the way the King does, and believe that the computed median dates are a good reflection of when things will probably happen.
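To make the failure mode concrete: mix two sharply peaked, disagreeing forecasts and the median of the mixture lands in a valley that neither forecaster assigns much probability to. A quick sketch with made-up distributions:

```python
# Two "advisors" with sharply peaked, disagreeing forecasts for some date.
# Reading off the median of the pooled forecasts puts you in the valley
# between the peaks, at a date that neither advisor actually expects.
import numpy as np

rng = np.random.default_rng(0)
advisor_a = rng.normal(loc=2027, scale=1.0, size=100_000)  # "it's soon"
advisor_b = rng.normal(loc=2045, scale=2.0, size=100_000)  # "it's decades away"

pooled = np.concatenate([advisor_a, advisor_b])
median = np.median(pooled)

print(f"pooled median: {median:.1f}")  # lands in the empty gap between the peaks
mass_near_median = np.mean(np.abs(pooled - median) < 1.0)
print(f"fraction of forecasts within a year of it: {mass_near_median:.4f}")  # ~0
```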
Generally, you should hedge. Devote some resources toward planting and some resources toward drought preparedness, allocated according to your expectation. In the story, the King trusts the advisors equally, and should allocate toward each possibility equally, plus or minus some discounting. Just don’t devote resources toward the fake “middle of the road” scenario that nobody actually expects.
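In code form, the hedge I have in mind looks something like this (numbers invented; “discounting” here just means weighting the scenario that requires earlier action a bit more heavily):

```python
# Toy allocation for the King's problem. The probabilities and the discount
# factor are invented for illustration; the point is that resources track the
# actual distribution of belief, with nothing reserved for the fictional
# "middle of the road" scenario.
p_plant_scenario   = 0.5   # the advisor whose scenario calls for planting now
p_drought_scenario = 0.5   # the advisor whose scenario calls for drought preparedness
discount           = 1.1   # mild extra weight on the scenario requiring action sooner

w_plant   = p_plant_scenario * discount
w_drought = p_drought_scenario

total = w_plant + w_drought
allocation = {
    "plant crops now":      w_plant / total,
    "drought preparedness": w_drought / total,
}
print(allocation)  # roughly 52% / 48%
```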
If you are in a situation where you really can only do one thing or the other, with no capability to hedge, then I suppose it would depend on the details of the situation, but it would probably be best to “prepare now!” as you say.
As I remarked in other comments on this post, this is a plot of price-performance. The denominator is price, which can become cheap very fast. Potentially, as the demand for AI inference ramps up over the coming decade, the price of chips falls fast enough to drive this curve without chip speed growing nearly as fast. It is primarily an economic argument, not a purely technological argument.
For the purposes of forecasting, and understanding what the coming decade will look like, I think we care more about price-performance than raw chip speed. This is particularly true in a regime where both training and inference of large models benefit from massive parallelism. This means you can scale by buying new chips, and from a business or consumer perspective you benefit if those chips get cheaper and/or if they get faster at the same price.