Can we model the technological singularity as a phase transition?
Introduction
The technological singularity looks quite similar to what happens to a system near a phase transition. If that is indeed the case, and the mechanisms behind the singularity admit the same mathematical description as the mechanisms behind a phase transition, we can potentially use this knowledge to estimate when we should expect the singularity.
This post is organized as follows. First, I remind those who are far from physics what a phase transition is. Second, I discuss why it resembles the singularity. Third, I suggest how we can turn this into a quantitative prediction. Fourth, I explain why all this is important.
What is a phase transition?
A phase transition is the transition from one phase to another (duh!). The simplest examples are the solid-liquid and liquid-gas transitions. A slightly more complicated one is the transition from ferromagnetic to paramagnetic (the Curie point). The transition from one phase to another happens at a set of critical parameters. For simplicity, we will only talk about the temperature here (like the temperature of boiling or of freezing). When the temperature approaches the critical temperature of the transition, many quantities (the susceptibility to a magnetic field, for example) demonstrate power-law behavior, i.e. the quantity depends on the temperature as

$$Q \propto |T - T_c|^{-\alpha},$$

where $\alpha$ controls this power-law behavior and is called the critical exponent. If $\alpha > 0$, the quantity diverges, i.e., approaches infinity, as $T$ approaches $T_c$. This does not mean the quantity actually becomes infinite (since the size of the system is finite, that is impossible), but close to the phase transition such growth is a very good description. One such quantity, common to practically all systems, is the correlation length, which basically tells how far apart two points in the medium (a bucket of water, or a magnet, for example) can be and still influence each other in a noticeable way. This quantity always diverges near the phase transition, i.e., basically any two points become interdependent (again, limited by the size of the system). The distance between two points no longer matters in this case.
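To make this concrete, here is a toy sketch (mine, not part of the original argument; the quantity, $T_c$, and $\alpha$ are all made up for the example) of how such a critical exponent is read off from data near $T_c$:

```python
# Toy illustration: generate synthetic data Q ~ |T - T_c|^(-alpha) and recover
# the critical exponent alpha with a straight-line fit on a log-log scale.
import numpy as np

T_c, alpha_true = 1.0, 0.7          # made-up critical temperature and exponent
T = np.linspace(0.80, 0.99, 50)     # approach T_c from below
Q = 2.0 * np.abs(T - T_c) ** (-alpha_true)

# log Q = const - alpha * log|T - T_c|, so the slope of the fit gives -alpha.
slope, intercept = np.polyfit(np.log(np.abs(T - T_c)), np.log(Q), 1)
print(f"fitted critical exponent: {-slope:.2f}")   # prints 0.70
```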
Does it look like the technological singularity?
First of all, of course, we don’t have a temperature. Instead, we have time $t$ that approaches the moment of singularity $t_c$. As $t$ approaches $t_c$, the following quantities should explode:
- knowledge about the world (call it information, or whatever)
- technology level (to make it more distinct from knowledge, let’s say that it is an extensive quantity, so producing copies of existing devices increases it)
- interdependence between remote agents (people separated geographically).
Of course, this does not mean that any of these quantities will actually reach infinity: just as for a standard phase transition, we have finite-size effects.
The interdependence between people is an obvious analogue of the correlation length, which is common to all systems. The other quantities are specific to the system at hand (humanity).
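To make the correspondence explicit (this is my notation, mirroring the formula above rather than anything stated so far), each such quantity $Q$ would then be expected to behave as

$$Q(t) \propto (t_c - t)^{-\alpha} \quad \text{for } t \to t_c,$$

with its own critical exponent $\alpha$ and a common singularity date $t_c$.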
The qualitative analogy kind of makes sense (well, at least to me). Let us see what can be done to make it quantitative rather than qualitative.
Suggestions for stricter analysis
The way to make this statement more quantitative would be to construct a measurable quantity corresponding to knowledge, technology, and interdependence. This is not that trivial. Should we just look at all the information on the Internet? Or all papers published? Or all code written? Or maybe we should look at how much information we can produce and consider computation power? Or maybe some combination? Since we have a finite amount of data for each of those parameters, and we can make up even more of them, we can easily get a power law simply by cherry-picking. Finally, even if we are lucky and get nice power-law behavior for a single, very reasonable parameter, it is not guaranteed that this parameter will not saturate soon, cutting the power law off.
To avoid such mistakes, let us imagine the following situation. Our ancestors performed the procedure described above many years (or centuries) ago. They predicted something based on it. Now we can see how wrong they were.
So, if someone had looked for a power law before the 1960s, assuming they had the necessary data, they would likely have found the power law that is very well known today: the hyperbolic growth of the Earth’s population. Before the era of computers, the number of people was a good proxy for computation power, and so in some sense for the derivative of information (more computation power means more information gain per unit time). Our ancient singularity scientist could then have predicted when the singularity is supposed to happen, obtaining that it will happen in 2026. This was actually done here.
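As an illustration of what such a fit looks like in practice (my own sketch; the population numbers below are rough round estimates in billions, not the data used in the linked work), one can fit the hyperbolic law $N(t) = C/(t_c - t)$ and read off $t_c$:

```python
# Rough sketch: fit the hyperbolic growth law N(t) = C / (t_c - t) to a few
# approximate world-population estimates (in billions) and read off t_c.
# The historical fit by von Foerster et al. (1960) gave t_c around 2026.
import numpy as np
from scipy.optimize import curve_fit

years = np.array([1000, 1500, 1800, 1900, 1950, 1960], dtype=float)
pop = np.array([0.30, 0.50, 1.00, 1.65, 2.50, 3.00])   # rough estimates, billions

def hyperbolic(t, C, t_c):
    return C / (t_c - t)

params, _ = curve_fit(hyperbolic, years, pop, p0=(200.0, 2030.0))
C_fit, t_c_fit = params
print(f"C = {C_fit:.0f} billion-years, predicted t_c = {t_c_fit:.0f}")
```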
The growth of the population stopped being hyperbolic around 1960-1970; however, the structure of information gain changed as well. For example, in the same 1960-1970 period computers became much more widespread. Since then, the number of people has been a bad measure of computation power.
However, how wrong was the date prediction? Current predictions for the singularity date vary from as early as 2030 to as late as 2100. Since the prediction was made in 1960, it put the singularity 66 years ahead of its own “now”, while the correct answer would be somewhere between 70 and 140 years. Even in the latter case, it is only slightly more than a factor of two off, and in the earliest case, the prediction is almost exact.
This history lesson teaches us two things. First, very likely the quantity we choose will stop exhibiting power-law behavior before the singularity. Second, it is quite possible that we still get the right order of magnitude for the date (at least, such a prediction might be more reasonable and data-based than current predictions built on the intuition of the participants).
So, the question now is: what should our quantities be?
What quantities would work for singularity?
The first idea, based on what is discussed above, is total computation power. It makes sense to try to add both humans and computers. Obviously, the computation power of the brain is far greater than that of the usual PC, but most of it does not lead to knowledge increments. Thus, it might be that on average one person is equivalent to a smartphone, with the coefficient of proportionality determined from the data fit. That is the first thing to check. If it works, cool. If not, maybe we should think harder. Maybe we should not take into account all computation power, but only the part that is used for scientific projects? How much computer power goes into it, and how many human-hours go into it? Maybe something else?
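A minimal sketch of what checking this could look like (entirely hypothetical: the function names are mine, no real data is attached, and the human/machine exchange rate $k$ is the free coefficient mentioned above):

```python
# Hypothetical fitting template (no real data): combine human and machine
# compute with an exchange rate k, then fit a diverging power law
# Q(t) = A * (t_c - t)**(-alpha) to the combined series.
import numpy as np
from scipy.optimize import curve_fit

def combined_compute(people, machine_flops, k):
    """Total 'useful' compute: one person counted as k machine-FLOPS-equivalents."""
    return k * people + machine_flops

def diverging_power_law(t, A, t_c, alpha):
    return A * (t_c - t) ** (-alpha)

def fit_singularity_date(years, people, machine_flops, k):
    """Fit A, t_c, alpha for a given exchange rate k.

    years, people, machine_flops: 1-D arrays of equal length (real data needed);
    in practice one would also scan over k and keep the best overall fit.
    """
    q = combined_compute(people, machine_flops, k)
    p0 = (q[-1], years[-1] + 50.0, 1.0)   # crude initial guess
    (A, t_c, alpha), _ = curve_fit(diverging_power_law, years, q, p0=p0, maxfev=10000)
    return t_c, alpha
```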
I think these ideas are at least worth checking. Unfortunately, my googling abilities are well below average, so I was not able to retrieve the data I needed to check the idea. Here I will need your help.
Why is it worth doing? First, I personally think it is a very nice scientific problem, and if the singularity can indeed be described as a phase transition, it is beautiful.
Second, and most important: if we can predict the time of the singularity more precisely, we can prepare for it better. Imagine the data fit clearly shows that the singularity will happen in 6 years. It then suddenly becomes more important than any global problem that bothers humanity now. Even more important than many of my personal problems. If that is the case, the thing I will be most sorry for in my life is that I procrastinated on writing this text for half a year.
Compute: A very simple attempt at estimating (non-biological) computing power:
A version of Moore’s law applies to the cost of computation power, and Moore’s law has held quite steadily, so we can assume exponential growth of compute per dollar.
The figure in this article on the growth of the semiconductor industry shows significant oscillations in the growth rate in the last 25 years, but seems totally compatible with exponential growth plus a lot of noise.
If we just naively combine these two, we get the product of two exponentials for the total semiconductor compute over time, which is again exponential growth. This can be considered explosive in its own way, but it is not a singularity.
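Spelling out the arithmetic (my notation): if compute per dollar grows as $e^{at}$ and spending grows as $e^{bt}$, the total grows as

$$e^{at}\cdot e^{bt} = e^{(a+b)t},$$

which is faster, but still never diverges at a finite time, unlike $(t_c - t)^{-\alpha}$.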
Data storage: A quick search for global data storage led me to this whitepaper, where Fig. 1 on page 6 shows the “Annual Size of the Global Datasphere” and looks as if they just directly plotted an exponential function rather than yearly empirical data. I am not sure whether the true data was just this close to an exponential or whether they simply decided to plot an exponential. In both cases, this speaks against super-exponential growth.
General remarks: I have the feeling that with a technological singularity there are so many moving parts that even if we found a quantity which consistently shows power-law growth, we would still have large uncertainty about the timing of the singularity. For example, I assume that the transition from DNA as basically the only storage medium to human culture with written text was a significant game-changer. But the timing of this transition could easily have been different by a few centuries (certainly) or millennia (probably). Similarly, I would expect that the first general artificial intelligence will be really relevant for a singularity, while it could easily be delayed by a few years or decades even if we knew an expected date for a phase transition.
Possibly this is similar to the supercooling of water, where it is the presence of ample condensation points that gives us the usual freezing temperature. If liquid water is already below the standard freezing point, the first nucleus will decide whether the whole thing freezes.
Also, I could imagine “the amount of information that is usefully integrated in a decision-making agent” to be a quantity that is quite substrate-independent (DNA, people, humanity, AI models), so that one could try to find a trend over long time scales. But I expect that it will be super hard to define “amount of information” sufficiently well. At least “parameter counts in machine learning” (if we simply assume that a model will integrate roughly as much data as it has parameters) does seem compatible with super-exponential growth.

Thank you for your research! First of all, I don’t expect the non-human parameter to give a clear power law, since we need to add humans as well. Of course, close to the singularity the impact of humans will be very small, but maybe we are not that close yet. Now for the details:
Compute:
1. Yes, Moore’s law was a quite steady exponential for quite a while, but indeed we should multiply the two trends.
2. The graph shows just a five-year period, and not the number of chips produced but revenue. A five-year period is too short for any conclusions, and I am not sure that fluctuations in revenue are not driven mainly by market prices rather than by the amount produced.
Data storage:
Yes, I saw that one before; it looks more like they just drew a nice picture rather than plotted real data.
General remarks:
I agree with the point that the appearance of AGI can be fairly random. I can see two mechanisms that may potentially make it less random. First, we may need a lot of computational resources, data storage, etc. to create it, and as soon as a lab or company reaches the threshold, it happens easily with already existing algorithms. Second, we may need a lot of digitized data to train an AGI, so the transition again happens only once we have that much data.
Lastly, notice that the creation of AGI is not yet a singularity in the mathematical sense. It will certainly accelerate our progress, but not to infinity, so if the data predict, for example, a singularity in 2030, that likely means AGI earlier than that.
How trustworthy would this prediction be? It depends on the amount of data and the noise. If we have just 10-20 data points scattered all over the graph, so that you can connect the dots in any way you like, then not very. If, instead, we are lucky and the control parameter happens to be something easily measurable (something for which you can get just-in-time statistics, like the number of papers on arXiv right now, so we can get really a lot of data points), and the parameter continues to change as the theory predicts, it would be a quite strong argument for the timeline.
It is not very likely that the control parameter will be that easily measurable and will obey a power law that well. I think it is a very-high-risk, very-high-gain project (very high gain because, if the prediction is very clear, it will be possible to persuade more people that the problem is important).
Trends of different quantities:
Generally, I agree with your points :)
I recently stumbled upon this paper “The World’s Technological Capacity to Store, Communicate, and Compute Information”, which has some neat overviews regarding data storage, broadcasting and compute trends:
From a quick look at the figures, my impression is that compute and storage look very much like ‘just’ exponential, while there is a super-exponential figure (Fig. 4) for the total communication bandwidth (1986-2007).[1]
General:
That makes sense. Now that I think about this, I could well imagine that something like “scale is all you need” is sufficiently true that randomness doesn’t shift the expected date by a large amount.
Good point! I think that the time span around and before the first AGI will be the most relevant to us, as it probably provides the largest possibility to steer the outcome toward something good, but this indeed is not the date we would get from a power-law singularity.
This feels quite related to the discussion around biological anchors for estimating the necessary compute for transformative AI and the conclusions one can draw from them. I feel that if one thinks these are informative, even ‘just’ the exponential compute trends provide rather strong bounds (at least compared to using biological time-scales or the like as a reference).
Regarding persuading people: I am not sure whether such a trend would make a large psychological difference compared to the things that we already have: All Possible Views About Humanity’s Future Are Wild. But it would still be a noteworthy finding in any case.
Quick hand-wavy estimate of whether the trend continued over the last 15 years: If I just take ‘trend continues’ to mean ‘the doubling time halves every 7 years, with a factor of x40 from 2000 to 2007’ (this isn’t a power law, but it is much easier for me to think about and hopefully close enough in this parameter range), we’d have to find an increase in global bandwidth of roughly 9 orders of magnitude from 2007 to 2021 (a factor of x40 is roughly 1.5 OOM, and we get this factor twice in the first 7-year step and four times in the second). I did not try to find current numbers, but 9 OOM sounds unlikely to me. My own data usage did increase by maybe 3 to 5 OOM, but 9 OOM just seems too high. Thus, I anecdotally conclude that this trend very probably slowed down in the last 15 years compared to ‘doubling time halves every 7 years’. But to be fair, I would not have correctly predicted the 1986-2007 trend either, so this shouldn’t be taken too seriously.
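A quick check of that arithmetic (my sketch; the x40 factor and the 7-year halving are the footnote’s assumptions, not measured values):

```python
# Back-of-the-envelope check: with a factor of ~x40 from 2000 to 2007 and the
# doubling time halving every 7 years, 2007-2014 gives x40**2 and 2014-2021
# gives x40**4, i.e. 2 + 4 = 6 "x40 factors" in total for 2007-2021.
import math

oom_per_factor = math.log10(40)               # ~1.6 orders of magnitude
total_oom_2007_2021 = (2 + 4) * oom_per_factor
print(f"implied growth 2007-2021: ~{total_oom_2007_2021:.1f} orders of magnitude")
```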