Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk
On the Russ Roberts ECONTALK Podcast #893, guest Tyler Cowen challenges Eliezer Yudkowsky and the Less Wrong/EA Alignment communities to develop a mathematical model for AI X-Risk.
Will Tyler Cowen agree that an ‘actual mathematical model’ for AI X-Risk has been developed by October 15, 2023?
https://manifold.markets/JoeBrenton/will-tyler-cowen-agree-that-an-actu?r=Sm9lQnJlbnRvbg
(This market resolves to “YES” if Tyler Cowen publicly acknowledges, by October 15, 2023, that an actual mathematical model of AI X-Risk has been developed.)
Two excerpts from the conversation:
https://youtube.com/clip/Ugkxtf8ZD3FSvs8TAM2lhqlWvRh7xo7bISkp
...But, I mean, here would be my initial response to Eliezer. I’ve been inviting people who share his view simply to join the discourse. So, they have the sense, ‘Oh, we’ve been writing up these concerns for 20 years and no one listens to us.’ My view is quite different. I put out a call and asked a lot of people I know, well-informed people, ‘Is there any actual mathematical model of this process of how the world is supposed to end?’
So, if you look, say, at COVID or climate change fears, in both cases, there are many models you can look at, including—and then models with data. I’m not saying you have to like those models. But the point is: there’s something you look at and then you make up your mind whether or not you like those models; and then they’re tested against data...
https://youtube.com/clip/Ugkx4msoNRn5ryBWhrIZS-oQml8NpStT_FEU
...So, when it comes to AGI and existential risk, it turns out as best I can ascertain, in the 20 years or so we’ve been talking about this seriously, there isn’t a single model done. Period. Flat out.
So, I don’t think any idea should be dismissed. I’ve just been inviting those individuals to actually join the discourse of science. ‘Show us your models. Let us see their assumptions and let’s talk about those.’...
Related:
Will there be a funding commitment of at least $1 billion in 2023 to a program for mitigating AI risk?
https://manifold.markets/JoeBrenton/will-there-be-a-funding-commitment?r=Sm9lQnJlbnRvbg
Will the US government launch an effort in 2023 to augment human intelligence biologically in response to AI risk?
https://manifold.markets/JoeBrenton/will-the-us-government-launch-an-ef?r=Sm9lQnJlbnRvbg
Will the general public in the United States become deeply concerned by LLM-facilitated scams by Aug 2 2023?
https://manifold.markets/JoeBrenton/will-the-general-public-in-the-unit?r=Sm9lQnJlbnRvbg
If you go back 10 million years and ask for an “actual mathematical model” supporting a claim that descendants of chimpanzees may pose an existential threat to the descendants of ground sloths (for example)—a model that can then be “tested against data”—man, I would just have no idea how to do that.
Like, chimpanzees aren’t even living on the same continent as ground sloths! And a ground sloth could crush a chimpanzee in a fight anyway! It’s not like there’s some trendline where the chimpanzee-descendants are gradually killing more and more ground sloths, and we can extrapolate it out. Instead you have to start making up somewhat-speculative stories (“what if the chimp-descendants invent a thing called weapons!?”). And then it’s not really a “mathematical model” anymore, or at least I don’t think it’s the kind of mathematical model that Tyler Cowen is hoping for.
One mathematical model that seems like it would be particularly valuable to have here is a model of the shape of the curve relating resources invested to optimization power. The reason I think an explicit model would be valuable there is that a lot of the AI risk discussion centers around recursive self-improvement. For example, instrumental convergence / orthogonality thesis / pivotal acts are relevant mostly in contexts where we expect a single agent to become more powerful than everyone else combined. (I am aware that there are other types of risk associated with AI, like “better AI tools will allow for worse outcomes from malicious humans / accidents”. Those are outside the scope of the particular model I’m discussing.)
To expand on what I mean by this, let’s consider a couple of examples of recursive self-improvement.
For the first example, let’s consider the game of Factorio. Let’s specifically consider the “mine coal + iron ore + stone / smelt iron / make miners and smelters” loop. Each miner produces some raw materials, and those raw materials can be used to craft more miners. This feedback loop is extremely rapid, and once that cycle gets started the number of miners placed grows exponentially until all available ore patches are covered with miners.
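As a rough illustration of that loop, here is a minimal sketch of capped exponential growth. The function name, the 10% per-tick growth rate, and the 1000 ore slots are all made-up assumptions for illustration, not real Factorio numbers.

```python
def simulate_miners(initial_miners=1, ore_slots=1000, growth_per_tick=0.1, ticks=120):
    """Toy model: each tick, existing miners yield enough material to build
    growth_per_tick new miners per miner, until every ore slot is covered.
    All parameter values are hypothetical."""
    miners = float(initial_miners)
    history = [miners]
    for _ in range(ticks):
        new_miners = miners * growth_per_tick
        # Growth is exponential until the finite ore patches run out.
        miners = min(ore_slots, miners + new_miners)
        history.append(miners)
    return history


history = simulate_miners()
for tick in range(0, 121, 20):
    print(tick, round(history[tick], 1))
```

The point of the sketch is the shape: the count compounds each tick, then goes flat once the ore patches are saturated.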
For our second example, let’s consider the case of an optimizing compiler like gcc. A compiler takes some code, and turns it into an executable. An optimizing compiler does the same thing, but also checks whether it can output an executable that does the same thing more efficiently. Some of the optimization steps will give better results in expectation the more resources you allocate to them, at the cost of (sometimes enormously) greater time and memory for the optimization step. As such, optimizing compilers like gcc have a number of flags that let you specify exactly how hard they should try.
Let’s consider the following program:
This is also a thing which will recursively self-improve, in the technical sense of “the result of each iteration will, in expectation, be better than the result of the previous iteration, and the improvements it finds help it more efficiently find future improvements”. However, it seems pretty obvious that this “recursive self-improver” will not do the kind of exponential takeoff we care about.
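To make the contrast with the Factorio loop concrete, here is a toy sketch of a self-improver with diminishing returns. It is not the program referred to above. It assumes (hypothetically) that compile time can never drop below some floor, and that each round of self-recompilation captures a fixed fraction of the remaining slack. The function name and all numbers are invented for illustration.

```python
def self_recompile(initial_time=100.0, floor_time=60.0, capture=0.5, rounds=10):
    """Toy model of a compiler repeatedly recompiling itself.

    Assumption (not from any real compiler): compile time cannot go below
    floor_time, and each round removes `capture` of the remaining gap.
    """
    times = [initial_time]
    for _ in range(rounds):
        slack = times[-1] - floor_time
        # Every round is an improvement, and faster rounds mean the next
        # improvement is found sooner, but the gains shrink geometrically.
        times.append(times[-1] - capture * slack)
    return times


for round_number, seconds in enumerate(self_recompile()):
    print(round_number, seconds)
```

Under these assumptions every iteration is strictly better than the last, yet the total speedup can never exceed initial_time / floor_time, so there is no takeoff.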
The difference between these two cases comes down to the shapes of the curves. So one area of mathematical modeling I think would be pretty valuable would be:
1. Figure out what shapes of curves lead to gaining orders of magnitude more capabilities in a short period of time, given constant hardware.
2. The same question, but given the ability to rent or buy more hardware.
3. The same question, but now the AI can also invest in improving chip fabs, with the same increase in investment required for each improvement as we have previously observed for chip fabs.
4. What do the empirical scaling laws for deep learning look like? Do they look like they come in under the curves from 1-3? What if we look at the change in the best scaling laws over time: where does that line point?
5. Check whether your model now says that we should have been eaten by a recursively self-improving AI in 1982. If it says that, the model may require additional work.
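As a starting point for the first question above (constant hardware), here is a minimal sketch using one assumed family of curves: capability C grows as dC/dt = k * C^alpha. This functional form, the function name, and the parameter values are assumptions chosen for illustration, not an empirical claim. Integrating gives a closed form for the time needed to reach a target capability, so no numerical solver is needed.

```python
import math


def time_to_reach(target, alpha, k=1.0, start=1.0):
    """Time for C to grow from `start` to `target` under dC/dt = k * C**alpha.

    The growth law itself is a hypothetical modeling choice.
    """
    if alpha == 1.0:
        # Plain exponential growth: C(t) = start * exp(k * t).
        return math.log(target / start) / k
    p = 1.0 - alpha
    # From integrating C**(-alpha) dC = k dt between start and target.
    return (target**p - start**p) / (k * p)


# Time to gain 1, 3, and 6 orders of magnitude for three curve shapes.
for alpha in (0.5, 1.0, 1.5):
    times = [time_to_reach(10.0**n, alpha) for n in (1, 3, 6)]
    print(alpha, [round(t, 3) for t in times])
```

In this family, alpha < 1 gives polynomial growth where each extra order of magnitude takes longer than the last, alpha = 1 gives a constant time per order of magnitude, and alpha > 1 reaches any capability level before the finite time 1 / (k * (alpha - 1)) when start is 1. The modeling question is then which regime, if any, the real curves fall into.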
I will throw in an additional $300 bounty for an explicit model of this specific question, subject to the usual caveats (payable to only one person, can’t be in a sanctioned country, etc), because I personally would like to know.
Edit: Apparently Tyler Cowen didn’t actually bounty this. My $300 bounty offer stands, but it looks like you will not be getting additional money from Tyler.
If we can model the spread of a virus, why can’t we model a superintelligence? A brilliant question indeed.
I built a preliminary model here: https://colab.research.google.com/drive/108YuOmrf18nQTOQksV30vch6HNPivvX3?authuser=2
It’s definitely too simple to treat as strong evidence, but it shows some interesting dynamics. For example, levels of alignment rise at first, then fall rapidly once AI deception skills exceed human oversight capacity. I sent it to Tyler and he agreed: cool, but not actual evidence.
If anyone wants to work on improving this, feel free to reach out!