Great post!
Regarding your “Redirecting civilization” approach: I wonder about the competitiveness of this. It seems that we will likely build x-risk-causing AI before we have a good enough model to be able to e.g. simulate the world 1000 years into the future on an alternative timeline? Of course, competitiveness is an issue in general, but approaches based on factored cognition or IDA seem more realistic to me.
> Alternatively, we can try to be clever and “import” research from the future repeatedly. For instance we can first ask our model to produce research from 5 years out. Then, we can condition our model on that research existing today, and again ask it for research 5 years out. The problem with this approach is that conditioning on future research suddenly appearing today almost guarantees that there is a powerful AGI involved, which could well be deceptive, and that again is very bad.
I wonder whether there might also be an issue with recursion here. In this approach, we condition on research existing today. In the IDA approach, we train the model to output such research directly. The latter can perhaps be seen as a variant of conditioning if we train with a KL-divergence penalty. With IDA, we are worried about fixed-point and superrationality-based nonmyopia issues. I wonder whether something like this concern would also apply to the former approach. I'm also now unsure whether the same issue arises in the normal use case of a token-by-token simulator, or whether there are qualitative differences between these cases.
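To spell out why KL-penalized training can be read as conditioning, here is the standard result the comparison seems to rest on. It assumes training reaches the exact optimum of the regularized objective; the reward r, prior model π₀, penalty strength β, and evidence event E are illustrative symbols, not taken from the post.

$$\pi^\ast = \arg\max_{\pi}\; \mathbb{E}_{x \sim \pi}\big[r(x)\big] - \beta\, D_{\mathrm{KL}}\big(\pi \,\|\, \pi_0\big)$$

$$\pi^\ast(x) = \frac{\pi_0(x)\, e^{r(x)/\beta}}{\sum_{x'} \pi_0(x')\, e^{r(x')/\beta}}$$

$$r(x) = \beta \log p(E \mid x) \;\Longrightarrow\; \pi^\ast(x) = \frac{\pi_0(x)\, p(E \mid x)}{\sum_{x'} \pi_0(x')\, p(E \mid x')} = \pi_0(x \mid E)$$

So with a reward chosen as a scaled log-likelihood of some event E (say, "this research exists"), the KL-penalized optimum is exactly the prior π₀ updated on E by Bayes' rule. Real training only approximates this optimum, so the correspondence is a heuristic rather than a guarantee.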
Thanks!
> Regarding your “Redirecting civilization” approach: I wonder about the competitiveness of this. It seems that we will likely build x-risk-causing AI before we have a good enough model to be able to e.g. simulate the world 1000 years into the future on an alternative timeline?
I’m not sure. My sense is that generative models have a huge lead in terms of general capabilities over ~everything else, and that seems to be where the most effort is going today. So unless something changes there, I expect generative models to be the state of the art when we hit x-risk territory.
That said, it’s totally possible that the x-risk-causing generative model happens before the model that can simulate thousands of years of history. I’m not confident in this either way.
One thing that makes me hopeful about simulating long histories is that, to some extent, it’s “just” a matter of more compute, and if we get promising results simulating short spans of history, it might not be hard to justify a lot of spending on simulating longer stretches. There’s a bright spot there too: the cost of simulating longer times likely scales sub-linearly with the amount of history simulated. A plain dynamics model is the linear baseline: simulating for twice as long costs twice the compute. If you’ve got a cleverer model that knows how to take shortcuts or compress the dynamics, you can probably do better than that.
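A toy analogy for the linear-versus-shortcut point, under loudly stated assumptions: treat "history" as a linear dynamical system x_{t+1} = A x_t, which is nothing like a real world model. Stepping forward costs one update per step, so compute grows linearly with the horizon, while a model that "knows the shortcut" can compute A^T by repeated squaring in roughly log2(T) multiplications. The function names and the multiplication counter are hypothetical illustrations.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices (naive triple loop)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n: int) -> Matrix:
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def step_by_step(a: Matrix, steps: int) -> Tuple[Matrix, int]:
    """Apply the toy dynamics one step at a time; cost is linear in `steps`."""
    result = identity(len(a))
    mults = 0
    for _ in range(steps):
        result = mat_mul(a, result)
        mults += 1
    return result, mults


def with_shortcut(a: Matrix, steps: int) -> Tuple[Matrix, int]:
    """Compute a**steps by repeated squaring; cost grows like log2(steps)."""
    result = identity(len(a))
    base = a
    mults = 0
    while steps > 0:
        if steps & 1:  # this binary digit contributes base to the product
            result = mat_mul(result, base)
            mults += 1
        steps >>= 1
        if steps:  # square only if more digits remain
            base = mat_mul(base, base)
            mults += 1
    return result, mults


# Toy "world": the Fibonacci recurrence as a 2x2 linear map.
fib = [[1, 1], [1, 0]]
slow, slow_cost = step_by_step(fib, 1000)
fast, fast_cost = with_shortcut(fib, 1000)
print(slow == fast, slow_cost, fast_cost)
```

Both routes reach the same state, but doubling the horizon from 1000 to 2000 steps doubles the step-by-step count (1000 to 2000 multiplications) while raising the shortcut count only from 15 to 16. Real dynamics are not linear, so this only illustrates what "compressing the dynamics" can buy, not how a generative model would do it.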
> I wonder whether something like this concern would also apply to the former approach.
I’m pretty concerned about this. I said a bit about it in the “No Fixed Points” section, but basically I think you have to do something to avoid fixed points; otherwise you get all sorts of world-ending optimization pressures. Once you do that, you can’t allow any recursion in which the model simulates itself, and then you’re stuck with the problem of how to introduce future research into the past without making a malicious AGI the most likely explanation...