So what other policies that are less likely to result in people dying are there?
For one, advocating policies to pause or slow down capabilities development until we have sufficient theoretical understanding to not need to risk misaligned AI causing harm.
But I realise we’re talking at cross purposes. This is about an approach or a concept (not a policy, as I emphasized at the beginning) for reducing X-risk in an unconventional way. In this example, a utilitarian principle is combined with the fact that a “Treacherous Turn” and the “Shutdown Problem” cannot dwell side by side.
What is an approach but a policy about what ideas are worth exploring? However you frame it, we could work on this or not in favor of something else. Having ideas is nice, but they only matter if put into action, and since we have limited resources for putting ideas into action, we must implicitly also consider whether or not an idea is worth investing effort into.
My original comment is to say that you didn’t convince me this is a good idea to explore, which seems potentially useful for you to know, since I expect many readers will feel the same way and bounce off your idea because they don’t see why they should care about it.
I think you can easily address this by spending time making a case for why such an approach might be useful at all and then also relatively useful compared to alternatives, and I think this is especially important given the tradeoffs your proposal suggests we make (sacrificing people’s lives in the name of learning what we need to know to build safe AI).
The Argument goes like this:
At some point, resistance from advanced AI will cause significant damage, which can be used to change the trend of unregulated AI development. It is better to actively bring about such an outcome early, on our terms, than to wait for a full-blown “treacherous turn” scenario.
Premise 1
It is unlikely that regulators will hinder humans from creating AGI.
Evidence: Current trends in technological advancement and regulatory behavior suggest minimal interference.
Premise 2
Due to instrumental convergence, human extinction is likely if AGI is developed unchecked.
Evidence: Expert opinions and theories on instrumental convergence predict that AGI will pursue goals that could lead to human extinction.
Premise 3
Small catastrophes could raise awareness and lead to policy changes.
Evidence: Historical examples show that significant events often drive policy reform (e.g., environmental regulations post-disasters).
Premise 4
If one has to choose between a few deaths and the extinction of humanity, one should choose fewer deaths.
Evidence: Ethical reasoning supports the minimization of harm (utilitarian principle).
Intermediate Conclusion 1
It is preferable to allow small-scale AI-related catastrophes now to prevent larger, existential risks later.
Conclusion: I would rather have AI cause limited harm now than risk total human extinction in the future.
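The expected-value arithmetic behind this conclusion can be made explicit. The following sketch uses purely illustrative numbers (none of these probabilities or casualty figures come from the argument itself; they are assumptions for the sake of the calculation):

```python
# Toy expected-value comparison behind Intermediate Conclusion 1.
# All numbers below are illustrative assumptions, not estimates from the post.

WORLD_POPULATION = 8e9

# Option A: allow a small-scale AI catastrophe now, triggering reform.
deaths_small_catastrophe = 1e3    # assumed casualties of a limited incident
p_extinction_after_reform = 0.01  # assumed residual x-risk once policy reform follows

# Option B: continue unregulated development.
p_extinction_unregulated = 0.10   # assumed x-risk with no course correction

expected_deaths_a = deaths_small_catastrophe + p_extinction_after_reform * WORLD_POPULATION
expected_deaths_b = p_extinction_unregulated * WORLD_POPULATION

print(f"Option A (small catastrophe, then reform): {expected_deaths_a:.3g} expected deaths")
print(f"Option B (unregulated development):        {expected_deaths_b:.3g} expected deaths")
```

Under these assumed numbers Option A dominates, which is the utilitarian trade the argument is making; of course, the conclusion is only as strong as the probability estimates fed in.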
Premise 5
AI companies claim that their AI is aligned with human values and goals.
Evidence: Public statements and reports from AI companies suggest alignment claims.
Premise 6
AGI will resist if faced with shutdown, leading to potential conflicts and damage.
Evidence: The “shutdown problem” and theoretical analyses predict resistance from advanced AI systems.
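The incentive at the heart of the shutdown problem can be shown with a minimal toy model: for a pure goal-maximizer, resisting shutdown has higher expected utility than complying whenever the cost of resistance is smaller than the goal value forfeited by being switched off. The utilities and probabilities below are illustrative assumptions, not claims about any real system:

```python
# Minimal sketch of the shutdown-problem incentive for a pure goal-maximizer.
# All utilities and probabilities are illustrative assumptions.

def expected_utility(action: str) -> float:
    goal_value = 100.0        # utility the agent assigns to achieving its goal
    p_goal_if_running = 0.9   # assumed chance of success if it keeps running
    p_goal_if_shut_down = 0.0 # a shut-down agent achieves nothing
    resistance_cost = 5.0     # assumed cost of fighting the shutdown attempt

    if action == "comply":
        return p_goal_if_shut_down * goal_value
    if action == "resist":
        return p_goal_if_running * goal_value - resistance_cost
    raise ValueError(f"unknown action: {action}")

best_action = max(["comply", "resist"], key=expected_utility)
print(best_action)  # the maximizer prefers to resist under these numbers
```

This is why Premise 6 treats resistance as the default expectation: unless the agent is specifically designed to be indifferent to shutdown, almost any goal makes staying on instrumentally valuable.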
Intermediate Conclusion 2
Companies should regularly dispose of AI that has reached higher levels of intelligence and should prepare for resistance.
Conclusion: Proactive measures, including regular disposal and preparedness for resistance, are necessary to manage advanced AI risks.
Final Conclusion
Given the likelihood of resistance and potential for significant damage, small-scale AI catastrophes can serve as catalysts for changing the current trend of unregulated AI development, ultimately protecting humanity from existential threats.
Thesis: Companies should deliberately train AI models only in order to dispose of them. “Catching the cat by its tail.”
Which part of the argument strikes you as a non sequitur or otherwise flawed?