When you say “create an AGI which doesn’t do this,” do you mean one that has about a 0% probability of doing it, or one that has less than a 100% probability of doing it?
Edit: my impression was that the point of alignment was producing an AGI that has a high probability of good outcomes and a low probability of bad outcomes. Creating an AGI that merely has a low probability of destroying the universe seems trivial: take a hypothetical AGI before it has produced any output, flip a coin, and if it comes up tails, destroy it. Voilà, the probability of destroying the universe is now at most 50%. How can you even have a device that is guaranteed to destroy the universe if, in its early stages, it can be stopped by a sufficiently paranoid developer or a solar flare?
I don’t see how your scenario addresses the statement “Taking over the lightcone is the default behavior”. Yes, it’s obvious that you can build an AGI and then destroy it before you turn it on. You can also choose to just not build one at all, with no coin flip. There’s also the objection that if you destroy it before you turn it on, have you really created an AGI, or just something that might have become an AGI?
It also doesn’t stop other people from building one. If theirs destroys all human value in the future lightcone by default, then you still have just as big a problem.
I don’t see why every possible way for an AGI to critically fail at what we built it for must involve taking over the lightcone.
So let’s also blow up the Earth. By that definition, alignment would be solved.