Even if you do discover a perfect alignment solution by testing its “general universal architecture” in the simbox, you need to implement it correctly on the first critical try.
If you discover this, you are mostly done. By definition, a perfect alignment solution is the best that could exist, so there is nothing more to do (in terms of alignment design, at least). The ‘implementation’ is then mostly just running the exact same code in the real world rather than in a sim. Of course, in practice there are other implementation-related issues, such as how you decide what to align the AGI towards, but those are out of scope here.
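To make the “exact same code” point concrete, here is a minimal sketch, assuming a simple agent loop over an abstract environment interface. All class and function names here are illustrative and not from the original article: the idea is only that the agent/alignment code is held fixed while the environment backing it is swapped from sim to real.

```python
# Hypothetical sketch: the agent code is identical in both settings;
# only the environment implementation differs. Names are illustrative.

from abc import ABC, abstractmethod


class Environment(ABC):
    """Common interface shared by the simbox and the real world."""

    @abstractmethod
    def observe(self) -> dict:
        ...

    @abstractmethod
    def act(self, action: dict) -> None:
        ...


class SimboxEnvironment(Environment):
    """Simulated world used to test the alignment design."""

    def observe(self) -> dict:
        return {"source": "sim"}  # simulated observations

    def act(self, action: dict) -> None:
        pass  # apply the action inside the simulation


class RealWorldEnvironment(Environment):
    """Real sensors and actuators; the agent code does not change."""

    def observe(self) -> dict:
        return {"source": "real"}  # real observations

    def act(self, action: dict) -> None:
        pass  # apply the action via real actuators


def run_agent(env: Environment, policy, steps: int) -> None:
    """The 'exact same code' in both settings: only `env` differs."""
    for _ in range(steps):
        obs = env.observe()
        action = policy(obs)
        env.act(action)
```

Under this framing, “deployment” is just calling `run_agent` with a `RealWorldEnvironment` instead of a `SimboxEnvironment`; the alignment-relevant logic lives entirely in the unchanged policy/agent code.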
The “AI boxing” thought experiment is a joke: it assumes a priori that the AI is not contained in a simbox. If the AI is not aware that it is in a simulation, then it cannot escape, any more than you could escape this simulation. The fact that you linked that old post again indicates that you are simply parroting standard old LW groupthink and didn’t actually read the article, which has a section on containment.
The EY/MIRI/LW groupthink is now mostly outdated/discredited: it is based on an old, mostly incorrect theory of the brain that has not aged well in the era of AGI through DL.