An upload (an exact imitation of a human) is the most straightforward way of securing time for alignment research, except that it’s not plausible in our world for uploads to be developed before AGIs. The plausible analogue is more capable language/multimodal models, steeped in human culture, where alignment guarantees look very dubious, at least a priori. And an upload probably needs to be value-laden to be efficient enough to give an advantage, while remaining exact in morally relevant ways, though there’s a glimmer of hope that generalization can capture this without the need to explicitly set up a fixpoint through extrapolated values. Doing the same with Tool AIs or something similar is only slightly less speculative than directly developing aligned AGIs without that miracle, so the advantage of an upload is massive.
Assuming, of course, that the first upload (or sufficiently humanlike model) is developed by someone actually trying to do this.