The black box is made of humans and might be tested the usual way when (human-designed) WBE tech is developed. The problem of designing its (long term) social organisation might also be deferred to the box. The point of the box is that it can be made safe from external catastrophic risks, not that it represents any new progress towards FAI.
The AI doesn’t produce philosophical answers, the box does, and the box doesn’t contain novel/dangerous things like AIs. This only requires solving the separate problems of having the AI care about evaluating a program, and preparing a program that contains people who would solve the remaining problems (and this part doesn’t involve AI). The AI is something that can potentially be completely understood in theory, and it can be very carefully tested under controlled conditions to see that it does evaluate simpler black boxes that we also understand. Getting decision theory wrong seems like a more elusive risk.
The black box is made of humans and might be tested the usual way when (human-designed) WBE tech is developed.
Ok, I think I misunderstood you earlier, and thought that your idea was similar to Paul Christiano’s, where the FAI would essentially develop the WBE tech instead of us. I had also suggested waiting for WBE tech before building FAI (although due to a somewhat different motivation), and in response someone (maybe Carl Shulman?) argued that brain-inspired AGI or low-fidelity brain emulations would likely be developed before high-fidelity brain emulations, which means the FAI would probably come too late if it waited for WBE. This seems fairly convincing to me.
Waiting for WBE is risky in many ways, but I don’t see a potentially realistic plan that doesn’t go through it, even if we have (somewhat) smarter humans. This path (and many variations, such as a WBE superorg just taking over “manually” and not leaving anyone else with access to the physical world) I can vaguely see working, solving the security/coordination problem, if all goes right; other paths seem much more speculative to me (but many are worth trying, given resources; if it were somehow possible to do reliably, AI-initiated WBE when there is no human-developed WBE would be safer).