I think it’s important to consider hacking in any safety effort. Anyone who steals an AGI would probably also steal and use any safety methods for control or alignment, for the same reasons the originating org was using them—they don’t want to lose control of their AGI. Better to make those techniques and their code public, and publicly advertise why you’re using them!
Of course, we’d worry that some actors (North Korea, Russia, individuals who are skilled hackers) are highly misaligned with the remainder of humanity, and might bring about existential catastrophes through some combination of foolishness and selfishness.
The other concern is mere proliferation of aligned/controlled systems, which leads to existential danger as soon as those systems approach the capability for autonomous recursive self-improvement: If we solve alignment, do we die anyway?