This is a problem with both the Superalignment team's and Conjecture's approach. If you need one human-level researcher so much, you can simply hire someone!
If you run significantly more than one human-level researcher at 1x human speed, you need to explain why the whole system is aligned. "It's made out of parts that don't really want to kill you or wouldn't succeed at killing you" tells you nothing about the goals the whole system might start pursuing.
One human-level system can maybe find an important zero-day vulnerability once every couple of months. If there are thousands of these systems, and they work much faster, they can read GitHub, find thousands of zero-day vulnerabilities, and hack literally everything. A single system that really wanted to do something probably simply wouldn't be able to, especially while humans are watching.
If a giant system you don't fully manually oversee, don't understand, and don't control starts wanting something, it can get what it wants. And there's no reason the whole thing will be optimising for anything in the direction of what its small parts would've been fuzzily optimising for if left to their own devices.
Why would this cluster of human-level minds start cooperating with each other?