Also, running faster and duplicating yourself keeps the model human-level in an important sense. A lot of threat models run through the model doing things that humans can’t understand even given a lot of time, and so those threat models require something stronger than just this.
I think clever duplication of human-level intelligence is plenty sufficient for superhuman capability in the important sense. By that I mean something like: it has capacities that would be extinction-causing if it believes minimizing its loss function is best achieved by turning off humanity (since humanity could turn it off or start other (proto-)AGIs).
For one, I don’t think humanity is that robust in the status quo; and two, a team of internally aligned (because they’re copies) human-level intelligences capable of graduate-level biology seems plenty existentially scary.