Quintin’s claims seem to rely on something like “common sense humanism” but I don’t see a process connected to the discussion that will reliably cause common sense humanism to be the only possible outcome.
Metaphorically: There is a difference between someone explaining how easy it is to ride a bike vs someone explaining how much it costs to mine and refine metal with adequate tensile strength for a bicycle seatpost that will make it safe for overweight men to also ride the bike, not just kids.
A lot of the nuanced and detailed claims in Quintin’s post might be true, but he did NOT explain (1) how he was going to get funding to make a “shard-aligned AGI” on a reasonable time frame, or (2) how he would execute adequately if he did get funding, and definitely not make an error and let something out of the lab that isn’t good for the world, or (3) how he would also go fast enough that no other lab would make the errors he thinks he would not make, before he gets results that could “make the errors of other labs irrelevant to the future of the world”.
I grant that I didn’t read very thoroughly. Did you see a funding component and treaty system in his plan that I missed?
I don’t think Quintin’s claims are of the kind where he needs to propose a funding component / treaty system.
They’re of the kind where he thinks the representations ML systems learn, and the shards of behavior they acquire, make it far from inevitable for malign optimizers to come into existence, given that the humans training models don’t want to produce malign optimizers. In other words, Bensinger’s intuition about a gravitational well of optimization out there indifferent to human values is just plain wrong, at least as applied to actual minds we are likely to make.
Quintin could be wrong. I think he’s right: his theory retrodicts the behavior of systems we have, while Bensinger et al. make no specific retrodictions as far as I know, apart from generic retrodictions standard ML theory also makes. But not including a funding component and treaty system isn’t an argument against it, because the theory is about how small a region of probable-future mindspace malign optimizers occupy, not about a super-careful way of avoiding malign optimizers that loom large in probable-future mindspace.