I don’t think Quintin’s claims are of the kind where he needs to propose a funding component / treaty system.
They’re of the kind where he thinks the representations ML systems learn, and the shards of behavior they acquire, make it far from inevitable that malign optimizers come into existence, given that the humans training models don’t want to produce malign optimizers. In other words, Bensinger’s intuition of a gravitational well of optimization out there, indifferent to human values, is just plain wrong, at least as applied to the actual minds we are likely to make.
Quintin could be wrong (I think he’s right, and his theory retrodicts the behavior of the systems we have, while Bensinger et al. make no specific retrodictions as far as I know, apart from generic ones that standard ML theory also makes), but not including a funding component and treaty system isn’t an argument against it. The theory is about how small a region of probable-future mindspace malign optimizers occupy, not about a super-careful way of avoiding malign optimizers that loom large in that space.