I’m saying that even if “AI values are well-modeled as being randomly sampled from a large space of possible goals” is true, the AI may well not be very certain that it is true, and may therefore assign something like a 5% chance to humans, using similar training methods, constructing an AI that shares its values. (It also assigns an additional tiny probability to the scenario where “AI values are well-modeled as being randomly sampled from a large space of possible goals” is true and an AI with similar values gets recreated anyway through random chance, but that’s not what I’m focusing on.)
Hopefully this conveys my argument more clearly?