As you said, AI safety lacks good feedback loops compared to the feedback loops available for capabilities work. That leaves three broad scenarios: either AI safety doesn't matter at all (we can't build AGI, or it's easy to align by default), we are doomed because feedback loops can't be built for AI alignment/safety, or we succeed by default. It's similar to John Wentworth's post "Worlds Where Iterative Design Fails", linked here:
https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails

Some important implications of this:
Now, John Wentworth's stories about gunpowder and the medieval lord overstate things, but if we pit modern weapons against medieval forces, the modern soldiers usually win unless the numbers are severely skewed (more like 1:100 or worse) or the modern force has too small a frontage.
Another implication is that I understand why academic/meta work is stereotyped by populists as out of touch with reality, even though I suspect that stereotype is at least somewhat wrong.