I think this dynamic exists for the Machines of Loving Grace post for two reasons:
It’s intentionally not talking about misalignment, and it assumes as a premise that the AI we do get is aligned by some method with a low enough tax that basically everyone else also adopts the solution.
You can’t get a lot of nuance/future shock into public-facing posts, for the reasons Raemon lays out here. In summary: even in a context where people aren’t adversarial and are just unreliable, it’s very hard to communicate nuanced ideas, and when there are adversarial forces, you really need to avoid putting too much nuance into your policy, because people will exploit it.
See here for the full story:
https://www.lesswrong.com/posts/4ZvJab25tDebB8FGE/you-get-about-five-words#tREaGcLsrtdz3WHnd
What I want explained is why this dynamic persists across all of Anthropic’s written publications, not just this one post.