It makes sense that you don’t want this article to opine on the question of whether people should not have created “misalignment data”, but I’m glad you concluded in the comments that it wasn’t a mistake. I find it hard to even tell a story in which this genre of writing was a mistake. Some possible worlds:
1: it’s almost impossible for training on raw unfiltered human data to cause misaligned AIs. In this case there was negligible risk from polluting the data by talking about misaligned AIs, it was just a waste of time.
2: training on raw unfiltered human data can cause misaligned AIs. Since there is a risk of misaligned AIs, it is important to know that there’s a risk, and therefore to not train on raw unfiltered human data. We can’t do that without talking about misaligned AIs. So there’s a benefit from talking about misaligned AIs.
3: training on raw unfiltered human data is very safe, except that training on any misalignment data is very unsafe. The safest thing is to train on raw unfiltered human data that naturally contains no misalignment data.
Only world 3 implies that people should not have produced the text in the first place. And even there, once “2001: A Space Odyssey” (for example) is published, the option of having no misalignment data in the corpus is blocked, and we’re back in world 2.
Here is a related market inspired by the AI timelines dialog, currently at 30%:
Note that in this market the AI is not restricted to “pretraining-scaling plus transfer learning from RL on math/programming”; it is allowed to be trained on a wide range of video games, but it has to transfer to a new genre. Also, transferring successfully to any new genre counts, not just Pokémon.
I infer you are at ~20% for your more restrictive prediction:
80% bear case is correct, in which case P=5%
20% bear case is wrong, in which case P=80% (?)
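Spelling out the arithmetic behind that inference (a sketch of my reading, not the author’s stated calculation; the 5% and 80% branch values are the ones from the two cases above):

```python
# Combine the two branches by total probability.
p_bear = 0.80        # weight on "bear case is correct"
p_given_bear = 0.05  # P(success | bear case correct)
p_given_not = 0.80   # P(success | bear case wrong), the "(?)" figure

p_overall = p_bear * p_given_bear + (1 - p_bear) * p_given_not
print(round(p_overall, 2))  # 0.04 + 0.16 = 0.2, i.e. ~20%
```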
So perhaps you’d also be at ~30% for this market?
I’m not especially convinced by your bear case, but I think I’m also at ~30% on the market. I’m tempted to bet lower because of the logistics: training the AI, finding a genre it wasn’t trained on (which might require creating a new genre), and having the demonstration occur, all within the next nine months. But I’m not sure I have an edge over the other bettors on this one.