Nit, having not read your full post: should “Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover” be in the related work? My mind pattern-matched to that exact piece from your very similar title, so my first thought was to wonder how your piece contributes new arguments.
Yeah, this is a good point, especially given our title. I’ll endeavor to add it today.
“Without specific countermeasures” definitely did inspire our title, and it seems good to be clear about how our pieces differ. I think the two pieces are very different; two of the main differences are:
Our piece is much more focused on “inner alignment” difficulties, whereas “playing the training game” is primarily an “outer alignment” concern (although “Without specific countermeasures” does discuss some inner alignment issues, that isn’t its main focus).
Our piece argues that even with specific countermeasures (i.e. AI control), behavioral training of powerful AIs is likely to lead to extremely bad outcomes, and so fundamental advances are needed (likely moving away from purely behavioral training).