Ok so failure stops being the AI models coordinating a mass betrayal but goodharting metrics to the point that nothing works right. Not fundamentally different from a command economy failing where the punishment for missing quotas is gulag, and the punishment for lying on a report is gulag but later, so...
There’s also nothing new about the failures, the USA incarceration rate is an example of what “trying too hard” looks like.
Ok so failure stops being the AI models coordinating a mass betrayal but goodharting metrics to the point that nothing works right. Not fundamentally different from a command economy failing where the punishment for missing quotas is gulag, and the punishment for lying on a report is gulag but later, so...
There’s also nothing new about the failures, the USA incarceration rate is an example of what “trying too hard” looks like.