Gerald Monroe comments on Dreams of AI alignment: The danger of suggestive names

Gerald Monroe 13 Feb 2024 21:08 UTC
2 points
0
Ok so failure stops being the AI models coordinating a mass betrayal but goodharting metrics to the point that nothing works right. Not fundamentally different from a command economy failing where the punishment for missing quotas is gulag, and the punishment for lying on a report is gulag but later, so...

There’s also nothing new about the failures, the USA incarceration rate is an example of what “trying too hard” looks like.