No Clickbait—Misalignment Database
This is a database of cases of Misalignment—classified by Type of Misalignment, Type of AI, etc.
Link to add more:
https://docs.google.com/forms/d/e/1FAIpQLSfE7ZeSV6W_YmKYrgy7BaiFKj90dBJ2qDUaYXzbpi_ILEs9sQ/viewform?usp=sf_link
Link to the DB: https://docs.google.com/spreadsheets/d/1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA/edit?usp=sharing
Made it last week.
Currently there are 115 entries, 62 of which are from the Specification Gaming database made by DeepMind: https://deepmindsafetyresearch.medium.com/specification-gaming-the-flip-side-of-ai-ingenuity-c85bdb0deeb4
As far as I know, this is the first public database of its kind.
The closest that I know of are the Specification Gaming database and the AI Incident Database (https://incidentdatabase.ai/).
For a community that’s supposed to be made up of science fans, I’m pretty baffled that something as basic as this doesn’t already exist, among many, many other things.
If you know of any cases, please add them.
Edits:
Added link to the DB: https://docs.google.com/spreadsheets/d/1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA/edit?usp=sharing
Made it clearer which link is the DB and which is the form.
It might be worth someone writing out what is meant by each misalignment category used in the DB. Objective misalignment, specification gaming, and value misalignment all seem to overlap, and I’m not at all sure what physical misalignment is supposed to point to.
For sure. Right now it’s just a Google Form and a Google Sheet. Would you be interested in taking charge of this?
No, this is not something I can undertake. However, the effort itself need not be very complicated. You’ve already got a list of misalignment types in the form: create a Google Doc with definitions or descriptions of each of these, and put a link to that doc in this question.
There is only a link to add a database entry, with no link to view the database itself.
Ah, sorry, here’s the link! https://docs.google.com/spreadsheets/d/1uXzWavy1mS0X-uQ21UPWHlAHjXFJoWWlN62EyKAoUmA/edit?usp=sharing
Thank you for pointing that out, also added it to the post!
I think you copy-pasted the wrong link: the first link leads to a form one can use to add an example, not to the list of examples.
Thank you, I’ve labelled that as the form link now and added the DB link.
Updated to 115.
There’s also the goal misgeneralization database by DeepMind, in parallel to the misspecification one: blogpost, database.
Thank you! I’ll add those as well!