ChengCheng Mar 31, 2023, 12:46 AM
4 points
on: Speed running everyone through the bad alignement bingo. $5k bounty for a LW conversational agent
First of all, thank you @ArthurB for offering this bounty and for raising awareness of the need for high-quality AI alignment educational resources! We are particularly grateful to those who mentioned the Stampy project, and to the people who have reached out offering to help with our efforts.

Our submission, https://chat.stampy.ai/, is a very early prototype. It focuses primarily on summarizing and synthesizing information from our own database of FAQs, along with selected documents from the alignment research dataset. The conversational feature still requires considerable work. Nevertheless, we would love to get input and feedback so that we can further develop this tool for anyone seeking to better understand or contribute to AI safety.

This would not have been possible without the support of our volunteers and collaborators. We welcome everyone who is interested in using AI to advance alignment.


Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google