I have some alignment project ideas that I'd consider mentoring for. I would love feedback on the ideas, and if you're interested in collaborating on any of them, that's cool, too.
Here are the titles:
Smart AI vs swarm of dumb AIs
Lit review of chain of thought faithfulness (steganography in AIs)
Replicating the METR paper, but for alignment research tasks
Tool-use AI for alignment research
Sakana AI for Unlearning
Automated alignment onboarding
Build the infrastructure for making Sakana AI's AI scientist better for alignment research