I have some alignment project ideas that I'd consider mentoring for. I would love feedback on the ideas, and if you're interested in collaborating on any of them, that's cool, too.
Here are the titles:
Smart AI vs swarm of dumb AIs
Lit review of chain of thought faithfulness (steganography in AIs)
Replicating the METR paper, but for alignment research tasks
Tool-use AI for alignment research
Sakana AI for Unlearning
Automated alignment onboarding
Build the infrastructure for making Sakana AI's AI scientist better for alignment research