Nathan Helm-Burger comments on Siebe’s Shortform

Nathan Helm-Burger 24 Jan 2025 2:30 UTC
3 points
0
I’ve been planning for a while to run a similar experiment, adding documents that show examples of AIs behaving in corrigible ways (inspired by talking with Max about Corrigibility as Singular Target).

I think including examples of honest and aligned CoT that results in successful task completion is also a good idea.
- Milan W 24 Jan 2025 14:23 UTC
  4 points
  0
  Want to collaborate on this experiment idea you have? I have time, and can do the implementation work while you mostly instruct/mentor me.