the “just increase the alignment/capabilities ratio” model
If Donald is talking about the reasoning from my post, the primary argument there went a bit differently. It was about expanding the AI Safety field by converting extant capabilities researchers/projects, and that even if we can't make them stop capabilities research, any intervention that 1) doesn't speed it up and 2) makes them output alignment results alongside capabilities results is net positive.
I think I'd also argued that the AI Safety field is tiny at the moment, so we wouldn't contribute much to capabilities research even if we deliberately tried; but in retrospect, that argument is obviously invalid in hypotheticals where we're actually effective at solving alignment.