Thomas Kwa comments on Vote on Interesting Disagreements

Thomas Kwa 10 Nov 2023 11:41 UTC
Most end-to-end “alignment plans” are bad because alignment research will be incremental. For example, most of Superalignment’s impact will come from adapting to the next ~3 years of AI discoveries and working on relevant subproblems like interpretability, rather than from creating a superhuman alignment researcher.