Daniel Tan comments on Why I’m Moving from Mechanistic to Prosaic Interpretability

Daniel Tan 30 Dec 2024 17:50 UTC
7 points
Great points! I agree re: short timelines being the crux.
I chatted to Logan Riggs today, and he argued that improvements in capabilities will make ambitious mech interp possible in time for us to develop solutions for aligning or monitoring powerful AI. This seems very optimistic, to say the least, and I remain unconvinced that mech interp will "somehow" buck its historical trend of disappointing results.