Great points! I agree re: short timelines being the crux.
I chatted to Logan Riggs today, and he argued that improvements in capabilities will make ambitious mech interp possible in time for us to develop solutions to align / monitor powerful AI. This seems very optimistic, to say the least, and I remain unconvinced that mech interp will ‘somehow’ buck its historical trend of being disappointing.