I mostly second Beren’s reservations, but given that current models can already improve sorting algorithms in ways that didn’t occur to humans (ref), I think it’s plausible that they will prove useful in generating algorithms for automating interpretability and the like, e.g., some elaboration on ACDC, ROME, or MEMIT.
Note that this proposal is not about automating interpretability.