What % evals/demos and what % mech interp would you expect to see if there weren't any Goodharting? 1⁄3 and 1⁄5 don't seem that high to me, given the value of these agendas and the advantages of touching reality that Ryan named.
Hard to be confident here, but maybe half those numbers or even less (especially for evals/demos)?
If you could choose the perfect portfolio allocation, does it seem reasonable to you that > 1⁄2 (assuming no overlap) should go to evals/demos and mech interp?