Many people focus on research aimed at full automation, but I think it’s also worth thinking in the semi-automated frame when trying to come up with a path to impact. Obviously, it isn’t scalable, but for a while it may be sufficient more often than we’d think by default. In other words, cyborgism-enjoyers might be especially interested in evals of this kind: capability measurements that are hard to pull out of the model through traditional evals, but easier to measure through some semi-automated setup.