Do you mean from what already exists or from changing the direction of new research?
I mean extracting insights from capabilities research that currently exists, not changing the direction of new research. For example, specification gaming is on everyone’s radar because it was observed in capabilities research (the authors of the linked post compiled this list of specification-gaming examples, some of which are from the 1980s). I wonder how much more opportunity there might be to piggyback on existing capabilities research for alignment purposes, and maybe to systematize that going forward.