How does the researcher know that it’s about to pass roughly human level, when a misaligned AI may have both the incentive and the ability to fake a plateau at about human level until it has had enough training and internal bootstrapping to become strongly superhuman? Even animals can feign being weaker, injured, or dead when doing so benefits them.
I don’t think this is necessarily what will happen, but it is a scenario that needs to be considered.