If one isn’t concerned about an AGI’s ability either to (a) subvert, right out of the box, the testing mechanisms being applied to it, or neutralize whatever mechanisms are in place to deal with it if it “fails” those tests, or (b) self-improve rapidly enough to reach that state before those testing or dealing-with-failure mechanisms apply, then sure, a sufficiently well-designed test harness around some plausible-but-not-guaranteed algorithms will work fine, as it does for most software.
Of course, if one is concerned about an AGI’s ability to do either of those things, one may not wish to rely on such a test harness.
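To make the ordinary-software case concrete, here is a minimal sketch of such a harness in Python. Everything in it is illustrative rather than anyone’s actual proposal: the names (harness, naive_sort), the per-case time budget, and the choice of process isolation are all assumptions. The point is just that for normal software, “mechanisms to deal with it if it fails” amount to a timeout and a kill switch, which is exactly the part the out-of-the-box-subverter scenario assumes away.

```python
import multiprocessing
import queue

def _run_candidate(candidate, case, out):
    """Execute the untrusted candidate on one input, in its own process."""
    try:
        out.put(("ok", candidate(case)))
    except Exception as exc:
        out.put(("error", repr(exc)))

def harness(candidate, cases, expected, timeout=1.0):
    """Run a plausible-but-not-guaranteed algorithm against a test suite.

    The time budget plus terminate() is the ordinary-software version of
    a mechanism for dealing with a candidate that "fails" its tests.
    """
    results = []
    for case, want in zip(cases, expected):
        out = multiprocessing.Queue()
        proc = multiprocessing.Process(target=_run_candidate, args=(candidate, case, out))
        proc.start()
        proc.join(timeout)
        if proc.is_alive():
            proc.terminate()  # the dealing-with-failure mechanism
            results.append((case, "timeout"))
            continue
        try:
            status, got = out.get(timeout=0.5)
        except queue.Empty:
            results.append((case, "crashed"))
            continue
        ok = status == "ok" and got == want
        results.append((case, "pass" if ok else "fail: %r" % (got,)))
    return results

def naive_sort(xs):
    """Stand-in for a plausible-but-not-guaranteed algorithm."""
    return sorted(xs)

if __name__ == "__main__":
    print(harness(naive_sort, [[3, 1, 2], []], [[1, 2, 3], []]))
```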
It seems to follow from this that quantifying which kinds of algorithms can do either of those things, and working out whether there’s any way to reliably determine, prior to implementing a particular algorithm, whether it falls into that set, would let AGI developers do trial-and-error work on algorithms that provably can’t. That would be one way of making measurable progress without arousing the fears of those who consider FOOMing algorithms a plausible existential risk.
Of course, that doesn’t have the “we get it right and then everything is suddenly better” aspect of successfully building a FOOMing FAI… it’s just research and development work, the same sort of incremental collective process that has resulted in, well, pretty much all human progress to date.