Despite not answering all possible goal-related questions a priori, the reductionist perspective does provide a tractable research program for improving our understanding of AI goal development. It does this by reducing questions about goals to questions about behaviors observable in the training data.
[emphasis mine]
This might be described as “a reductionist perspective”. It is certainly not “the reductionist perspective”, since reductionist perspectives need not limit themselves to “behaviors observable in the training data”.
A more reasonable-to-my-mind behavioral reductionist perspective might look like this.
Ruling out goal realism as a good way to think does not leave us with [the particular type of reductionist perspective you’re highlighting]. In practice, I think the reductionist perspective you point at is:
Useful, insofar as it answers some significant questions.
Highly misleading if we ever forget that [this perspective doesn’t show us that x is a problem] doesn’t tell us [x is not a problem].