I don’t think one needs to believe the human worth hypothesis to disbelieve strong orthogonality. One only needs to believe that gradient descent can actually find representations that correctly capture the important parts of what the algorithm designer intended the training data to represent. For the YouTube recommender, e.g., the intended target would be “does this enrich the user’s life enough to keep them coming back”, whereas what’s actually measured is just “how long do they come back”.