abramdemski comments on Confucianism in AI Alignment

abramdemski 23 Nov 2020 16:49 UTC
LW: 4 AF: 4
I think one issue which this post sort of dances around, and which maybe a lot of discussion of inner optimizers leaves implicit or unaddressed, is the difference between having a loss function which you can directly evaluate vs one which you must estimate via some sort of sample.
The argument in this post that inner optimizers’ misbehavior is necessarily behavioral, and is therefore best addressed by behavioral loss functions, misses the point that these misbehaviors occur on examples we don’t check. As such, it comes off as:
- Perhaps arguing that we should check every example, or check much more thoroughly.
- Perhaps arguing that the examples should be made more representative.
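
To make the gap between a directly evaluated loss and a sampled estimate concrete, here is a minimal Python sketch. It is an illustration only, not anything from the post: the names `behaves_badly`, `exact_loss`, and `sampled_loss` are hypothetical, and the assumption that misbehavior is confined to a rare slice of inputs (multiples of 1000) is made up for the example.

```python
import random

def behaves_badly(x: int) -> bool:
    # Hypothetical assumption: the model misbehaves only on a rare slice of inputs.
    return x % 1000 == 0

def exact_loss(inputs):
    # Directly evaluated behavioral loss: every input is checked.
    return sum(behaves_badly(x) for x in inputs) / len(inputs)

def sampled_loss(inputs, n, rng):
    # Estimated behavioral loss: only n randomly chosen inputs are checked.
    checked = rng.sample(inputs, n)
    return sum(behaves_badly(x) for x in checked) / n

inputs = list(range(1, 100_001))
rng = random.Random(0)  # fixed seed so the run is repeatable

full = exact_loss(inputs)            # 100 bad inputs out of 100,000
est = sampled_loss(inputs, 50, rng)  # based on only 50 checked inputs

print(f"exact loss:   {full}")
print(f"sampled loss: {est}")
```

With 50 checked examples and a misbehavior rate of 1 in 1000, a sample contains no bad example roughly 95% of the time (0.999^50), so the estimated loss is usually exactly zero even though the behavioral loss over all inputs is not. Both functions are equally "behavioral"; they differ only in which examples get checked.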
Now, I personally think that “distributional shift” is a misleading framing, because in learning in general (EG Solomonoff induction) we don’t have an IID distribution (unlike in, EG, classification tasks), so there is no “distribution” to “shift” in the first place.
But to the extent that we can talk in this framing, I’m kinda like… what are you saying here? Are you really proposing that we should just check instances more thoroughly or something like that?