Additional note for posterity: when I talked about “some objectives [may] make alignment far more likely”, I was considering something like “given this pretraining objective and an otherwise fixed training process, what is the measure of data-sets in the N-datapoint hypercube such that the trained model is aligned?”, perhaps also weighting by ease of specification in some sense.
what is the measure of data-sets in the N-datapoint hypercube such that the trained model is aligned?”, perhaps also weighting by ease of specification in some sense.
You’re going to need the ease of specification condition, or something similar; else you’ll probably run into no-free-lunch considerations (at which point I think you’ve stopped talking about anything useful).
Sure.
Additional note for posterity: when I talked about “some objectives [may] make alignment far more likely”, I was considering something like “given this pretraining objective and an otherwise fixed training process, what is the measure of data-sets in the N-datapoint hypercube such that the trained model is aligned?”, perhaps also weighting by ease of specification in some sense.
You’re going to need the ease of specification condition, or something similar; else you’ll probably run into no-free-lunch considerations (at which point I think you’ve stopped talking about anything useful).