They’re mostly doing “train a language model on a bunch of data and hope human concepts and values are naturally present in the neural net that pops out”, which isn’t exactly either of these strategies. Currently it’s a bit of a struggle to get language models to go in an at-all-nonrandom direction (though there has been recent progress in that area). There are tidbits of deconfusion-about-ethics here and there on LW, but nothing I would call a research program.