An AI with a good world model will predictably have a model of your values, but that’s different from being able to actually elicit that model via, e.g., a series of labeled examples. That’s the part that seemed less plausible before deep learning.