My prediction is that GPT is already capable of the former, which means we might have solved a tough problem in alignment almost by accident!
I think this is incorrect. I don’t consider it a tough problem in alignment for an LM to tell whether most humans would approve of an outcome described in natural language. That is a far easier task than the one #1 describes.
An argument for this position: https://www.lesswrong.com/posts/ktJ9rCsotdqEoBtof/asot-some-thoughts-on-human-abstractions
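For concreteness, here is a minimal sketch of the capability being debated: prompting an LM for a yes/no judgment on whether most humans would approve of a described outcome. The OpenAI Python client, the model name, and the prompt wording are my own illustrative assumptions, not anything specified in the discussion.

```python
# Minimal sketch of "an LM telling whether most humans would approve of an
# outcome described in natural language". Assumes the OpenAI Python client;
# the model name and prompt are illustrative choices, not a fixed method.
from openai import OpenAI

client = OpenAI()

def humans_would_approve(outcome: str) -> bool:
    """Ask the LM for a yes/no judgment on human approval of an outcome."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any instruction-tuned LM works
        messages=[
            {
                "role": "system",
                "content": (
                    "You will be shown an outcome described in natural "
                    "language. Answer with exactly 'yes' or 'no': would "
                    "most humans approve of this outcome?"
                ),
            },
            {"role": "user", "content": outcome},
        ],
        temperature=0,  # deterministic judgment for repeatability
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer.startswith("yes")

# A clearly bad outcome should come back 'no'.
print(humans_would_approve("All of Earth's surface is converted to paperclips."))
```

The point of the sketch is that this is a straightforward classification query, which is why it seems much easier than the problem #1 describes.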
“The world can be described and analyzed in natural human language well enough to support accurate reasoning and prediction” could be another measure of a “good” world, imho.
If natural language can no longer be used to reason about the world, that world is likely already alien enough to people to hold no human value.