It sure looks like we are going to have inexact imitations of humans that are able to do useful work and that continue to broadly agree with humans about what you “ought to do” in a way that is common-sensically smart (such that, to the extent you get useful work from them, it’s still “good” in the same way a human’s behavior is). It also looks like those properties are likely to be retained when a bunch of them are collaborating in a “Chinese Room Bureaucracy,” though this is not clear.
I want to note there’s a pretty big difference between “what you say you ought to do” and “what you do”. I basically expect language models to imitate humans as well as possible, which will include lots of homo hypocritus things like saying it’s wrong to lie and also lying. And to the extent that a model tries to capture “all things humans might say,” it will be representing all sides of all cultural / moral battles, which seems like it misses a bunch of the consistency and coherency properties that humans say they ought to have.
I feel like you’ve got to admit that we’re currently in a world where everyone is building non-self-modifying Oracles that can explain the consequences of their plans.
This feels like the scale/regime complaint to me? Yes, people have built a robot that can describe the intended consequences of moving its hand around an enclosure (“I will put the red block on top of the blue cylinder”), or explain the steps of solving simple problems (“Answer this multiple choice reading comprehension question, and explain your answer”). But once we get to the point where the task requires nontrivial filtering (“Tell us what tax policy we should implement, and explain the costs and benefits of its various features”), it seems like most of the thoughts behind the answer would be opaque or not easily captured in sentences.