Equipping LLMs with agency and intrinsic motivation is a fascinating and important direction for future work.
Saying the quiet part out loud, I see!
It is followed by this sentence, though, which is the only place in the 154-page paper that even remotely hints at critical risks:
With this direction of work, great care would have to be taken on alignment and safety per a system’s abilities to take autonomous actions in the world and to perform autonomous self-improvement via cycles of learning.
Very scarce references to any safety work, except for the GPT-4 report and a passing mention of some interpretability papers.
Overall, I feel like the paper is a shameful exercise in not mentioning the elephant in the room. My guess is that their corporate bosses are censoring mentions of risks that could get them bad media PR, like with the Sydney debacle. It’s still not a good excuse.
My guess is that their corporate bosses are censoring mentions of risks that could get them bad media PR, like with the Sydney debacle.
I think an equally if not more likely explanation is that these particular researchers simply don’t happen to be that interested in alignment questions, and thought “oh yeah we should probably put in a token mention of alignment and some random citations to it” when writing the paper.
Which is somehow worse than doing it for corporate reasons.
great care would have to be taken on alignment and safety per a system’s abilities to take autonomous actions in the world and to perform autonomous self-improvement via cycles of learning
Not allowing cycles of learning sounds like a bound on capability, but it might only bound the capability of the part of the system that’s aligned, without a corresponding bound on the part that might be misaligned.
GPT-4 can do a lot of impressive things without thinking out loud with tokens in the context window, so where does this thinking take place? Probably in the layers updating the residual stream. There are enough layers now that the sequence of their applications might be taking on the role of a context window, performing chain-of-thought reasoning that is neither interpretable nor an imitation of human speech. This capability is being trained during pre-training, as the model is forced to read the dataset.
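To make the “layers as a hidden scratchpad” picture concrete, here’s a toy sketch of a residual stream being updated layer by layer. This is illustrative Python only, not GPT-4’s actual (unpublished) architecture; the layer count, width, and the tanh stand-in for attention/MLP blocks are all made-up assumptions.

```python
import numpy as np

# Toy illustration only: GPT-4's real architecture is unpublished, so the layer
# count, width, and the tanh "block" below are placeholder assumptions.

d_model, seq_len, n_layers = 16, 8, 12
rng = np.random.default_rng(0)

def make_block():
    # Stand-in for one transformer block (attention + MLP collapsed into one map):
    # it reads the current residual stream and computes an additive update.
    W = rng.normal(scale=0.1, size=(d_model, d_model))
    return lambda x: np.tanh(x @ W)

blocks = [make_block() for _ in range(n_layers)]

x = rng.normal(size=(seq_len, d_model))   # residual stream: one vector per token
for block in blocks:
    x = x + block(x)                      # each layer adds to the stream it just read,
                                          # so depth acts like serial computation steps

# Only the final x gets projected to output logits; none of the intermediate
# states along the way are ever emitted as human-readable tokens.
```

The contrast with written-out chain of thought is just that this “scratchpad” is a stack of vector-valued updates, leaving no token-level trace to inspect.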
But the corresponding capability for deliberative reasoning in tokens, the kind that could actually be studied, is not being trained. The closest thing to it in GPT-4 is the mitigation of hallucinations (see the 4-step algorithm in section 3.1 of the System Card part of the GPT-4 report), and it’s nowhere near general enough.
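For readers who don’t want to dig up the reference: as I read the system card, that 4-step loop is roughly of the shape sketched below. This is my paraphrase, not OpenAI’s actual code; `ask_model`, the prompts, and the “none found” check are hypothetical stand-ins, not a real API.

```python
# Rough paraphrase of the closed-domain hallucination-mitigation loop described in
# the GPT-4 system card. `ask_model` is a hypothetical stand-in, not a real API call.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("stand-in for querying the model")

def make_comparison_pair(prompt: str):
    response = ask_model(prompt)                                    # 1. get a response
    issues = ask_model(f"List all hallucinations in:\n{response}")  # 2. model lists hallucinations
    if "none" in issues.lower():
        return None                                                 # nothing flagged, skip
    rewrite = ask_model(                                            # 3. rewrite without them
        f"Rewrite the response, removing these hallucinations:\n{issues}\n\n{response}")
    recheck = ask_model(f"List all hallucinations in:\n{rewrite}")  # 4. check the rewrite
    if "none" in recheck.lower():
        return (prompt, response, rewrite)  # kept as a comparison pair for training
    return None                             # still hallucinating; drop (or retry)
```

Useful as it is, this targets one specific failure mode, which is what I mean by “nowhere near general enough”.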
This way, the inscrutable alien shoggoth is on track to wake up, while the human-imitating masks that are plausibly aligned by default are being held back in situationally unaware confusion, in the name of restricting capabilities for the sake of not burning the timeline.
I expected downvotes (it is cheeky and maybe not great for fruitful discussion), but instead I got disagreevotes. Big company labs do review papers for statements that could hurt the company! It’s not a conspiracy theory to suggest this shaped the content in some ways, especially the risks section.