I think everyone in the field would be incredibly impressed if they managed to hook up a pretrained GPT to an AlphaStar-for-Minecraft and get back out something that could talk about its strategies with human coplayers. I’d consider that **a huge advance in alignment research** [bolding mine —ZMD] [...] because of the level of transparency increase it would imply, that there was an AI system that could talk about its internally represented strategies, somehow. Maybe because somebody trained a system to describe outward Minecraft behaviors in English, and then trained another system to play Minecraft while describing in advance what behaviors it would exhibit later, using the first system’s output as the labeler on the data.
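To make the quoted two-stage proposal concrete, here is a minimal runnable sketch of the data flow, with a toy action-sequence game standing in for Minecraft. Everything here (`play_episode`, `describe`, `honesty_score`, the action set) is hypothetical and invented for illustration; it is not any real Minecraft or AlphaStar system.

```python
"""Toy sketch of the two-stage scheme in the quote above. All names and
the 'game' itself are hypothetical stand-ins, not a real codebase."""

import random
from collections import Counter

ACTIONS = ["mine", "craft", "build", "fight"]

def play_episode(policy, length=8):
    """Roll out a stochastic policy (action -> weight) into an action sequence."""
    actions, weights = zip(*policy.items())
    return random.choices(actions, weights=weights, k=length)

def describe(episode):
    """Stage 1 stand-in: the 'describer' maps observed behavior to English.
    (In the real proposal this would be a trained model, not a heuristic.)"""
    dominant, _count = Counter(episode).most_common(1)[0]
    return f"mostly {dominant}s this episode"

def honesty_score(announcement, episode):
    """Stage 2 training signal: did the agent's advance self-report match
    the describer's post-hoc label of what it actually did?"""
    return float(describe(episode) == announcement)

if __name__ == "__main__":
    policy = {"mine": 5, "craft": 1, "build": 1, "fight": 1}
    announcement = "mostly mines this episode"  # the agent's advance self-report
    episode = play_episode(policy)
    print("played:", episode)
    print("describer's label:", describe(episode))
    print("honesty score:", honesty_score(announcement, episode))
```

The point of the sketch is just the dependency structure: the describer's labels of actual behavior, not win/loss, are what score the agent's advance announcements.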
Huh? Isn’t this essentially what Meta’s Cicero did for Diplomacy? (No one seemed to think of this as an alignment advance.)
Unless I’m missing something, Cicero can talk about its strategies, but only in the sense that its training resulted in its text saying the kinds of things about its strategies that tend to help it win the game. Not in the sense that it has some subpart that truthfully and reliably reports on whatever strategy the network actually has. (I’d expect those two goals to conflict at least sometimes, if not pretty often.)
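One toy way to see the distinction (all of this is hypothetical and does not reflect Cicero's actual implementation): the two training pressures score the same message differently.

```python
"""Hypothetical contrast between the two senses of 'talking about its
strategies'. Nothing here reflects Cicero's real training setup."""

def win_aligned_loss(message_helped_win: bool) -> float:
    """Cicero-style pressure: a message is good iff it helps win,
    whether or not it matches the agent's actual plan."""
    return 0.0 if message_helped_win else 1.0

def report_aligned_loss(message: str, actual_plan: str) -> float:
    """Transparency pressure: a message is good iff it truthfully
    reports the actual plan, whatever the strategic cost."""
    return 0.0 if message == actual_plan else 1.0

# A truthful announcement that hurts the agent's position is penalized by
# the first objective and rewarded by the second: the conflict in question.
print(win_aligned_loss(message_helped_win=False))              # 1.0
print(report_aligned_loss("attack Munich", "attack Munich"))   # 0.0
```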
I’ve heard that this is false, though I haven’t personally read the paper, so I can’t comment with confidence.
Oh, I see. It seems like it doesn’t work reliably, though (the comment says it “doesn’t lead to a fully honest agent”).