sludgepuddle comments on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

sludgepuddle 4 Jul 2023 6:45 UTC
6 points
0
This seems to me the opposite of a low bandwidth recursion. Having access to the entire context window of the previous iteration minus the first token, it should be pretty obvious that most of the relevant information encoded by the values of the nodes in that iteration could in principle be reconstructed, excepting the unlikely event that first token turns out to be extremely important. And it would be pretty weird if much of that information wasn’t actually reconstructed in some sense in the current iteration. An inefficient way to get information from one iteration to the next, if that is your only goal, but plausibly very high bandwidth.
- FeepingCreature 4 Jul 2023 7:40 UTC
  9 points
  2
  
  “excepting the unlikely event that first token turns out to be extremely important.”
  
  Which is why asking an LLM to give an answer that starts with “Yes” or “No” and then gives an explanation is the worst possible way to do it.
  - der 5 Jul 2023 23:01 UTC
    5 points
    2
    This was thought-provoking. While I believe what you said is currently true for the LLMs I’ve used, a sufficiently expensive decoding strategy would overcome it. Might be neat to try this for the specific case you describe. Ask it a question that it would answer correctly with a good prompt style, but use the bad prompt style (asking it to give an answer that starts with Yes or No), and watch how the ratio of the cumulative probabilities of Yes* and No* sequences changes as you explore the token sequence tree.
- dr_s 4 Jul 2023 10:50 UTC
  6 points
  2
  I’d say it’s pretty low bandwidth compared to the wealth of information that must exist in the intermediate layers. Even just the distribution of logits gets collapsed into a single returned value. You could definitely send back more than just that, but the question is whether it’s workable or if it just adds confusion.
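
The experiment der proposes in the thread above (tracking the Yes* vs. No* probability mass while exploring the token sequence tree) could be sketched roughly as follows. This is only an illustration under loud assumptions: `next_token_probs` is a hypothetical toy stand-in for a real LLM's next-token distribution, its numbers are made up, and `yes_no_ratios` is a name invented here. One caveat worth keeping in mind: if the tree is enumerated exhaustively, the total mass of all Yes* sequences is just the probability of the first token "Yes", so the ratio can only move when the search prunes branches, as a beam search does.

```python
from typing import Dict, List, Tuple

Prefix = Tuple[str, ...]


def next_token_probs(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical toy model (NOT a real LLM); all numbers are invented.

    "Yes" is the likelier first token, but its continuations are diffuse,
    while the continuations of "No" are concentrated on one token.
    """
    if not prefix:
        return {"Yes": 0.6, "No": 0.4}
    if prefix[0] == "Yes":
        return {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    return {"a": 0.9, "b": 0.1}


def yes_no_ratios(beam_width: int, depth: int) -> List[float]:
    """Beam-search the token tree and, at each depth, return the ratio of
    surviving probability mass on Yes* prefixes to that on No* prefixes."""
    beams: List[Tuple[Prefix, float]] = [((), 1.0)]
    ratios: List[float] = []
    for _ in range(depth):
        # Expand every surviving prefix by every possible next token.
        candidates = [
            (prefix + (tok,), p * q)
            for prefix, p in beams
            for tok, q in next_token_probs(prefix).items()
        ]
        # Keep only the beam_width most probable partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        yes_mass = sum(p for prefix, p in beams if prefix[0] == "Yes")
        no_mass = sum(p for prefix, p in beams if prefix[0] == "No")
        ratios.append(yes_mass / no_mass if no_mass > 0 else float("inf"))
    return ratios


if __name__ == "__main__":
    print("narrow beam:", yes_no_ratios(beam_width=3, depth=3))
    print("wide beam:  ", yes_no_ratios(beam_width=1000, depth=3))
```

With a narrow beam in this toy setup, the ratio starts above 1 (favouring "Yes") and drops below 1 as the diffuse Yes* branches get pruned; with a beam wide enough to keep every branch, it stays at the first-token ratio. Whether a real model shows a comparable shift on a real question is exactly what the proposed experiment would have to find out.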