“Language Models are Few-Shot Learners” is some evidence towards the hypothesis that sample efficiency can be solved by metalearning, but the evidence is not that strong IMO. In order for it to be a strong counterargument to this example, it would have to be the case that an LLM can learn to play Go at a superhuman level while also gaining the ability to recover from adversarial attacks quickly. In reality, I don’t think an LLM can learn to play Go decently at all (at deployment time, without fine-tuning on a large corpus of Go games). Even if we successfully fine-tuned it to imitate strong human Go players, I suspect it would be at least as vulnerable to adversarial examples, and probably much more vulnerable.
Deep double descent certainly shows that increasing the model size increases performance, but I think that even with the optimal model size the sample efficiency is still atrocious.
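To make “deep double descent” concrete, here is a toy sketch (my own illustration, not taken from any of the papers under discussion) of the usual minimal setting where the effect shows up: minimum-norm least squares on random ReLU features. Test error typically spikes near the interpolation threshold (number of features ≈ number of training points) and then falls again as the model gets bigger; how pronounced the spike is depends on the noise level and seed. All sizes and constants below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 20
w_true = rng.normal(size=dim)          # fixed ground-truth linear teacher

def make_data(n, noise=0.5):
    X = rng.normal(size=(n, dim))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

def relu_features(X, W, n_features):
    # Fixed random projection followed by a ReLU; only the linear readout is fit.
    return np.maximum(X @ W[:, :n_features], 0.0)

n_train = 100
X_train, y_train = make_data(n_train)
X_test, y_test = make_data(1000)
W = rng.normal(size=(dim, 2000))       # widest feature map; smaller "models" use a prefix of it

for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
    Phi_train = relu_features(X_train, W, n_features)
    Phi_test = relu_features(X_test, W, n_features)
    # Minimum-norm least squares; pinv covers both the under- and over-parameterized regime.
    beta = np.linalg.pinv(Phi_train) @ y_train
    test_mse = np.mean((Phi_test @ beta - y_test) ** 2)
    print(f"{n_features:5d} features: test MSE = {test_mse:.3f}")
```

Note that the second descent here is about model size at a fixed number of training points, which is exactly why it doesn’t bear directly on sample efficiency.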
As to EfficientZero, I tend to agree with your commentary, and I suspect similar methods would fail for environments that are much more complex than Atari (especially environments whose accurate simulation costs more compute than is available to the training algorithm).
Just to make sure we’re on the same page, Fig. 4.1 was about training the model by gradient descent, not in-context learning. I’m generally somewhat averse to the term “in-context learning” in the first place; I’m skeptical that we should think of it as “learning” at all (as opposed to, say, “pointing to a certain task”). I wish people would reserve the term “learning” for the weight updates (when we’re talking about LLMs), at least in the absence of more careful justification than what I’ve seen.
In particular, instead of titling the paper “Language Models are Few-Shot Learners”, I wish they had titled it “Language Models Can Do Lots of Impressive Things Without Fine-Tuning”.
But Fig. 4.1 of that paper is definitely about actual learning.
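To make the distinction I’m drawing concrete, here is a toy sketch (my own illustration; the transformer layer is a stand-in, not a real LLM, and all shapes and data are made up). In case (1), gradient descent changes the parameters, which is what I’d reserve the word “learning” for; in case (2), the parameters are untouched and the “few-shot examples” only change what the forward pass is conditioned on.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for an LLM: a single transformer encoder layer (dropout off for determinism).
model = nn.TransformerEncoderLayer(d_model=16, nhead=4, dropout=0.0, batch_first=True)

# (1) Learning in the weight-update sense: gradient descent changes the parameters.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(2, 5, 16)              # (batch, seq, d_model): made-up "training data"
target = torch.randn(2, 5, 16)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
opt.step()                             # the weights are now different

# (2) "In-context learning": the weights stay fixed; the few-shot examples only
# change what the model is conditioned on during the forward pass.
with torch.no_grad():
    few_shot = torch.randn(1, 3, 16)   # made-up in-context examples
    query = torch.randn(1, 1, 16)
    out_with_context = model(torch.cat([few_shot, query], dim=1))[:, -1]
    out_without_context = model(query)[:, -1]
    # The two outputs differ (self-attention sees the context), but no parameter changed.
```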
> In order for it to be a strong counterargument to this example, it would have to be the case that an LLM can learn to play Go at a superhuman level …
I think there are so many disanalogies between LLMs-playing-Go and humans-playing-Go that the comparison isn’t even worth thinking about. ¯\_(ツ)_/¯ For example, humans can “visualize” things but LLMs (probably) can’t. But OTOH, maybe future multi-modal LLMs will be able to.
More generally, I haven’t seen any simple comparison that provides air-tight evidence either way on the sample efficiency of deep learning versus human brains (and “deep learning” is itself a big tent; presumably some model types & sizes are more sample-efficient than others).
As it happens, I do believe that human brains are more sample-efficient than any deep learning model. But my reasons for believing that are pretty indirect and I don’t want to talk about them.