Strong +1, that argument didn’t make sense to me. Images are a fucking mess: they’re a grid of RGB pixels representing a 3D environment (interpreted through the lens of a camera) from a specific angle. Text is so clean and pretty in comparison; it has much richer meaning and a much more natural mapping to concepts we understand.
That sounds less messy than the path from the 3D physical world to tokens (and more messy than the path from human concepts to tokens).
Sure, but I think that human cognition tends to operate at a level of abstraction above the configuration of atoms in a 3D environment. Like, “that is a chair” is a useful way to reason about an environment, while “that is a configuration of pixels that corresponds to a chair when projected at a certain angle in certain lighting conditions” must first be converted to “that is a chair” before anything useful can be done. Text just has a lot of useful preprocessing applied already and is far more compressed.
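To put rough numbers on “far more compressed”, here’s a minimal sketch (assuming numpy; the figures are illustrative and depend on resolution and encoding):

```python
# A minimal sketch of the compression gap; numbers are illustrative.
import numpy as np

# One camera view of a scene: a 224x224 grid of RGB values.
image = np.zeros((224, 224, 3), dtype=np.uint8)
print(image.size)  # 150528 raw numbers, tied to one angle and one lighting setup

# The same scene element after human preprocessing into language.
caption = "a chair"
print(len(caption.encode("utf-8")))  # 7 bytes, viewpoint- and lighting-invariant
```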
The preprocessing itself is one of the most important things we need to understand (I would even argue it’s the most important thing) if our interpretability methods are ever going to tell us how the stuff-inside-the-net relates to the stuff-in-the-environment (which is what we actually care about).
I’m not sure I understand what you’re driving at, but insofar as I do, here’s a response: I have lots of concepts and abstractions over the physical world (like chair). I don’t have many concepts or abstractions over strings of language, except as factored through the physical world. (I have some, like register or language, but they don’t actually feel that “final” as concepts.)
As for factoring my predictions of language through the physical world: a lot of the simplest and most robust concepts I have are just nouns, so they’re already represented by the tokenisation machinery, and I can’t do interesting interp to pick them out.
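On the “already represented by tokenisation machinery” point, a minimal sketch, assuming the tiktoken library with the GPT-2 vocabulary (exact ids and splits depend on the tokeniser you use):

```python
# A minimal sketch, assuming tiktoken and the GPT-2 vocabulary.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# Common nouns tend to be atomic in the vocabulary: the concept boundary is
# handed to the model by the tokeniser, so there is no internal computation
# of "chair-ness" left for interp to find at that level.
for word in [" chair", " table", " dog"]:
    print(word, enc.encode(word))  # typically a single token id each

# A rarer word is split into pieces the model has to reassemble internally.
print(enc.encode("antidisestablishmentarianism"))
```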