johnswentworth comments on The Field of AI Alignment: A Postmortem, and What To Do About It

johnswentworth 30 Dec 2024 22:19 UTC
13 points
4
I’m not convinced that the “hard parts” of alignment are difficult in the standardly difficult, g-requiring way that e.g., a physics post-doc might possess.
To be clear, I wasn’t talking about physics postdocs mainly because of raw g. Raw g is a necessary element, and physics postdocs are pretty heavily loaded on it, but I was talking about physics postdocs mostly because of the large volume of applied math tools they have.
The usual way that someone sees footholds on the hard parts of alignment is to have a broad enough technical background that they can see some analogy to something they know about, and try borrowing tools that work on that other thing. Thus the importance of a large volume of technical knowledge.
- JuliaHP 31 Dec 2024 12:43 UTC
  6 points
  2
  Parent
  Curious about what it would look like to pick up the relevant skills, especially the subtle/vague/tacit skills, in an independent-study setting rather than in academia, as well as about the value of doing this; i.e. maybe it’s just a stupid idea and it’s better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written stuff which suffices as a response, I’d be happy to be pointed to the relevant bits rather than having them restated.)
  
  “Broad technical knowledge” should be in some sense the “easiest” to pick up (not in terms of time investment, but in terms of predictable outcomes), by reading lots of textbooks (using similar material as your study guide).
  
  Writing/communication, while more vague, should also be learnable by just writing a lot of things, publishing them on the internet for feedback, reflecting on your process etc.
  
  Something like “solving novel problems” seems like a much “harder” one. I don’t know if this is a skill with a simple “core” or a grab-bag of tactics. Textbook problems take on a “meant-to-be-solved” flavor, and I find one can be very good at solving these without being good at tackling novel problems. Another thing I notice is that when some people (myself included) try solving novel problems, we can end up on a path which gets there eventually, but which, given “correct” feedback integration, would go OOM faster.
  
  I’m sure there are other vague-skills which one ends up picking up from a physics PhD. Can you name others, and how one picks them up intentionally? Am I asking the wrong question?
  - johnswentworth 31 Dec 2024 14:39 UTC
    14 points
    2
    Parent
    I currently think broad technical knowledge is the main requisite, and I think self-study can suffice for the large majority of that in principle. The main failure mode I see would-be autodidacts run into is motivation, but if you can stay motivated then there are plenty of study materials.
    For practice solving novel problems, just picking some interesting problems (preferably not AI) and working on them for a while is a fine way to practice.
    - Johannes C. Mayer 31 Dec 2024 17:09 UTC
      8 points
      0
      Parent
      Why not AI? Is it that AI alignment is too hard? Or do you think it’s likely one would fall into the “try a bunch of random stuff” paradigm popular in AI, which wouldn’t help much in getting better at solving hard problems?
      
      What do you think about this strategy: instead of learning from a textbook, e.g. on information theory or compilers, you try to write the textbook yourself and only look at existing material if you are really stuck? That’s my primary learning strategy.
      
      It’s very slow and I probably do it too much, but it allows me to train on solving hard problems that aren’t super hard. If you read all the textbooks, all the remaining practice problems are very hard.
    - JuliaHP 31 Dec 2024 16:29 UTC
      7 points
      0
      Parent
      (That broad technical knowledge, as opposed to tacit skills, is the main reason you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn’t come across from the post.)