Task: Identify key background knowledge required to understand a concept
Context: Many people are currently self-directing their learning in order to eventually be able to usefully contribute to alignment research. Even experienced researchers will sometimes come across concepts that require background they don’t have. By ‘key’ background content, I’m imagining that the things which get identified are ‘one step back’ in the chain, or something like ‘the required background concepts which themselves require the most background’. This seems like the best way of making the tool useful: if the background concepts generated are themselves not understood by the user, they can just use the tool again on those concepts.
Input type: A paper (with the idea that part of the task is to identify the highest level concepts in the paper). It would also be reasonable to just have the name of a concept, with a separate task of ‘generate the highest level concept’.
Output type: At minimum, a list of concepts which are key background. Better would be a list of these concepts plus summaries of papers/textbooks/wikipedia entries which explain them.
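As a very rough sketch (not anything Elicit currently exposes), the richer version of that output could be represented with a couple of simple data structures; all the names here are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class BackgroundConcept:
    """One key background concept identified for the input."""
    name: str                                          # e.g. "Bayesian Network"
    summary: str | None = None                         # optional explainer summary
    sources: list[str] = field(default_factory=list)   # papers/textbooks/wikipedia entries

@dataclass
class BackgroundReport:
    """The task's output: key background for a given paper or concept."""
    input_title: str
    concepts: list[BackgroundConcept] = field(default_factory=list)
```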
Info considerations: This system is not biased towards alignment over capabilities, though I think it will in practice help alignment work more than capabilities work, due to the former being less well-served by mainstream educational material and courses. This does mean that having scraped LW and the Alignment Forum, alignment-relevant things on ArXiv, MIRI’s site, etc. would be particularly useful.
I don’t have capacity today to generate instances, though I plan to come back and do so. I’m happy to share credit if someone else jumps in first and does so though!
The ideal version of the task is decomposable into:
Find the high-level concepts in a paper (high level here meaning ‘a high level of background required’)
From a concept, generate the highest level prerequisite concepts
For a given concept, generate a definition/explanation (either by finding and summarising a paper/article, or just directly producing one)
The last of these tasks seems very similar to a few things Elicit is already doing or at least trying to do, so I’ll generate instances of the other two.
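To make the decomposition concrete, here is a minimal sketch of how the three subtasks might compose, assuming some generic `ask_llm(prompt)` wrapper around whichever model ends up being used; the function names and prompt wording are purely illustrative:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever language model backs the tool."""
    raise NotImplementedError

def extract_high_level_concepts(paper_text: str) -> list[str]:
    """Subtask 1: find the concepts in a paper that require the most background."""
    response = ask_llm(
        "List the concepts in the following paper that require the most background "
        f"knowledge to understand, one per line:\n\n{paper_text}"
    )
    return [line.strip() for line in response.splitlines() if line.strip()]

def prerequisite_concepts(concept: str, context: str = "") -> list[str]:
    """Subtask 2: generate the key prerequisite concepts, one step back in the chain."""
    response = ask_llm(
        f"List the key prerequisite concepts needed to understand '{concept}'"
        + (f", as used in: {context}" if context else "")
        + ", one per line."
    )
    return [line.strip() for line in response.splitlines() if line.strip()]

def explain_concept(concept: str) -> str:
    """Subtask 3: produce a short definition/explanation of a concept."""
    return ask_llm(f"Give a short, self-contained explanation of '{concept}'.")
```

The `context` argument anticipates the ‘[concept], as used in [paper]’ framing discussed below.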
Identify some high-level concepts in a paper
Example 1
Input: This post by Nuno Sempere
Output: Suggestions for high level concepts
Counterfactual impact
Shapley Value
Funging
Leverage
Computability
Notes: In one sense the ‘obvious’ best suggestion for the above post is ‘Shapley value’, given that’s what the post is about, and it’s therefore the most central concept one might want to generate background on. I think I’d probably prefer the output above though, where there’s some list of <10 concepts. In a model which had some internal representation of the entirety of human knowledge, and purely selected the single thing with the most precursors, my (very uncertain) guess is that computability might be the single output produced, even though it’s non-central to the post and only appears in a footnote. That’s part of the reason why I’d be relatively happy for the output of this first task to roughly be ‘complicated vocabulary which gets used in the paper’.
Example 2
Input: Eliciting Latent Knowledge by Mark Xu and Paul Christiano
Output: Suggestions for high level concepts
Latent Knowledge
Ontology
Bayesian Network
Imitative Generalisation
Regularisation
Indirect Normativity
Notes: This is actually a list of terms I noted down as I was reading the paper, so rather than ‘highest level’ it’s just ‘what Alex happened to think was worth looking up’, but for illustrative purposes I think it’s fine.
Having been given a high-level concept, generate prerequisite concepts
Notes: I noticed when trying to generate background concepts here that in order to do so it was most useful to have the context of the post. This pushed me in the direction of thinking these concepts were harder to fully decompose than I had thought, and suggested that the input might need to be ‘[concept], as used in [paper]’, rather than just [concept]. All of the examples below come from the examples above. In some cases, I’ve indicated what I expect a second layer of recursion might produce, though it seems possible that one might just want the model to recurse one or more times by default.
I found the process of generating examples really difficult, and am not happy with them. I notice that what I kept wanting to do was write down ‘high-level’ concepts. Understanding the entirety of a few high-level concepts is often close to sufficient to understand an idea, but it’s not usually necessary. With a smooth recursion UX (maybe expanding by clicking), I think the ideal output almost invariably produces low-to-mid-level concepts within the first few clicks. The advantages of this are that if the user recognises a concept, they know they are done with that branch, and narrower concepts are easier to generate definitions for without recursing. Unfortunately, sometimes there are high-level prerequisites which aren’t obviously going to be generated by recursing on the lower-level ones. I don’t have a good solution to this yet.
Input: Shapley Value
Output:
Expected value
Weighted average
Elementary probability
Utility
Marginal contribution
Payoff
Agent
Fixed cost
Variable cost
Utility
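For reference (not part of the generated list): the standard definition of the Shapley value is what makes ‘weighted average’, ‘marginal contribution’ and ‘payoff’ natural prerequisites. For a game with player set N and payoff function v, player i receives a weighted average of their marginal contributions over all coalitions S that don’t contain them:

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$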
Input: Computability
Output:
Computational problem
Computable function
Turing Machine
Computational complexity
Notes: I started recursing, quickly confirmed my hypothesis from earlier about this being by miles the thing with the most prerequisites, and deleted everything except what I had for ‘level 1’, which I also left unfinished before I got completely lost down a rabbit hole.
Input: Bayesian Network
Output:
Probabilistic inference
Bayes’ Theorem
Probability distribution
Directed Acyclic Graph
Directed Graph
Graph (Discrete Mathematics)
Vertex
Edge
Cycle
Trail
Graph (Discrete Mathematics)
Vertex
Edge
Notes: Added a few more layers of recursion to demonstrate both that you probably want some kind of dynamic tree structure, and that not every prerequisite is equally ‘high level’.
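A minimal sketch of the kind of dynamic tree structure this suggests, with children generated lazily when the user expands a node (the prerequisite-generating function is the hypothetical subtask-2 step from the earlier sketch):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ConceptNode:
    """A node in the prerequisite tree; children are generated lazily on expansion."""
    name: str
    children: list["ConceptNode"] = field(default_factory=list)
    expanded: bool = False

    def expand(self, generate_prerequisites: Callable[[str], list[str]]) -> None:
        """Populate children the first time the user expands this node (e.g. by clicking)."""
        if not self.expanded:
            self.children = [ConceptNode(n) for n in generate_prerequisites(self.name)]
            self.expanded = True

# Usage (with some LLM-backed prerequisite generator):
# root = ConceptNode("Bayesian Network")
# root.expand(generate_prerequisites=prerequisite_concepts)
```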
Conclusions from trying to generate examples
This is a much harder, but much more interesting, problem than I’d originally expected. Which prerequisites seem most important, how narrowly to define them, and how much to second-guess myself all ended up feeling pretty intractable. I may try with some (much) simpler examples later, rather than trying to generate them from papers I legitimately found interesting. If an LLM is able to generalise the idea of ‘necessary prerequisites’ from easier concepts to harder ones, this itself seems extremely interesting and valuable.
Seems like a reasonable task, but I wonder if it would be easier in practice to just have a wiki or something like https://metacademy.org/, or to get post authors to do this (it mostly depends on the size of the graph of concepts you need to connect: if it’s smaller, it makes sense for people to do it; if it’s larger, then maybe automation helps).
I think both of those would probably help, but I expect that the concept graph is very big, especially if you want people to be able to use the process recursively.
There’s also value in the workflow being smooth, and this task is sandwiched between two things which seem very useful (and quite straightforward) to automate with an LLM:
concept extraction
search for and summarise explainer papers/articles
I can, however, imagine a good wiki with great graph-style UX navigation and expandable definitions/paper links solving the last two problems, leaving only concept extraction to be automated by Elicit. Even in that case, initially populating the graph/wiki might be best done using automation of the type described above. It’s much easier to maintain something which already exists.
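As a final illustration, the ‘search for and summarise explainer papers/articles’ step could plausibly look something like the sketch below, where `search_papers` stands in for whatever retrieval backend (Semantic Scholar, a scrape of LW/the Alignment Forum/ArXiv, etc.) ends up being used; every name here is hypothetical:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever language model backs the tool."""
    raise NotImplementedError

def search_papers(query: str, limit: int = 3) -> list[dict]:
    """Placeholder for a retrieval backend returning dicts with 'title' and 'abstract' keys."""
    raise NotImplementedError

def summarise_explainers(concept: str) -> list[str]:
    """Find candidate explainer papers/articles for a concept and summarise each one."""
    summaries = []
    for paper in search_papers(f"introduction to {concept}"):
        summaries.append(ask_llm(
            f"Summarise the following abstract as an explanation of '{concept}' "
            f"for a reader missing the background:\n\n{paper['abstract']}"
        ))
    return summaries
```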