Gurkenglas answers What to do with imitation humans, other than asking them what the right thing to do is?

Gurkenglas 28 Sep 2020 3:28 UTC
LW: 5 AF: 2
It sounds like you want to use it as a component for alignment of a larger AI, which would somehow turn its natural-language directives into action. I say use it as the capability core: Ask it to do armchair alignment research. If we give it subjective time, a command line interface and internet access, I see no reason it would do worse than the rest of us.
- Charlie Steiner 30 Sep 2020 12:24 UTC
  LW: 3 AF: 1
  In retrospect, I was totally unclear that I wasn’t necessarily talking about something that has a complicated internal state, such that it can behave like one human over long time scales. I was thinking more about the “minimum human-imitating unit” necessary to get things like IDA off the ground.
  
  In fact this post was originally titled “What to do with a GAN of a human?”
  - Gurkenglas 30 Sep 2020 14:05 UTC
    2 points
    I don’t think you need a complicated internal state to do research. You just need to have read enough research and math to have a good intuition for what definitions, theorems and lemmas will be useful. When I try to come up with insights, my short-term memory context would easily fit into GPT-3’s window.
    - Charlie Steiner 30 Sep 2020 23:54 UTC
      2 points
      I feel like my state is significantly more complicated than that. I smoothly accumulate short-term memory and package some of it away into long-term memory, which even more slowly gets packaged away into longer-term memory. GPT-3’s window size would run out the first time I tried to do a literature search and read a few papers, because it doesn’t form memories so easily.
      The way actual GPT-3 (or really anything with limited state but lots of training data, I think) gets around this sort of thing is by already having read those papers during training, plus lots of examples of people reacting to papers, and then using context to infer that it should output words that come from someone at a later stage of paper-reading.
      Do you foresee a different, more human-like model of humans becoming practical to train?
      - Gurkenglas 1 Oct 2020 13:13 UTC
        2 points
        Misunderstanding: You are talking about literature research, which I do see as part of training. I am talking about original research, which at its best consists of prompts like “This one-liner construction from these four concepts can be elegantly modeled using the concept of ”. The results would of course be integrated into long-term memory using fine-tuning.