Alas, querying counterfactual worlds is fundamentally not a thing one can do simply by prompting GPT.
Citation needed? There’s plenty of fiction to train on, and those works are set in counterfactual worlds. Similarly, historical, mistaken, etc. texts will not be talking about the Current True World. Sure, right now the prompting required is a little janky (the kind of prompt in question looks something like the sketch below), but this should improve with model size, improved prompting approaches, or other techniques like creating optimized virtual prompt tokens.
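A minimal sketch of what such a prompt might look like, using gpt2 via Hugging Face transformers as a small stand-in model (the prompt text and sampling settings here are purely illustrative):

```python
# Sketch only: conditioning a GPT-style model on a counterfactual-world framing.
# "gpt2" is a small stand-in model and the prompt text is purely illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = (
    "The following is an encyclopedia entry from a world in which the "
    "printing press was never invented:\n\n"
)
inputs = tok(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tok.eos_token_id,
)
# Print only the generated continuation, not the prompt itself.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In practice a prompt like this usually needs a fair bit of fiddling before the model stays inside the counterfactual frame, which is the sense in which the prompting is janky.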
And also, if you’re going to be asking the model for something far outside its training distribution like “a post from a researcher in 2050”, why not instead ask for “a post from a researcher who’s been working in a stable, research-friendly environment for 30 years”?
Those works of fiction are all written by authors in our world. What we want is text written by someone who is not from our world. Not the text which someone writing on real-world Lesswrong today imagines someone in a safer world would write in 2050, but the text which someone in a safer world would actually write in 2050.
After all, those of us writing on Lesswrong today don’t actually know what someone in a safer world would write in 2050; that’s why simulating/predicting the researcher is useful in the first place.
My mental model here is something like the following:
a GPT-type model is trained on a bunch of human-written text, written within many different contexts (real and fictional)
it absorbs enough patterns from the training data to complete a wide variety of prompts in ways that also look human-written, in part by picking up on the implications & likely context of those prompts and generating text consistent with them
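As a toy illustration of that last point, here is a sketch (gpt2 as a stand-in model, example strings of my own choosing) that scores one continuation under two different context-setting prefixes; the model should give it far more probability when the prefix implies a fantasy-fiction context than when it implies meeting minutes:

```python
# Sketch only: the same continuation is scored very differently depending on the
# context implied by the prefix. gpt2 and the example strings are stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log P(continuation token | prefix + earlier continuation tokens)."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    full_ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        logprobs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for i in range(prefix_ids.shape[1], full_ids.shape[1]):
        # The token at position i is predicted from the logits at position i - 1.
        total += logprobs[0, i - 1, full_ids[0, i]].item()
    return total

continuation = " The dragon unfurled its wings and breathed fire over the valley."
print(continuation_logprob("Excerpt from a fantasy novel:", continuation))
print(continuation_logprob("Minutes of the municipal budget committee meeting:", continuation))
```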
Slightly rewritten, your point above is that:
The training data is all written by authors in Context X. What we want is text written by someone who is from Context Y. Not the text which someone in Context X imagines someone in Context Y would write, but the text which someone in Context Y would actually write.
After all, those of us writing in Context X don’t actually know what someone in Context Y would write; that’s why simulating/predicting someone in Context Y is useful in the first place.
If I understand the above correctly, the difference you’re referring to is the difference between:
Fictional
prompt = “A lesswrong post from a researcher in 2050:”
GPT’s internal interpretation of context = “A fiction story, so better stick to tropes, plot structure, etc. coming from fiction”
Non-fictional
prompt = “A lesswrong post from a researcher in 2050:”
GPT’s internal interpretation of context = “A lesswrong post (so factual/researchy, rather than fiction) from 2050 (so better extrapolate current trends, etc. to write about what would be realistic in 2050)”
Similar things could be done re: the “stable, research-friendly environment”.
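A rough sketch of what that kind of framing could look like in practice, with gpt2 as a stand-in model and framing text of my own invention:

```python
# Sketch only: the same underlying prompt wrapped in a "fiction" framing vs. a
# "factual post archived from 2050" framing. Model and framing strings are stand-ins.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

shared = "A lesswrong post from a researcher in 2050:\n\n"
framings = {
    "fictional": "The following is an excerpt from a science-fiction story.\n\n" + shared,
    "non-fictional": "The following is a real, factual blog post, archived from the year 2050.\n\n" + shared,
}

for label, prompt in framings.items():
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=120,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tok.eos_token_id,
    )
    print(f"--- {label} framing ---")
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Whether the two completions actually differ in the intended way is an empirical question, but this is the kind of lever I have in mind.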
The internal interpretation is not something we can specify directly, but I believe sufficient prompting would be able to get close enough. Is that the part you disagree with?
Yup, that’s the part I disagree with.
Prompting could potentially set GPT’s internal representation of context to “A lesswrong post from 2050”; the training distribution has lesswrong posts generated over a reasonably broad time-range, so it’s plausible that GPT could learn how the lesswrong-post-distribution changes over time and extrapolate that forward. What’s not plausible is the “stable, research-friendly environment” part, and more specifically the “world in which AGI is not going to take over in N years” part (assuming that AGI is in fact on track to take over our world; otherwise none of this matters anyway). The difference is that 100% of GPT’s training data is from our world; it has exactly zero variation which would cause it to learn what kind of writing is generated by worlds-in-which-AGI-is-not-going-to-take-over. There is no prompt which will cause it to generate writing from such a world, because there is no string such that writing in our world (and specifically in the training distribution) which follows that string is probably generated by a different world.
(Actually, that’s slightly too strong a claim; there does exist such a string. It would involve a program specifying a simulation of some researchers in a safe environment. But there’s no such string which we can find without separately figuring out how to simulate/predict researchers in a safe environment without using GPT.)
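One way to put the core claim (setting the simulation caveat aside) a bit more formally: write $D$ for the distribution over training documents, each produced by an author in some world $W$, and note that GPT is trained to approximate $P_D(\text{continuation} \mid \text{prefix})$. Then for any prompt $s$ with $P_D(\text{prefix} = s) > 0$,

$$P_D(W = \text{safe world} \mid \text{prefix} = s) = \frac{P_D(W = \text{safe world},\ \text{prefix} = s)}{P_D(\text{prefix} = s)} \le \frac{P_D(W = \text{safe world})}{P_D(\text{prefix} = s)} = 0,$$

since $P_D(W = \text{safe world}) = 0$: every training document was written in our world. Conditioning on a prefix only reweights documents already in the training distribution; it cannot move probability mass onto worlds the distribution never contained.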