My mental model here is something like the following:
a GPT-type model is trained on a bunch of human-written text, written within many different contexts (real and fictional)
it absorbs enough patterns from the training data to complete a wide variety of prompts in ways that also look human-written, in part by picking up on the implications and likely context of a prompt and generating text consistent with them
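A minimal way to write this mental model down, assuming the standard next-token-prediction training setup (the notation here is mine): if $\mathcal{D}$ is the distribution of human-written text the training corpus is drawn from, then training pushes the model toward

$$p_\theta(x_t \mid x_{<t}) \;\approx\; p_{\mathcal{D}}(x_t \mid x_{<t}),$$

so a prompt acts purely as a conditioning prefix: a completion is (approximately) a sample from the training distribution, conditioned on the document starting with that prefix.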
Slightly rewritten, your point above is that:
The training data is all written by authors in Context X. What we want is text written by someone who is from Context Y. Not the text which someone in Context X imagines someone in Context Y would write, but the text which someone in Context Y would actually write.
After all, those of us writing in Context X don’t actually know what someone in Context Y would write; that’s why simulating/predicting someone in Context Y is useful in the first place.
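Put in the conditional-distribution terms above (a simplifying sketch, treating the author's context as a latent variable and writing $\mathcal{D}$ for the training distribution): the target is $p(\text{text} \mid \text{author in Context } Y)$, but the corpus only pins down conditionals over Context-X authors, so the closest available thing is $p_{\mathcal{D}}(\text{text} \mid \text{Context-}X\text{ author imagining Context } Y)$, i.e. Context-X fiction or speculation about Context Y rather than genuine Context-Y writing.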
If I understand the above correctly, the difference you’re referring to is the difference between:
Fictional
prompt = “A lesswrong post from a researcher in 2050:”
GPT’s internal interpretation of context = “A fiction story, so better stick to tropes, plot structure, etc. coming from fiction”
Non-fictional
prompt = “A lesswrong post from a researcher in 2050:”
GPT’s internal interpretation of context = “A lesswrong post (so factual/researchy, rather than fiction) from 2050 (so better extrapolate current trends, etc. to write about what would be realistic in 2050)”
Similar things could be done re: the “stable, research-friendly environment”.
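To make the "sufficient prompting" idea concrete, here is a minimal sketch of what steering the model's implied context looks like in practice, assuming a Hugging Face transformers setup with gpt2 as a stand-in model (the model choice and the exact prompt framings are illustrative assumptions, not part of the original point):

```python
# Minimal sketch: the only lever over the model's implied context is the
# prompt text itself. Uses Hugging Face `transformers`; "gpt2" and the
# prompt framings below are stand-ins for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = {
    # Framing likely to be read as fiction: tropes, plot structure, etc.
    "fictional": "Excerpt from a science-fiction story:\n"
                 "A lesswrong post from a researcher in 2050:",
    # Framing meant to push toward a factual/researchy interpretation.
    "non_fictional": "The following is an actual, representative lesswrong "
                     "post from a researcher in 2050, extrapolating current "
                     "research trends:\n",
}

for label, prompt in prompts.items():
    out = generator(prompt, max_new_tokens=80, do_sample=True, temperature=0.8)
    print(f"--- {label} ---")
    print(out[0]["generated_text"])
```

Whichever framing is used, the output is still just a sample from the training conditional given that prefix; which continuation-distribution each framing actually selects is the question at issue in the rest of the exchange.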
The internal interpretation is not something we can specify directly, but I believe sufficient prompting would be able to get close enough. Is that the part you disagree with?
The internal interpretation is not something we can specify directly, but I believe sufficient prompting would be able to get close enough. Is that the part you disagree with?
Yup, that’s the part I disagree with.
Prompting could potentially set GPT’s internal representation of context to “A lesswrong post from 2050”; the training distribution contains lesswrong posts generated over a reasonably broad time range, so it’s plausible that GPT could learn how the lesswrong-post distribution changes over time and extrapolate that forward. What’s not plausible is the “stable, research-friendly environment” part, and more specifically the “world in which AGI is not going to take over in N years” part (assuming that AGI is in fact on track to take over our world; otherwise none of this matters anyway).

The difference is that 100% of GPT’s training data comes from our world; there is exactly zero variation in it which would let GPT learn what kind of writing is generated by worlds-in-which-AGI-is-not-going-to-take-over. There is no prompt which will cause it to generate writing from such a world, because there is no string such that text in our world (and specifically in the training distribution) which follows that string was probably generated by a different world.
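One way to state that claim formally (a sketch, assuming the model simply approximates the training conditional $p_{\mathcal{D}}$, with $s$ the prompt and “which world generated this document” treated as a latent variable $w$):

$$p_\theta(x \mid s) \;\approx\; p_{\mathcal{D}}(x \mid s) \;=\; \sum_{w} p_{\mathcal{D}}(x \mid s, w)\, p_{\mathcal{D}}(w \mid s).$$

Every document in $\mathcal{D}$ was generated in our world $X$, so $p_{\mathcal{D}}(w = Y \mid s) = 0$ for every prefix $s$: no choice of prompt puts any weight on the world-$Y$ generating process, and the most a prompt can select is $p_{\mathcal{D}}(x \mid s, w = X)$ with an author who is imagining world $Y$.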
(Actually, that’s slightly too strong a claim; there does exist such a string. It would involve a program specifying a simulation of some researchers in a safe environment. But there’s no such string which we can find without separately figuring out how to simulate/predict researchers in a safe environment without using GPT.)