As far as I understand, GPT-N is not very agent-like: it doesn’t perform a me-vs-environment abstraction and doesn’t look for ways to transform its perceived environment to satisfy some utility function. I wouldn’t expect it to “scheme” against people, since it lacks any concept of “affecting its environment”.
However, it seems likely that GPT-N can perfect the skill of crowd-pleasing (we already see that; we’re constantly amazed by it, despite little meaning of created texts). It can precisely modulate its tone and identify the talking points that get the most response.
So I expect GPT-N-generated texts to sound really persuasive, not because of novel ideas but because of a superhuman ability to compose ideas it has already seen into a persuasive essay.
I would expect GPT-N to focus on presenting solutions for alignment (thereby making us overly optimistic about naive approaches), presenting novel risks (it’s easy to make something up by simple rehashing), and possibly venturing into philosophical water-muddying (humans prove to be very easily engaged by certain topics, like self-consciousness).
“we already see that; we’re constantly amazed by it, despite little meaning of created texts”
But GPT-3 is only trained to minimize prediction loss, not to maximize response. GPT-N may be able to crowd-please if it’s trained on approval, but I don’t think that’s what’s currently happening.
Upon reflection, you’re right that it won’t be maximizing response per se.
But as we dig deeper, it’s not so straightforward. GPT-3 models can be trained to minimize prediction loss (or, plainly speaking, to simply predict more accurately) on many different tasks, which are usually very simply stated (e.g. choose a word that would fill the blank).
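To make “minimize prediction loss” a bit more concrete, here is a toy sketch of what such a loss could look like for a single fill-the-blank prediction. It is only an illustration under my own assumptions: the `prediction_loss` helper and the probabilities are made up, and this is not a claim about how GPT-3 is actually implemented.

```python
import math

def prediction_loss(probs, target):
    """Negative log-likelihood of the correct word (hypothetical helper)."""
    return -math.log(probs[target])

# Made-up model output for the blank in "The cat sat on the ___".
probs = {"mat": 0.7, "sofa": 0.2, "moon": 0.1}

# The loss is small when the model puts high probability on the true word...
loss_good = prediction_loss(probs, "mat")
# ...and large when the true word was considered unlikely.
loss_bad = prediction_loss(probs, "moon")

print(loss_good < loss_bad)
```

Note that nothing in this objective mentions reader response or approval; it only rewards assigning high probability to the word that actually appeared.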
But we end up with people taking models trained this way and using them to generate long texts based on some primer. And yes, in most cases such abuse of the model will produce text that is merely coherent. But I would expect humans to have a tendency to conflate coherence with persuasiveness.
I suppose one could fairly easily choose a prediction loss for GPT-3 models such that the longer texts would have some desired characteristics. But even the standard tasks probably shape GPT-3 so that it keeps producing vague sentences that continue the primer and give the reader a feeling of “it making sense”. That would possibly entail producing fairly persuasive texts that reinforce the primer’s thesis.