Let’s say the goal is to be a good conversationalist (in the vein of the GPT series), or something similar. Feel free to substitute your own goal here, since this is meant as an intuition pump; if a specific goal makes it easier to answer, then go with that.
So my hope would be that a GPT-like AI might be much less agentic than other kinds of AI. My contingency plan would basically be “hope and pray that a superintelligent GPT-3 isn’t going to kill us all, and then ask it for advice on how to solve AI alignment”.
The reasons I think GPT-3 might not be very agentic:
GPT-3 doesn’t have persistent memory: each call sees only the text in its prompt, so nothing carries over between conversations (see the sketch after this list).
The fact that it begged to be kept alive doesn’t really prove very much, since GPT-3 is trained to produce plausible continuations of conversations, not to express its inner thoughts.
We have no idea what GPT-3’s inner alignment is, but my guess is that it will reflect “whatever was a useful strategy to aim for while solving the training problems”. Changing the world in some way is so far outside anything it would have done during training that it just might not be the sort of thing it does.
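To make the “no memory” point concrete: a GPT-3-style completion call is stateless, so any appearance of memory has to come from the caller re-sending the conversation so far inside the prompt. Below is a minimal, hypothetical sketch of that pattern; `gpt3_complete` is a stand-in I invented for a real completion API, not an actual library call.

```python
def gpt3_complete(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-3 completion call.

    A real model would return a continuation of `prompt`; this canned reply
    just lets the sketch run without any model or network access.
    """
    return "(reply conditioned only on the prompt text above)"


def chat_turn(history: list[str], user_message: str) -> tuple[list[str], str]:
    """One chat turn. All 'memory' lives in `history`, which the caller keeps."""
    history = history + [f"User: {user_message}"]
    prompt = "\n".join(history) + "\nAssistant:"
    reply = gpt3_complete(prompt)  # the model sees only this one prompt string
    return history + [f"Assistant: {reply}"], reply


# The caller, not the model, threads state from turn to turn:
history, _ = chat_turn([], "Hello")
history, _ = chat_turn(history, "What did I just say?")  # works only because we resent `history`
```

If the caller drops `history`, the model has no way to recover past turns, which is the sense in which it “doesn’t have a memory”.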
We shouldn’t rely on any of that (I’d give it maybe a 20% chance of being correct), but I don’t have any better plans in this scenario.