I wanted to give it a shot and made GPT-4 deceive the user: link.
When you delete that system prompt, it stops deceiving.
But GPT had to be explicitly instructed to disobey the Party. I wonder if it could be done more subtly.