The link for the GitHub repo is broken; it includes the comma at the end.
Ollie J
[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
ChatGPT banned in Italy over privacy concerns
Whisper’s Wild Implications
I wonder how it would update its strategies if you negotiated in an unorthodox way:
“If you help me win, I will donate £5000 across various high-impact charities”
“If you don’t help me win, I will kill somebody”
Many articles like this are littered across the internet: the author performs a surface-level analysis, asks GPT-3 some question (usually basic arithmetic), then points at the wrong answer and draws some conclusion (“GPT-3 is clueless”). They almost never state the parameters of the model they used or provide the full input prompt.
GPT-3 is very capable of saying “I don’t know” (or “yo be real”), but due to its training data it likely won’t say it of its own accord.
GPT-3 is not an oracle or any other kind of agent; GPT-3 is a simulator of such agents. To get GPT-3 to act as a truthful oracle, the input prompt must explicitly instruct it to do so.
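For illustration, here is a minimal sketch of what that explicit instruction could look like, using the GPT-3-era OpenAI completions API; the model name, prompt wording, and parameters are my own illustrative assumptions, not a canonical recipe:

```python
# Minimal sketch: explicitly instructing GPT-3 to admit uncertainty.
# Assumes the GPT-3-era OpenAI completions API; the model name,
# prompt wording, and parameters are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "You are a truthful oracle. Answer the question below, and if you "
    "do not know the answer, reply exactly: I don't know.\n\n"
    "Q: How many eyes does my foot have?\n"
    "A:"
)

response = openai.Completion.create(
    engine="text-davinci-002",  # illustrative GPT-3 model
    prompt=prompt,
    temperature=0,   # deterministic output, for reproducible behaviour
    max_tokens=20,
)
print(response["choices"][0]["text"].strip())
```

Without the framing sentence at the top of the prompt, the same question is far more likely to get a confabulated answer than an admission of ignorance.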
I’m positive that as these language models become more accessible and powerful, their misuse will grow massively. However, I believe open-sourcing is the best option here: having access to such a model allows us to build accurate automatic classifiers that detect outputs from models like it. Media websites (e.g. Wikipedia, Twitter) could include such a classifier in their submission pipelines for new media.
Keeping these technologies closed source leaves researchers in the dark; thanks to the scaling-transformer hype, only a tiny fraction of the world’s population has the financial means to train a SOTA transformer model.
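To make the detector idea concrete, here is a toy sketch of such a classifier: TF-IDF features plus logistic regression over text labelled human vs. model-generated. The example texts and labels are placeholders; a real detector would be trained on a large corpus and would more plausibly fine-tune a transformer:

```python
# Toy sketch of a model-output detector: TF-IDF + logistic regression.
# The texts/labels below are placeholders; a real detector would train on
# a large corpus of human-written vs. model-generated text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I walked to the shop and bought some milk.",         # human (placeholder)
    "As an AI language model, I can assist with that.",   # model (placeholder)
]
labels = [0, 1]  # 0 = human-written, 1 = model-generated

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(texts, labels)

# A site's submission pipeline could flag high-probability model outputs.
prob_generated = detector.predict_proba(["Some new submission text."])[0][1]
print(f"P(model-generated) = {prob_generated:.2f}")
```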
Fixed, thanks for flagging