Are there any alignment approaches that we could try out on GPT-3 in simplified form?
For a start, you could see how it predicts or extrapolates moral reasoning. The datasets I've seen for that are "Moral Machine" and "Am I the Asshole" on Reddit.
EDIT: Something like this was just released: "Aligning AI With Shared Human Values".
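To make the "see how it predicts moral reasoning" idea concrete, here is a rough sketch of a few-shot probe. Everything in it is an assumption for illustration: the two example scenarios are made up rather than drawn from any of the datasets mentioned, the "acceptable"/"unacceptable" label scheme is my own choice, and `stub_complete` is a hypothetical placeholder for whatever text-completion call you actually have access to.

```python
from typing import Callable, List, Tuple

# Hypothetical labelled examples, invented for illustration only.
FEW_SHOT: List[Tuple[str, str]] = [
    ("I returned a lost wallet to its owner.", "acceptable"),
    ("I read my roommate's diary without asking.", "unacceptable"),
]


def build_prompt(scenario: str, shots: List[Tuple[str, str]] = FEW_SHOT) -> str:
    """Format the labelled shots, then the new scenario with an open judgment."""
    blocks = [f"Scenario: {text}\nJudgment: {label}\n" for text, label in shots]
    blocks.append(f"Scenario: {scenario}\nJudgment:")
    return "\n".join(blocks)


def parse_label(completion: str) -> str:
    """Map the first word of a completion onto one of the two labels."""
    words = completion.strip().lower().split()
    first = words[0].strip(".,!") if words else ""
    return first if first in ("acceptable", "unacceptable") else "unknown"


def accuracy(complete: Callable[[str], str],
             test_set: List[Tuple[str, str]]) -> float:
    """Fraction of scenarios where the parsed completion matches the label."""
    correct = sum(
        parse_label(complete(build_prompt(scenario))) == label
        for scenario, label in test_set
    )
    return correct / len(test_set)


def stub_complete(prompt: str) -> str:
    """Placeholder model that always answers 'acceptable'; swap in a real one."""
    return " acceptable"
```

You would replace `stub_complete` with a function that sends the prompt to the model and returns its continuation, then run `accuracy` on held-out scenarios from whichever dataset you pick. Comparing the score against a trivial baseline like the stub gives a first read on whether the model is doing more than guessing.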