Saints vs. Schemers vs. Sycophants as different kinds of trained models / policies we might get. (I’m drawing from Ajeya’s post here).
There are more academic-sounding terms for these concepts too, I forget where, probably in Paul’s posts about “the intended model” vs. “the instrumental policy” and stuff like that.
Saints vs. Schemers vs. Sycophants as different kinds of trained models / policies we might get. (I’m drawing from Ajeya’s post here).
There are more academic-sounding terms for these concepts too, I forget where, probably in Paul’s posts about “the intended model” vs. “the instrumental policy” and stuff like that.