paulfchristiano comments on Writeup: Progress on AI Safety via Debate

paulfchristiano 20 Feb 2020 2:37 UTC
LW: 7 AF: 5
The intuitive idea is to share activations as well as weights, i.e. to have two heads (or more realistically one head consulted twice) on top of the same model. There is a fair amount of uncertainty about this kind of “detail” but I think for now it’s smaller than the fundamental uncertainty about whether anything in this vague direction will work.
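To make "one head consulted twice" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration rather than the setup from the writeup: the names `SharedTrunk`, `Head`, and `consult_twice` are hypothetical, the random toy weights stand in for a real model, and the `role` flag is just one guess at how a single head could be told which of the two queries it is answering.

```python
import math
import random


class SharedTrunk:
    """Hypothetical shared model body: one set of weights, one forward pass."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)
        ]
        self.calls = 0  # counts forward passes, to show activations are reused

    def activations(self, x):
        self.calls += 1
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights
        ]


class Head:
    """Hypothetical single head; a role flag lets it be consulted twice."""

    def __init__(self, hidden_dim, seed=1):
        rng = random.Random(seed)
        # One weight per activation plus one for the role flag (an assumption).
        self.weights = [rng.uniform(-1, 1) for _ in range(hidden_dim + 1)]

    def __call__(self, acts, role):
        features = acts + [float(role)]
        return sum(w * f for w, f in zip(self.weights, features))


def consult_twice(trunk, head, x):
    """Compute the trunk activations once, then query the same head twice."""
    acts = trunk.activations(x)  # shared activations, not just shared weights
    return head(acts, role=0), head(acts, role=1)


trunk = SharedTrunk(in_dim=3, hidden_dim=4)
head = Head(hidden_dim=4)
out_a, out_b = consult_twice(trunk, head, [0.1, 0.2, 0.3])
```

The point of the sketch is only that `trunk.calls` stays at 1 while the head produces two outputs. The two-head variant would swap the role flag for two separate `Head` instances reading the same `acts`.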