“This is what it looks like in practice, by default, when someone tries to outsource some cognitive labor which they could not themselves perform.” This proves way too much.
I agree; I think this even proves P=NP.
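(Spelling out the quip, on my reading: NP is exactly the class of problems whose solutions are efficiently checkable, whether or not they are efficiently findable,

$$L \in \mathrm{NP} \iff \exists \text{ a poly-time verifier } V \text{ such that } x \in L \Leftrightarrow \exists w,\ |w| \le \mathrm{poly}(|x|) \text{ and } V(x, w) = 1,$$

so if you could never get useful work out of a solver you couldn’t replicate yourself, then being able to check a solution would have to imply being able to find one, which is, informally, P=NP.)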
Maybe a more reasonable statement would be: you cannot outsource cognitive labor if you don’t know how to verify the solution. But I think even that is not completely true, given that interactive proofs are a thing. (Plug: I wrote a post exploring the idea of applying interactive proofs to AI safety.)
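For concreteness, here is a minimal toy sketch (my own illustration, not taken from the linked post) of the classic interactive proof for graph non-isomorphism: the verifier cannot decide the question itself, but by challenging a more powerful prover with randomly relabeled graphs, it becomes confident in the answer anyway.

```python
import itertools
import random

def canon(edges):
    """Normalize an edge list into a frozenset of 2-element frozensets."""
    return frozenset(frozenset(e) for e in edges)

def relabel(graph, perm):
    """Apply a vertex relabeling (a dict old -> new) to a canonical edge set."""
    return frozenset(frozenset({perm[u], perm[v]}) for u, v in graph)

def prover_answer(challenge, g0, n):
    """The (computationally unbounded) prover: brute-force which original
    graph the challenge is a relabeling of."""
    for p in itertools.permutations(range(n)):
        if relabel(g0, dict(enumerate(p))) == challenge:
            return 0
    return 1

def verifier_accepts(g0, g1, n, rounds=20):
    """Verifier for the claim 'g0 and g1 are NOT isomorphic': each round,
    secretly pick one graph, relabel it randomly, and ask the prover which
    one it was. If the graphs are actually isomorphic, the prover can only
    guess and gets caught with probability 1/2 per round; if they are not,
    it can always answer correctly."""
    g0, g1 = canon(g0), canon(g1)
    for _ in range(rounds):
        b = random.randrange(2)
        perm = dict(enumerate(random.sample(range(n), n)))
        challenge = relabel(g0 if b == 0 else g1, perm)
        if prover_answer(challenge, g0, n) != b:
            return False  # prover caught lying -> reject
    return True  # accept: overwhelmingly likely non-isomorphic

# Triangle vs. 3-vertex path: genuinely non-isomorphic, so this prints True.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(verifier_accepts(triangle, path, 3))
```

The point is that the verifier never learns how to decide isomorphism itself; it only needs randomness and the ability to catch inconsistent answers, and twenty rounds leaves a cheating prover roughly a one-in-a-million chance of slipping through.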
I think the standard setups in computational complexity theory assume away the problems which are most often the blockers to outsourcing in practice—i.e. in complexity theory the problem is always formally specified, so there’s no question of “does the spec actually match what we want?” or “has what we want been communicated successfully, or miscommunicated?”.
I think I mostly agree with this, but from my perspective it hints that you’re framing the problem slightly wrong. Roughly, the problem with the outsourcing approaches is our inability to specify/verify solutions to the alignment problem in particular, not that specifying/verifying is not, in general, easier than solving a problem yourself.
(Because of the difficulty of specifying the alignment problem, I restricted myself to speculating about pivotal acts in the post linked above.)
Fair. I am fairly confident that (1) the video at the start of the post is pointing to a real and ubiquitous phenomenon, and (2) attempts to outsource alignment research to AI look like an extremely central example of a situation where that phenomenon will occur. I’m less confident that my models here properly frame/capture the gears of the phenomenon.