I’d expect a more promising approach to capability amplification to focus on actual human behavior, not on explicit human feedback. Humans are notoriously bad at explaining the real reasons why we do what we do, so accepting our words as quality feedback seems counterproductive. The feedback need not be ignored, but it should be treated as just another source of information, the same way lies and misguided ideas are a source of information about the person expressing them. The reward function would not be anything explicit, but rather a sort of Turing test (Pinocchio test?): fitting in and being implicitly recognized as a fellow human. That is how real humans learn, and it seems like a promising way to start, at least in some constrained environment with reasonably clear behavioral boundaries and expectations.
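(A minimal sketch of what such an implicit "pass as a fellow human" reward could look like, in the spirit of adversarial imitation learning such as GAIL. This is not the proposal above made precise; every name, dimension, and dataset below is an illustrative assumption. The agent is rewarded only to the extent that a discriminator, trained on actual human behavior, mistakes the agent's behavior for a human's.)

```python
# Toy "Pinocchio test" reward: a logistic-regression discriminator is trained
# to tell agent behavior apart from recorded human behavior, and the agent's
# reward is the log-probability of being classified as human. All names and
# data here are illustrative assumptions, not a reference implementation.

import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4          # assumed toy state dimensionality
N_HUMAN_DEMOS = 200    # recorded human behavior, not explicit verbal feedback


def featurize(state, action):
    """Concatenate state and scalar action into one feature vector."""
    return np.concatenate([state, [action]])


# Discriminator parameters (shared by the functions below).
w = np.zeros(STATE_DIM + 1)


def p_human(x):
    """Discriminator's probability that the (state, action) came from a human."""
    return 1.0 / (1.0 + np.exp(-x @ w))


def reward(state, action):
    """Implicit reward: log-probability of passing as a fellow human."""
    return np.log(p_human(featurize(state, action)) + 1e-8)


def train_discriminator(human_xs, agent_xs, lr=0.1, steps=100):
    """One round of logistic-regression updates; human samples are the positive class."""
    global w
    xs = np.vstack([human_xs, agent_xs])
    ys = np.concatenate([np.ones(len(human_xs)), np.zeros(len(agent_xs))])
    for _ in range(steps):
        grad = ((p_human(xs) - ys)[:, None] * xs).mean(axis=0)
        w -= lr * grad


# Toy stand-ins for "actual human behavior" and current agent rollouts.
human_data = rng.normal(1.0, 1.0, size=(N_HUMAN_DEMOS, STATE_DIM + 1))
agent_data = rng.normal(0.0, 1.0, size=(N_HUMAN_DEMOS, STATE_DIM + 1))

train_discriminator(human_data, agent_data)
print(reward(agent_data[0, :STATE_DIM], agent_data[0, -1]))
```

(In a full training loop the agent would be optimized against this reward while the discriminator is periodically retrained, so "fitting in" stays a moving target rather than a fixed rule the agent can memorize.)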
Agreed, but the hard question seems to be how you interpret that feedback, given that you can’t interpret it literally.
FYI, this sounds like imitation learning.
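(For comparison, a minimal behavioral-cloning sketch of the most basic form of imitation learning: fit a policy directly to recorded human state-action pairs, with no explicit reward or verbal feedback in the loop. The data and linear policy class below are toy assumptions.)

```python
# Behavioral cloning: supervised regression from states to the actions a
# human actually took. Everything below is a toy illustration.

import numpy as np

rng = np.random.default_rng(1)

# Toy demonstrations: 4-dimensional states, one continuous action each.
states = rng.normal(size=(500, 4))
true_w = np.array([0.5, -1.0, 2.0, 0.3])              # hidden "human" policy
actions = states @ true_w + rng.normal(scale=0.1, size=500)

# Fit the imitating policy by least squares on the demonstrations.
w_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)


def policy(state):
    """Imitating policy: predict the action a human would have taken."""
    return state @ w_bc


print(policy(states[0]), actions[0])
```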