Discovering Agents (LessWrong) made interesting progress on the fundamental definition of agency, which seems very promising to me
People seem to be converging on a similar approach, one that engages directly with the problem and has clear promise
The alignment research field is gaining momentum among traditional research groups, which seem likely to be much more effective than purely theoretical work. Since the theoretical researchers seem to be agreeing with the empirical results, this seems promising to me
I have semi-private views, argued sloppily here (I may write a better presentation of this shortly), about what's tractable, based on previous middling-quality capabilities work I did with jacob_cannell. They lead me to believe that we can guide models toward the grokkings we want more easily than current models seem to allow, and that work on formal verification will be able to plug into much larger models than it can now, once those larger models reach stronger grokking capability levels. I've argued this in a few places, but I'd rather just let DeepMind figure it out for themselves, and instead work on what they'll need in terms of objectives and verification tools once they figure out how to make more internally coherent models.
I'm very optimistic about the general "LOVE in a simbox is all you need" approach (review 1, review 2; both are less optimistic than I am) once the core of alignment is working well enough. I suspect the approach can be improved significantly by nailing down co-empowerment and co-protection in terms of mutual information and mutual agency preservation. That is what I'm actually, if vaguely, working on.
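To gesture at what I mean by "in terms of mutual information": empowerment is commonly formalized (following Klyubin, Polani, and Nehaniv) as the channel capacity between an agent's action sequence and its resulting state. The second definition below is only my rough sketch of a "co-" version, how much empowerment the other agent is expected to retain under your policy, not an established term:

```latex
% Standard n-step empowerment of agent i at state s_t:
% channel capacity from i's next n actions to the state n steps later.
\mathcal{E}_i(s_t) \;=\; \max_{p(a^{(i)}_{t:t+n})} \; I\!\big(A^{(i)}_{t:t+n};\, S_{t+n} \,\big|\, s_t\big)

% Sketch of "co-empowerment" of i toward j: the empowerment agent j is
% expected to retain when agent i follows policy \pi_i
% (my own loose framing, not a settled definition).
\mathcal{C}_{i \to j}(s_t) \;=\; \mathbb{E}_{\pi_i}\!\big[\, \mathcal{E}_j(S_{t+n}) \,\big|\, s_t \,\big]
```

Mutual agency preservation would then be something like keeping both $\mathcal{C}_{i \to j}$ and $\mathcal{C}_{j \to i}$ from collapsing, but I don't have that pinned down yet.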
Please say more!