Don’t despair; we’re going to solve this. AI safety has recently made a series of breakthroughs we didn’t expect to get, and my intuitive sense of the safety trajectory now thoroughly matches my expectations for capabilities. The big capabilities teams are starting to really come around on what safety is all about. It’s understandable to be worried, but I think you should live your life as if the only challenge is climate change.
Don’t underestimate climate change, though. I’d hold off on having kids until it’s clear that the carbon transition will actually occur, which seems to me to depend on fusion. I’d give it another year or two, personally. Up to you, of course. But I think we’ll get AI tech under control and use it to push past the biochemistry and diplomacy challenges we need to solve to get the carbon back into the ocean and ground.
edit: I’ve gotten some karma downvotes. I’m not surprised that almost no one agrees we’re going to solve it; it wouldn’t feel like that from the inside right now. It seems like the same kind of objection that the researchers who are causing short capabilities timelines have to claims of short capabilities timelines. But I’m curious whether the karma downvotes imply I should have phrased this better, and if so, whether anyone would be willing to comment on how to improve my phrasing.
Please say more!
“Discovering Agents” (LessWrong) made interesting progress on the fundamental definition of agency, which seems very promising to me.
People seem to be converging on a similar approach, one that engages directly with the problem and has clear promise.
The alignment research field is gathering momentum among traditional research groups, which seem likely to be much more effective than purely theoretical research. Since the theoretical researchers seem to be agreeing with the empirical research, this seems promising to me.
I have semi-private views about what’s tractable (argued sloppily here; I may write a better presentation shortly), informed by previous middling-quality capabilities work I did with jacob_cannell. They lead me to believe that we’ll be able to guide models toward the grokkings we want more easily than it seems we can with current models, and that work on formal verification will be able to plug into much larger models than it can now, once those larger models reach stronger grokking capability levels. I’ve argued this in some places, but I’d rather just let DeepMind figure it out for themselves, and instead work on what they’ll need in terms of objectives and verification tools once they figure out how to make more internally coherent models.
I’m very optimistic about the general “LOVE in a simbox is all you need” approach (review 1, review 2; both are less optimistic than I am) once the core of alignment is working well enough. I suspect this can be improved significantly by nailing down co-empowerment and co-protection in terms of mutual information and mutual agency preservation; a rough sketch of what I mean is below. That is what I’m actually, vaguely, working on.
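To make that last point a bit more concrete, here is a minimal sketch of the kind of formalization I have in mind, assuming the standard information-theoretic definition of empowerment (channel capacity from an agent’s actions to future states); the co-empowerment objective below is my own guess at a formalization, not something taken from the simbox post or its reviews:

$$\mathfrak{E}_X(s) \;=\; \max_{p(a_X^n)} \, I\!\left(A_X^n ;\, S_{t+n} \,\middle|\, S_t = s\right)$$

is agent $X$’s empowerment in state $s$: the mutual information between its next $n$ actions and the resulting state, under the best choice of action distribution. A candidate co-empowerment objective for agent $A$ with respect to agent $B$ might then be something like

$$J_A \;=\; \mathbb{E}_{\pi_A}\!\left[\,\mathfrak{E}_B(S_{t+n})\,\right],$$

i.e. $A$ acts so that $B$’s future empowerment is preserved or increased, and symmetrically for $B$; co-protection would add a term penalizing actions that reduce the other agent’s ability to act at all (mutual agency preservation).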