"we've done so little work on alignment that I think it might actually be more like additive, from 1% to 26% or 50% to 75% with ten extra years relative to the real current odds if we press ahead, which nobody knows."
😂🤣 I really want "We've done so little work the probabilities are additive" to be a meme. I feel like I do get where you're coming from.
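To spell out the arithmetic being joked about (as I read it): the quoted claim is a flat bump of 25 percentage points, the same whatever the starting odds, rather than the multiplicative update you'd normally expect:

$$
P_{+10\,\text{yr}} \approx P_{\text{now}} + 0.25,
\qquad 0.01 \mapsto 0.26,
\qquad 0.50 \mapsto 0.75 .
$$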
I agree about pause concern. I also really feel that any delay to friendly SI represents an enormous amount of suffering that could be prevented if we got to friendly SI sooner. It should not be taken lightly. And being realistic about how difficult it is to align humans seems worthwhile.

When I talk to math people about what work I think we need to do to solve this, though, "impossible" or "hundreds of years of work" seems to be the vibe. I think math is a cool field because, more than in other fields, work from hundreds of years ago is still very relevant. Problems are hard and progress is slow in a way that I don't know if people involved in other things really "get". I feel like in math crowds I'm saying "no, don't give up, maybe with a hundred years we can do it!" and in other crowds I'm like "c'mon guys, could we have at least 10 years, maybe?"

Anyway, I'm rambling a bit, but the point is that my vibe is very much "if the Russians defect, everyone dies", "if the North Koreans defect, everyone dies", "if Americans can't bring themselves to trust other countries and don't even try themselves, everyone dies". So I'm currently feeling very "everyone slightly sane should commit and signal commitment as hard as they can", because I know it will be hard to get humanity on the same page about something. Basically impossible, never been done before. But so is ASI alignment.
I haven't read those links. I'll check 'em out, thanks : ) I've read a few things by Drexler about, like, automated plan generation where humans then audit and enact the plan. It makes me feel better about the situation. I think we could go farther, more safely, with careful techniques like that, but that both empowers us and brings us closer to danger. I don't think it scales to SI, and unless we are really serious about using it to map RSI boundaries, it doesn't even prevent misaligned decision systems from going RSI and killing us.
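Roughly, the Drexler-style pattern described above looks something like the following minimal sketch: the model only proposes, and a human reads and approves the whole plan before anything is enacted. Every name here (`generate_plan`, `execute_step`, etc.) is a hypothetical stand-in, not a real API.

```python
# Minimal sketch of a Drexler-style plan-then-audit loop: the model
# only proposes; a human reads and approves the whole plan before
# anything is enacted. Every function here is a hypothetical
# stand-in, not a real API.

from typing import Callable


def generate_plan(task: str) -> list[str]:
    """Hypothetical model call: return a plan as a list of
    human-readable steps. Swap in your actual model here."""
    return [f"step 1 for {task!r}", f"step 2 for {task!r}"]


def human_approves(plan: list[str]) -> bool:
    """A human audits the full plan and explicitly signs off."""
    print("Proposed plan:")
    for i, step in enumerate(plan, 1):
        print(f"  {i}. {step}")
    return input("Approve? [y/N] ").strip().lower() == "y"


def run_task(task: str, execute_step: Callable[[str], None]) -> None:
    plan = generate_plan(task)
    if not human_approves(plan):
        print("Plan rejected; nothing executed.")
        return
    for step in plan:
        execute_step(step)  # humans (or audited tools) enact each step


if __name__ == "__main__":
    run_task("draft a literature survey", execute_step=print)
```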
Yes, the math crowd is saying something like "give us a hundred years and we can do it!" And nobody is going to give them that in the world we live in.
Fortunately, math isn't the best tool to solve alignment. Foundation models are already trained to follow instructions given in natural language. If we make sure this is the dominant factor in foundation model agents, and use it carefully (don't say dumb things like "go solve cancer, don't bug me with the hows and whys, just git 'er done as you see fit", etc.), this could work.
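As one concrete way to cash out "use it carefully": state the task, spell out exactly what is in scope, and require check-ins, instead of issuing an open-ended directive. A hedged sketch under the assumption of a generic instruction-following model; `ask_model` is a hypothetical placeholder, not any particular API.

```python
# Sketch: keep natural-language instruction-following the dominant
# factor by stating the task, spelling out exactly what is in scope,
# and requiring check-ins. `ask_model` below is a hypothetical
# placeholder for whatever model call you would actually use.


def bounded_instruction(task: str, scope: list[str], checkins: str) -> str:
    """Build an instruction with explicit scope limits and
    check-in requirements, instead of an open-ended directive."""
    scope_lines = "\n".join(f"- {item}" for item in scope)
    return (
        f"Task: {task}\n"
        f"You may ONLY do the following:\n{scope_lines}\n"
        "Before any action outside this list, stop and ask me.\n"
        f"Check in with me {checkins}."
    )


# Contrast with the kind of instruction warned against above:
#   "Go solve cancer, don't bug me with the hows and whys,
#    just git 'er done as you see fit."
prompt = bounded_instruction(
    task="survey recent literature on tumor immunology",
    scope=["read papers", "summarize findings", "propose next questions"],
    checkins="after each summary, and before starting anything new",
)
# response = ask_model(prompt)  # hypothetical model call, omitted here
print(prompt)
```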
We can probably achieve technical intent alignment if we're even modestly careful and pay a modest alignment tax. You've now read my other posts making those arguments.
Unfortunately, it's not even clear the relevant actors are willing to be reasonably cautious or pay that modest tax.
The other threads are addressed in responses to your comments on my linked posts.
Yes, you've written more extensively on this than I realized. Thanks for pointing out other relevant posts, and sorry for not having taken the time to find them myself; I'm trying to err more on the side of communication than I have in the past.
I think math is the best tool to solve alignment. It might be emotional: I've been manipulated and hurt by natural language and the people who prefer it to math, and I've always found engaging with math to be soothing, or at least sobering. It could also be that I truly believe the engineering rigor that comes with understanding something well enough to do math to it is extremely worthwhile for building a thing of the importance we are discussing.
Part of me wants to die on this hill and tell everyone who will listen, "I know it's impossible, but we need to find ways to make it possible to give the math people the hundred years they need, because if we don't then everyone dies, so there's no point in aiming for anything less. It's unfortunate, because it means it's likely we are doomed, but that's the truth as I see it." I just wonder how much of that part of me is my oppositional defiant disorder and how much is my strategizing for the best outcome.
I'll be reading your other posts. Thanks for engaging with me : )
I certainly don't expect people to read a bunch of stuff before engaging! I'm really pleased that you've read so much of my stuff. I'll get back to these conversations soon, hopefully; I've had to focus on new posts.
I think your feelings about math are shared by a lot of the alignment community. I like the way you've expressed those intuitions.
I think math might be the best tool to solve alignment if we had unlimited time, but it looks like we very much do not.