Yes, the math crowd is saying something like “give us a hundred years and we can do it!”. And nobody is going to give them that in the world we live in.
Fortunately, math isn’t the best tool to solve alignment. Foundation models are already trained to follow instructions given in natural language. If we make sure this is the dominant factor in foundation model agents, and use it carefully (don’t say dumb things like “go solve cancer, don’t bug me with the hows and whys, just git er done as you see fit”, etc.), this could work.
We can probably achieve technical intent alignment if we’re even modestly careful and pay a modest alignment tax. You’ve now read my other posts making those arguments.
Unfortunately, it’s not even clear the relevant actors are willing to be reasonably cautious or pay a modest alignment tax.
The other threads are addressed in responses to your comments on my linked posts.
Yes, you’ve written more extensively on this than I realized. Thanks for pointing out other relevant posts, and sorry for not having taken the time to find them myself; I’m trying to err more on the side of communication than I have in the past.
I think math is the best tool to solve alignment. It might be emotional: I’ve been manipulated and hurt by natural language and the people who prefer it to math, and I have always found engaging with math to be soothing, or at least sobering. It could also be that I truly believe the engineering rigor that comes with understanding something well enough to do math to it is extremely worthwhile when building a thing of the importance we are discussing.
Part of me wants to die on this hill and tell everyone who will listen: “I know it’s impossible, but we need to find ways to make it possible to give the math people the hundred years they need, because if we don’t then everyone dies, so there’s no point in aiming for anything less. It’s unfortunate, because it means we are likely doomed, but that’s the truth as I see it.” I just wonder how much of that part of me is my oppositional defiant disorder and how much is my strategizing for the best outcome.
I’ll be reading your other posts. Thanks for engaging with me : )
I certainly don’t expect people to read a bunch of stuff before engaging! I’m really pleased that you’ve read so much of my stuff. I’ll hopefully get back to these conversations soon; I’ve had to focus on new posts.
I think your feelings about math are shared by a lot of the alignment community. I like the way you’ve expressed those intuitions.
I think math might be the best tool to solve alignment if we had unlimited time—but it looks like we very much do not.