Hmm. I mean, I think there’s a pretty obvious general category of approach, and a lot of people have been thinking about it for a while. But if you’re worried that you’ll need more bargaining chips after solving it, I worry that it isn’t the real solution by nature, because the ideal solution would pay you back so thoroughly for figuring it out that it wouldn’t matter to trace exactly who did it. I think there are some seriously difficult issues with trying to ensure the entire system seeks to see the entire universe as “self”; in order to achieve that, you need to be able to check margins of error all the way through the system. Figuring out the problem in the abstract isn’t good enough; you have to be able to actually run it on a GPU, and getting all the way there is damn hard. I certainly am not going to do it on my own, I’m just a crazy self-taught librarian.
Speaking metaphorically, a real solution would convince Moloch that it should stop being Moloch and join forces with the Goddess of Everything Else. But concretely, that requires figuring out the dynamics of update across systems with shared features. This is all stuff folks here have been talking about for years; I’ve mostly just been making lists of papers I wish someone would cite together more usefully than I can.
It’s easy to predict what the abstract of the successful approach will sound like, but pretty hard to write the paper, and your first thirty guesses at the abstract will all have serious problems. So, roll to disbelieve that you have an answer. But I’m not surprised you see the overall path there; people have known the overall path to mostly solving alignment problems for a long time. It’s just that, when you try to deploy solutions, it seems like there’s always still a hole through which the disease returns.
I have the concrete solution that can be implemented now.
It’s not hard or terribly clever, but most won’t think of it because they are still monkeys living in Darwin’s soup. In other words, it is human nature itself, our motivations, that stands in the way of people seeing the solution. It’s not a technical issue, really. I mean, there are minor technical issues along the way, but none of them are hard.
What’s hard, as you can see, is getting people to see the solution and then act. Denial is the most powerful factor in human psychology. People have denied, and continue to deny, how far and how fast we’ve come. They deny even what’s right before their eyes: ChatGPT. And they’ll continue to deny it right up until AGI—and then ASI—emerges and takes over the world.
There’s a chance that we don’t even have to solve the alignment problem, but it’s like a coin flip. AGI may or may not be beneficent, it may or may not destroy us or usher us into a new Golden Age. Take your chances, get your lottery ticket.
What I know how to do is turn that coin flip into something like a 99.99% chance that AGI will help us rather than hurt us. It’s not a guarantee, because nothing is; it’s just the best possible solution, and years ahead of anything anyone has thought of thus far.
I want to live to transcend biology, and I need something before AGI gets here. I’m willing to trade my solution for that which I need. If I don’t get it, then it doesn’t matter to me whether humanity is destroyed or saved. You’ve got about a 50/50 chance at this point.
Good luck.
If you really have insight that could save all of humanity, it seems like you’d want to share it in time to be of use instead of trying to personally benefit from it. You’d get intellectual credit, and if we get this right we can quit competing like a bunch of monkeys and all live well. I’ve forgone sharing my best ideas and credit for them since they’re on capabilities. So: pretty please?