People who have the ability to clarify in any meaningful way will not do so. You are in a biased environment: the people most willing to publish are the ones most able to convince themselves their research is safe (e.g., because they don’t understand in detail how to reason about whether it is or not). The ability to see far enough ahead should be expected to be rare, and most people who think they can tell the exact path ahead of time don’t have the evidence to back their hunches, even if their hunches are correct, which, unless they have a demonstrated track record, they probably aren’t. Therefore, whoever is making the most progress on real capabilities insights under the name of alignment will make their advancements and publish them, since they don’t personally see how it’s exfohaz. And it won’t be apparent until afterwards that it was capabilities, not alignment.
So just don’t publish anything, and do your work in private. Email it to Anthropic when you know how to create a yellow node. But for god’s sake stop accidentally helping people create green nodes because you can’t see five inches ahead. And don’t send it to a capabilities team before it’s able to guarantee moral alignment hard enough to make a red-proof yellow node!
This seems contrary to how much of science works. I expect if people stopped talking publicly about what they’re working on in alignment, we’d make much less progress, and capabilities would basically run business as usual.
The sort of reasoning you use here, and the fact that my only response to it basically amounts to “well, no, I think you’re wrong; this proposal will slow down alignment too much,” is why I think we need numbers to ground us.