OP here, posting from an older account because it was easier to log into on mobile.
Kill: I never said anything about killing them. Prisoners like this don’t pose any immediate threat to anyone, and indeed are probably very skilled white-collar workers who could earn a lot of money even behind bars. No reason you couldn’t just throw them into a minimum-security jail in Sweden or something and keep an eye on their Internet activity.
McCarthyism: Communism didn’t take over in the US. If anything, that’s weak evidence that these kinds of policies can work, even for suppressing ideas much more controversial than the building of an unsafe AI.
q1: The hardcore answer would be “Sorry kid, nothing personal.” If there were ever a domain where false positives were acceptable losses, stopping unaligned AI from being created in the first place would probably be it. People have waged wars for far less. The softcore answer, and the one I actually believe, is that you’re probably a smart enough guy that, if such a bounty were announced, you would be able to drop those activities quickly and find new work or hobbies within a few months.
q2: I mean, you could. You can make a bounty to disincentivize any behavior. But who would have that kind of goal or support such a bounty, much less fund one? If you’re worried about Goodhart’s law here, just use a coarse enough metric like “gets paid to work on something AI-related” and accept that there would be some false positives.
“you would be able to drop those activities quickly and find new work or hobbies within a few months.”
I don’t see it. Literally, how would I defend myself? Someone who doesn’t like me tells you that I’m doing AI research. What questions do you ask them before investigating me? What questions do you ask me? Are there any answers I can give that meaningfully prove that I never did any such research (without you ransacking my house and destroying my computers)?
re q2: If you set up the bounty, then other people can use it to target whomever they want. Other people might have plenty of reasons to target alignment-oriented researchers. Alignment-oriented researchers are a more extreme / weird group of people than AI researchers at large, so I expect more optimization pressure per person to be aimed at targeting them (jail / neutralize / kill / whatever you want to call it).
“If you’re worried about Goodhart’s law, just use a coarse enough metric...”
I don’t think Goodhart is to blame here, per se. You are giving out a tool that preferentially favors offense over defense (something of an asymmetric weapon). Making the criteria coarser gives more power to those who want to abuse it, not less.
I really don’t share the intuition that this would be effective at causing differential progress of alignment over capability. Much like McCarthyism, the first-order effect is terrorism (especially in adjacent communities, but also everywhere), and the intended impact is a hard-to-measure second-order effect. (Remember, you need to slow down AI capabilities progress more than you slow down AI alignment progress, and that is hard to measure.) Eliezer recently pointed out that the reference class of “do something crazy and immoral because it might have good second-order effects” tends to underperform pretty badly on those second-order effects.