In those terms the point is that paying $250k+ does not align them—it simply hides the problem of their misalignment.
[work on alignment] does not necessarily benefit humanity, even in expectation. That requires [the right kind of work on alignment], and we don’t have good tests for that. Aiming for the right kind of work is no guarantee you’re doing it, but it beats not aiming for it. (Again, the argument is admittedly less clear for engineers.)
Paying $100k doesn’t solve this alignment problem—it just allows us to see it. We want defection to be obvious. [here I emphasize that I don’t have any problem with people who would work for $100k happening to get $250k, if that seems efficient]
Worth noting that we don’t require a “benefit humanity” motivation; intellectual curiosity will do fine (and I imagine this is already a major motivation for most researchers: how many would be working on the problem if it were painfully dull?). We only require that they’re actually solving the problem. If we knew how to get them to do that for other reasons, that’d be fine, but I don’t think money or status are good levers here (or at the very least, they’re levers with large downsides).