Joe Collman comments on Why don’t you introduce really impressive people you personally know to AI alignment (more often)?

Joe Collman 12 Jun 2022 2:44 UTC
2 points
I sympathise with this view, but I think there’s a huge issue with treating “AI safety research” as a kind of fixed value here. We need more of the right kinds of AI safety research. Not everything labelled “AIS research” qualifies.
Caveats—the below applies:
- To roles to the extent that they’re influencing research directions. (so less for engineers than scientists—but still somewhat)
- To people who broadly understand the AIS problem.
“AIS research” will look very different depending on whether motivations are primarily intrinsic or extrinsic. The danger with extrinsic motivation is that you increase the odds that the people so motivated will pursue [look like someone doing important work] rather than [do important work]. This is a much bigger problem in areas where we don’t have ground truth feedback, and our estimates are poor, e.g. AI safety.
The caricature is that by incentivizing monetarily we get a load of people who have “AI Safety Research” written on their doors, can write convincing grant proposals and publish papers, but are doing approximately nothing to solve important problems.
It’s not clear to me how far the reality departs from such a caricature.
It’s also unclear to me whether this kind of argument fails for engineers: heading twice as fast in even slightly the wrong direction may be a poor trade.
I think it’s very important to distinguish financial support from financial incentive here.
By all means pay people in the bay area $100k or $125k
If someone needs >$250k before they’ll turn up (without a tremendously good reason), then I don’t want them in a position to significantly influence research directions.
This is less clear when people filter out non-competitive salary areas before learning about them. However, I’d want to solve this problem in a different way than [pay high salaries so that people learn about the area].
- JBlack 12 Jun 2022 6:37 UTC
  11 points
  Parent
  [Edit: I initially thought of this purely tongue in cheek, but maybe there is something here that is worth examining further?]
  You have cognitively powerful agents (highly competent researchers) who have incentives (250k+ salaries) to do things that you don’t want them to do (create AGIs that are likely unaligned), and you want them to instead do things that benefit humanity (work on alignment) instead.
  It seems to me that offering $100k salaries to work for you instead is not an effective solution to this alignment problem. It relies on the agents being already aligned to the extent that a $150k/yr loss is outweighed by other incentives.
  If money were not a tight constraint, it seems to me that offering $250k/yr would be worthwhile even if for no other reason than having them not work on racing to AGI.
  - Joe Collman 12 Jun 2022 7:31 UTC
    1 point
    Parent
    The “pay to not race to AGI” would only make sense if there were a smallish pool of replacements ready to step in and work on racing to AGI. This doesn’t seem to be the case. The difference it’d make might not be zero, but close enough to be obviously inefficient.
    In particular, there are presumably effective ways to use money to create a greater number of aligned AIS researchers—it’s just that [give people a lot of money to work on it] probably isn’t one of them.
  - Joe Collman 12 Jun 2022 7:20 UTC
    0 points
    Parent
    In those terms the point is that paying $250k+ does not align them—it simply hides the problem of their misalignment.
    [work on alignment] does not necessarily benefit humanity, even in expectation. That requires [the right kind of work on alignment] - and we don’t have good tests for that. Aiming for the right kind of work is no guarantee you’re doing it—but it beats not aiming for it. (again, argument is admittedly less clear for engineers)
    Paying $100k doesn’t solve this alignment problem—it just allows us to see it. We want defection to be obvious. [here I emphasize that I don’t have any problem with people who would work for $100k happening to get $250k, if that seems efficient]
    Worth noting that we don’t require a “benefit humanity” motivation—intellectual curiosity will do fine (and I imagine this is already a major motivation for most researchers: how many would be working on the problem if it were painfully dull?).
    We only require that they’re actually solving the problem. If we knew how to get them to do that for other reasons that’d be fine—but I don’t think money or status are good levers here. (or at the very least, they’re levers with large downsides)