Thanks for the lengthy response. Pre-emptive apologies for my too-lengthy response; I tried to condense it a little, but gave up!
Some thoughts:
First, since it may help suggest where I’m coming from:
we’re probably embedded in different circles
Certainly to some extent, but much less than you’re imagining—I’m an initially self-funded AIS researcher who got a couple of LTFF research grants and has since been working with MATS. Most of those coming through MATS have short runways and uncertain future support for their research (quite a few are on PhD programs, but rarely AIS PhDs).
Second, I get the impression that you think I’m saying [please don’t do this] rather than [please do this well]. My main point throughout is that the lack of reliable feedback makes things fundamentally different, and that we shouldn’t expect [great mechanism in a context with good feedback] to look the same as [great mechanism in a context without good feedback].
To be clear, when I say “lack of reliable feedback”, I mean relative to what would be necessary—not relative to the best anyone can currently do. Paul Christiano’s carefully analyzing each project proposal (or outcome) for two weeks wouldn’t be “reliable feedback” in the sense I mean.
I should clarify that I’m talking only about technical AIS research when it comes to inadequacy of feedback. For e.g. projects to increase the chance of an AI pause/moratorium, I’m much less pessimistic: I’d characterize these as [very messy, but within a context that’s fairly well understood]. I’d expect the right market mechanisms to do decently well at creating incentives here, and for our evaluations to be reasonably calibrated in their inaccuracy (or at least to correct in that direction over time).
Third, my concerns become much more significant as things scale—but such scenarios are where you’ll get almost all of your expected impact (whether positive or negative). As long as things stay small, you’re only risking missing the opportunity to do better, rather than e.g. substantially reducing the odds that superior systems are produced. I’d be surprised if this kind of system is not net positive while small. In much of the below, I’m assuming that things have scaled up quite a bit (otherwise it’s likely no big deal).
On the longer-term, speculative aims: My worry isn’t around obscure actions/outcomes that slip through the cracks. It’s around what I consider a central and necessary action: doing novel, non-obvious research that gets us meaningfully closer to an alignment solution.
My claim is that we have no reliable way to identify or measure this. Further, the impact of most other plausibly useful outcomes hinges on whether they help with this—i.e. whether they increase the chance that [thing we can’t reliably measure progress towards] is produced. So, for example, the value of a program that successfully onboards and accelerates many new AIS researchers hinges on whether those researchers end up contributing to that kind of progress.
I think it’s more reasonable in an AI safety context to talk about outcomes than about impact: we can measure many different types of outcome. The honest answer to whether something has had net positive impact is likely to remain “we have no idea” for a long time.
With good feedback it’s very reasonable to think that [well constructed market mechanism] will do a good job at solving some problem. Without good feedback, there’s no reason to expect this to work. There’s a reason to think it’ll appear to be working—but that’s because we’ll be measuring our guesses against our guesses, and finding that we tend to agree with ourselves.
The retrospective element helps a little with this—but we’re still comparing [our guess with very little understanding] against [our guess with very little understanding and a little more information]. There are areas where it helps a lot—but those are the easy-to-make-legible areas I don’t worry about (much!).
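To make the shape of that worry concrete, here’s a toy simulation (purely illustrative numbers, not a model of any real evaluation process): if both the prospective and the retrospective score lean mostly on the same shared prior about what looks valuable, and only weakly on the true value, the two scores will agree with each other strongly while barely tracking the thing we actually care about.

```python
import numpy as np

# Toy sketch with made-up numbers: each project has a true value we can't observe,
# plus a shared "legibility prior" that dominates what evaluators can actually see.
rng = np.random.default_rng(0)
n = 1000
true_value = rng.normal(size=n)      # what actually matters; unobserved in practice
shared_prior = rng.normal(size=n)    # the community's guess about what looks valuable

# Prospective and retrospective evaluations both lean heavily on the shared prior.
prospective = 0.9 * shared_prior + 0.1 * true_value + 0.3 * rng.normal(size=n)
retrospective = 0.8 * shared_prior + 0.2 * true_value + 0.3 * rng.normal(size=n)

print(np.corrcoef(prospective, retrospective)[0, 1])  # ~0.9: we agree with ourselves
print(np.corrcoef(retrospective, true_value)[0, 1])   # ~0.2: neither tracks the truth
```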
It seems very important to consider how such a system might update and self-correct.
On the more immediate stuff:
It’s basically on par with how evaluations are done already while making them more scalable.
This is a good reason only if you think [scaled existing approach] is a good approach. Evaluations that are [the best we can do according to our current understanding] should not be confused with evaluations that are fairly accurate in an objective sense. The best evaluations for technical AIS work currently suck. I think it’s important to design a system with that in mind (and to progressively aim for them to suck less, of course).
The counterfactual to getting funded through a system like ours is usually dropping out of AI safety work, not doing something better within AI safety.
I think what’s important here is [system we’re considering] vs [counterfactual system (with similar cost etc)]. So the question isn’t whether someone getting funded through this system would drop out if there were no system—rather it’s whether there’s some other system that’s likely to get more people doing something more valuable within AI safety.
If we’re successful with our system, project developers will much sooner do small, cheap tweaks to make their projects more legible, not change them fundamentally.
First, I don’t think it’s reasonable to assume that there exist “small, cheap tweaks” to make the most important neglected projects legible. The projects I’d consider most important are hard to make legibly useful—this is tied in a fundamental way to their importance: they’re (attempting) a non-obvious leap beyond what’s currently understood.
Second, the best systems will change the incentive landscape so that the kinds of projects pursued are fundamentally different. Unless we think that all the most important directions are already being pursued, it’s hugely important to improve the process by which ideas get created and selected.
Another point that confuses me:
If there is too little funding for some kind of valuable work, and the standardization firms find out about it, they can design new standards for those niches.
I expect there is hugely valuable work no-one is doing, and we don’t know what it is (and it’s unlikely some meta-project changes this picture much). In such a context, we need systems that make it more likely such work happens even without any ability to identify it upfront, or quickly notice its importance once it’s completed.
I’m not hugely worried about there being inadequate funding for concrete things that are noticeably neglected.
I see for example that even now, post-FTX, people are still talking about a talent constraint (rather than funding constraint) in AI safety, which I don’t see at all.
I think this depends a lot on one’s model of AI safety progress.
For instance, if we expect that we’ll make good progress by taking a load of fairly smart ML people and throwing them at plausibly helpful AIS projects, we seem funding constrained.
If most progress depends on researchers on Paul/Eliezer’s level, then we seem talent constrained. (Here I don’t mean researchers capable of iterating on directions Paul/Eliezer have discovered, but rather those who are capable of coming up with promising new ones themselves.)
Of course it’s not so simple—both since things aren’t one-dimensional, and because there’s a question as to how effectively funding can develop the kind of talent that’s necessary (this seems hard to me).
I also think it depends on one’s high-level view of our current situation—in particular of the things we don’t yet understand, and therefore cannot concretely see. I think there’s a natural tendency to focus on the things we can see—and there, there’s a lot of opportunity to do incremental work (funding constraint!). However, if we instead believe that there are necessary elements of a solution that require identification and deconfusion before we can even find the right language to formulate questions, it’s quite possible that everything we can see is insufficient (talent constraint!).
Generally, I expect that we’re both funding and talent constrained. I expect that a significant improvement to the funding side of things could be very important.
Oh, haha! I’ll try to be more concise!
Possible crux: I think I put a stronger emphasis on attribution of impact in my previous comment than you do, because to me that seems like both a bit of a problem and solvable in most cases. When it comes to impact measurement, I’m actually (I think) much more pessimistic than you seem to be. There’s a risk that EV is just completely undefined even in principle, and even if that turns out to be false, or we can use something like stochastic dominance instead to make decisions, that still leaves us with a near-impossible probabilistic modeling task.
If the latter is the case, then we can probably improve the situation a bit with projects like the Squiggle ecosystem and prediction markets, but it’ll take time (which we may not have) and will be a small improvement. (An approximate comparison might be that I think we can still do somewhat better than GiveWell, especially by not bottoming out at bad proxies like DALYs and by handling uncertainty more rigorously with Squiggle, and that we can do about as well as that in more areas. But not much more, probably.)
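To gesture at what I mean by handling uncertainty more rigorously: instead of multiplying point estimates, a Squiggle-style model propagates whole distributions through the calculation and reports the spread. Here’s a rough Python stand-in, with all input distributions invented purely for illustration:

```python
import numpy as np

# Rough Python stand-in for a Squiggle-style cost-effectiveness estimate.
# All input distributions are made up for illustration only.
rng = np.random.default_rng(0)
samples = 100_000

cost_per_project = rng.lognormal(mean=np.log(50_000), sigma=0.5, size=samples)   # $
p_success = rng.beta(2, 8, size=samples)                                         # chance the project pans out
value_if_success = rng.lognormal(mean=np.log(500_000), sigma=1.0, size=samples)  # $-equivalent benefit

value_per_dollar = p_success * value_if_success / cost_per_project

# Report the spread rather than a single point estimate.
print(np.percentile(value_per_dollar, [5, 50, 95]))
```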
Conversely, even if we have roughly the same idea of how much the passing of time helps in forecasting things, I’m more optimistic about it, relatively speaking.
Might that be a possible crux? Otherwise I feel like we agree on most things, like desiderata, current bottlenecks, and such.
It seems very important to consider how such a system might update and self-correct.
Argh, yeah. We’re following the example of carbon credits in many respects, and there, there are some completely unnecessary issues whose impact market equivalents we need to prevent. It’s too early to think about this now, but when the time comes, we should definitely talk to insiders of that space who have ideas about how it should be changed (but probably can’t change it anymore) to prevent the bad incentives that have probably caused those issues.
Another theme in our conversation, I think, is figuring out exactly what or how much the final system should do. Of course there are tons of important problems that need to be solved urgently, but if one system tries to solve all of them, they sometimes trade off against each other. Especially for small startups, it can be better to focus on one problem and solve it well rather than solve a whole host of problems a little bit each.
I think at Impact Markets we have the intuition that experienced AI safety researchers are smarter than most other people when it comes to prioritizing AI safety work, so we shouldn’t try to steer their incentives in one direction or another but should instead double down on getting them funded. That gets harder once we have problems with fraud and whatnot, but when it comes to our core values, I think we are closer to “We think you’re probably doing a good job and we want to help you” than to “You’re a bunch of raw talent that wants to be herded and molded.” Things like banning scammers are then an unfortunate deviation from our core mission that we have to accept. That could change – but that’s my current feeling on our positioning.
In such a context, we need systems that make it more likely such work happens even without any ability to identify it upfront, or quickly notice its importance once it’s completed.
Nothing revolutionary, but this could become a bit easier. When Michael Aird started posting on the EA Forum, I and others probably figured, “Huh, why didn’t I think of doing that?” And then, “Wow, this fellow is great at identifying important, neglected work they can just do!” With a liquid impact market, Michael’s work would receive its first investments at this stage, which would create additional credible visibility on the marketplaces, which could cascade into more and more investments. We’re replicating that system with our score at the moment. Michael could build a legible track record more quickly through the reputational injections from others, and then he could use that to fundraise for stuff that no one understands yet.
I expect that a significant improvement to the funding side of things could be very important.
Yeah, and also: how do we even test what the talent constraint is when the funding constraint screens it off? When the funding was flowing better (because part of it was stolen from FTX customers…), did AI safety progress speed up? Do you or others have intuitions on that?