I’d be interested in your thoughts on [Humans-in-a-science-lab consulting HCH], for questions where we expect that suitable empirical experiments could be run on a significant proportion of subquestions. It seems to me that lack of frequent empirical grounding is what makes HCH particularly vulnerable to memetic selection.
Would you still expect this to go badly wrong (assume you get to pick the humans)? If so, would you expect sufficiently large civilizations to be crippled through memetic selection by default? If [yes, no], what do you see as the important differences?
… and now that I’m thinking about it, there’s a notable gap in economic theory here: the economists are using agents with different goals to motivate price mechanisms...
I don’t think it’s a gap in economic theory in general: pretty sure I’ve heard the [price mechanisms as distributed computation] idea from various Austrian-school economists without reliance on agents with different goals—only on “What should x cost in context y?” being a question whose answer depends on the entire system.
> It seems to me that lack of frequent empirical grounding is what makes HCH particularly vulnerable to memetic selection.
>
> Would you still expect this to go badly wrong (assume you get to pick the humans)? If [yes, no], what do you see as the important differences?
Ok, so, some background on my mental image. Before yesterday, I had never pictured HCH as a tree of John Wentworths (thank you Rohin for that). When I do picture John Wentworths, they mostly just… refuse to do the HCH thing. Like, they take one look at this setup and decide to (politely) mutiny or something. Maybe they’re willing to test it out, but they don’t expect it to work, and it’s likely that their output is something like the string “lol nope”. I think an entire society of John Wentworths would probably just not have bureaucracies at all; nobody would intentionally create them, and if they formed accidentally nobody would work for them or deal with them.
Now, there’s a whole space of things-like-HCH, and some of them look less like a simulated infinite bureaucracy and more like a simulated society. (The OP mostly wasn’t talking about things on the simulated-society end of the spectrum, because there will be another post on that.) And I think a bunch of John Wentworths in something like a simulated society would be fine—they’d form lots of small teams working in-person, have forums like LW for reasonably-high-bandwidth interteam communication, and have bounties on problems and secondary markets on people trying to get the bounties and independent contractors and all that jazz.
Anyway, back to your question. If those John Wentworths lacked the ability to run experiments, they would be relatively pessimistic about their own chances, and a huge portion of their work would be devoted to figuring out how to pump bits of information and stay grounded without a real-world experimental feedback channel. That’s not a deal-breaker; background knowledge of our world already provides far more bits of evidence than any experiment ever run, and we could still run experiments on the simulated-Johns. But I sure would be a lot more optimistic with an experimental channel.
I do not think memetic selection in particular would cripple those Johns, because that’s exactly the sort of thing they’d be on the lookout for. But I’m not confident of that. And I’d be a lot more pessimistic about the vast majority of other people. (I do expect that most people think a bureaucracy/society of themselves would work better than the bureaucracies/societies we have, and I expect that at least a majority and probably a large majority are wrong about that, because bureaucracies are generally made of median-ish people. So I am very suspicious of my inner simulator saying “well, if it was a bunch of copies of John Wentworth, they would know to avoid the failure modes which mess up real-world bureaucracies/societies”. Most people probably think that, and most people are probably wrong about it.)
I do think our current civilization is crippled by memetic selection to a pretty significant extent. (I mean, that’s not the only way to frame it or the only piece, but it’s a correct frame for a large piece.)
> I don’t think it’s a gap in economic theory in general: pretty sure I’ve heard the [price mechanisms as distributed computation] idea from various Austrian-school economists without reliance on agents with different goals—only on “What should x cost in context y?” being a question whose answer depends on the entire system.
Economists do talk about that sort of thing, but I don’t usually see it in their math. Of course we can get e.g. implied prices for any Pareto-optimal system, but I don’t know of math saying that systems will end up using those implied prices internally.
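To spell out the “implied prices” construction (a minimal sketch of the standard result, with notation \(w_i, u_i, \bar{x}, \lambda\) chosen here for illustration, not anything from the thread): under convexity, any Pareto-optimal allocation maximizes some welfare-weighted sum of utilities subject to the resource constraint, and the multiplier on that constraint is the implied price vector.

\[
\max_{x_1,\dots,x_n}\ \sum_i w_i\,u_i(x_i)\ \ \text{s.t.}\ \ \sum_i x_i \le \bar{x},
\qquad
\mathcal{L}=\sum_i w_i\,u_i(x_i)+\lambda^{\top}\Big(\bar{x}-\sum_i x_i\Big).
\]

At an interior optimum, \(w_i\,\nabla u_i(x_i)=\lambda\) for every \(i\), so \(\lambda\) behaves like a common price vector across all agents. But this only says the prices exist at the optimum; it says nothing about the system computing with them along the way, which is exactly the gap being pointed at.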
Interesting, thanks. This makes sense to me. I do think strong-HCH can support the “...more like a simulated society...” stuff in some sense—which is to say, it works so long as we can rely on individual Hs to robustly implement the necessary pointer passing (which, to be fair, we can’t).
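For concreteness, here’s a toy model of the “pointer passing” being relied on (all names here are hypothetical illustration, not Paul’s actual formalism): in strong HCH, an H can hand off opaque pointers to messages rather than flat text, so arbitrarily large structures can move through the tree without any single H reading them in full.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Message:
    """A question or answer; attachments may carry Pointers to other messages."""
    text: str
    attachments: List["Pointer"] = field(default_factory=list)

@dataclass
class Pointer:
    """Opaque handle to a Message. To be useful it must be forwarded intact;
    that forwarding is exactly the step we can't rely on humans to do robustly."""
    target: Message

# An H maps (question, ask) -> answer, where `ask` consults a fresh subtree.
H = Callable[[Message, Callable[[Message], Message]], Message]

def hch(h: H, question: Message, budget: int) -> Message:
    """Strong-HCH-style recursion: each node may ask subquestions, and those
    subquestions can carry Pointers, so structure flows down the tree."""
    if budget == 0:
        return Message("(out of budget)")
    return h(question, lambda sub: hch(h, sub, budget - 1))
```

The parenthetical caveat above is the load-bearing part: nothing in this sketch forces a human H to forward a Pointer unmodified, or at all.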
To add to your “tree of John Wentworths”, it’s worth noting that H doesn’t need to be an individual human—so we could have our H be e.g. {John Wentworth, Eliezer Yudkowsky, Paul Christiano, Wei Dai}, or whatever team would make you more optimistic about avoiding memetic disaster. (We also wouldn’t need to use the same H at every level.)
Yeah, at some point we’re basically simulating the alignment community (or possibly several copies thereof interacting with each other). There will probably be another post on that topic soonish.