I think if you have a particular number then I’m like “yup, it’s fair to notice that we overestimate the probability that x is even and odd by saying it’s 25%”, and then I’d say “notice that we underestimate the probability that x is even and divisible by 4 by saying it’s 12.5%”.
I agree that if you estimate a probability, and then “perform search” / “optimize” / “run n copies of the estimate” (so that you estimate the probability as 1 - (1 - P(event))^n), then you’re going to have systematic errors.
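A quick toy simulation makes that selection effect visible (the setup, candidate counts, and noise levels here are all illustrative assumptions of mine, not anything from this exchange): give several candidates noisy but individually roughly-unbiased probability estimates, pick the candidate with the best estimate, and the estimate of the pick is systematically too high.

```python
import random

random.seed(0)

# Toy model: the selection effect behind "perform search over noisy
# estimates". Each candidate scheme has a true success probability; we
# only observe a noisy estimate of it. Picking the candidate whose
# *estimate* is highest inflates our expectation for the pick, even
# though each individual estimate is unbiased on its own.

n_trials = 20_000
n_candidates = 10
total_gap = 0.0  # (estimate of chosen candidate) - (its true probability)

for _ in range(n_trials):
    true_ps = [random.uniform(0.0, 0.5) for _ in range(n_candidates)]
    estimates = [p + random.gauss(0.0, 0.1) for p in true_ps]
    best = max(range(n_candidates), key=lambda i: estimates[i])
    total_gap += estimates[best] - true_ps[best]

avg_gap = total_gap / n_trials
print(f"average inflation of the selected estimate: {avg_gap:.3f}")
```

The average gap comes out positive: conditioning on "this one looked best" selects for upward noise.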
I don’t think I’m doing anything that’s analogous to that. I definitely don’t go around thinking “well, it seems 10% likely that such and such feature of the world holds, and so each alignment scheme I think of that depends on this feature has a 10% chance of working, therefore if I think of 10 alignment schemes I’ve solved the problem”. (I suspect this is not the sort of mistake you imagine me doing but I don’t think I know what you do imagine me doing.)
I’d say “notice that we underestimate the probability that x is even and divisible by 4 by saying it’s 12.5%”.
Cool, I like this example.
I agree that if you estimate a probability, and then “perform search” / “optimize” / “run n copies of the estimate” (so that you estimate the probability as 1 - (1 - P(event))^n), then you’re going to have systematic errors. ... I suspect this is not the sort of mistake you imagine me doing but I don’t think I know what you do imagine me doing.
I think the thing I’m interested in is “what are our estimates of the output of search processes?”. The question we’re ultimately trying to answer with a model here is something like “are humans, when they consider a problem that could have attempted solutions of many different forms, overly optimistic about how solvable those problems are because they hypothesize a solution with inconsistent features?”
The example of “a number divisible by 2 and a number divisible by 4” is one where the consistency of your solution helps you—anything that satisfies the second condition already satisfies the first. But importantly, the best you can do here is ignore superfluous conditions; they can’t increase the volume of the solution space. I think this is where the systematic bias is coming from (the joint probability of two conditions can’t be higher than the minimum of the two, while it can be arbitrarily lower, all the way down to zero for inconsistent conditions, and so the product isn’t an unbiased estimator of the joint).
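For concreteness, here is the arithmetic behind both directions of the example, checked exactly over the integers 1..1000 (the range is my choice; the numbers are the ones from the discussion):

```python
from fractions import Fraction

# Exact check of the running example, with x drawn uniformly from 1..1000.
xs = range(1, 1001)

def prob(pred):
    return Fraction(sum(1 for x in xs if pred(x)), len(xs))

p_even = prob(lambda x: x % 2 == 0)          # 1/2
p_odd = prob(lambda x: x % 2 == 1)           # 1/2
p_div4 = prob(lambda x: x % 4 == 0)          # 1/4
p_even_and_odd = prob(lambda x: x % 2 == 0 and x % 2 == 1)   # joint: 0
p_even_and_div4 = prob(lambda x: x % 2 == 0 and x % 4 == 0)  # joint: 1/4

# Independence-style product estimates vs. the true joints:
assert p_even * p_odd == Fraction(1, 4)      # product says 25%...
assert p_even_and_odd == 0                   # ...joint is 0 (overestimate)
assert p_even * p_div4 == Fraction(1, 8)     # product says 12.5%...
assert p_even_and_div4 == Fraction(1, 4)     # ...joint is 25% (underestimate)

# And the joint never exceeds the smaller marginal:
assert p_even_and_div4 <= min(p_even, p_div4)
```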
For example, consider this recent analysis of cultured meat, which seems to me to point out a fundamental inconsistency of this type in people’s plans for creating cultured meat. Basically, the bigger you make a bioreactor, the better it looks on criteria ABC, and the smaller you make a bioreactor, the better it looks on criteria DEF, and projections seem to suggest that massive progress will be made on all of those criteria simultaneously because progress can be made on them individually. But this necessitates making bioreactors that are simultaneously much bigger and much smaller!
[Sometimes this is possible, because actually one is based on volume and the other is based on surface area, and so when you make something like a zeolite you can combine massive surface area with tiny volume. But if you need massive volume and tiny surface area, that’s not possible. Anyway, in this case, my read is that both of these are based off of volume, and so there’s no clever technique like that available.]
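To make the shape of that failure concrete, here is a toy, entirely made-up design space (one "size" knob standing in for bioreactor scale, with invented thresholds) where each criterion is individually easy to satisfy but the conjunction is empty; an independence-style product still reports a cheerful nonzero probability:

```python
# Hypothetical design space: each "design" is just a size s in 1..100.
# Pretend criterion A (economies of scale) needs s >= 80, and criterion D
# (e.g. transport into the medium) needs s <= 30. These thresholds are
# made up purely to illustrate mutually inconsistent requirements.

designs = range(1, 101)
ok_a = {s for s in designs if s >= 80}   # 21 designs satisfy A
ok_d = {s for s in designs if s <= 30}   # 30 designs satisfy D

p_a = len(ok_a) / len(designs)           # 0.21
p_d = len(ok_d) / len(designs)           # 0.30

print(p_a * p_d)                         # naive product: ~0.063
print(len(ok_a & ok_d) / len(designs))   # true joint: 0.0
```

Each condition alone looks solvable from many directions, which is exactly what makes the product estimate optimistic about the (empty) intersection.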
Maybe you could step me thru how your procedure works for estimating the viability of cultured meat, or the possibility of constructing a room temperature <10 atm superconductor, or something?
It seems to me like there’s a version of your procedure which, like, considers all of the different possible factory designs, applies some functions to determine the high-level features of those designs (like profitability, amount of platinum they consume, etc.), and then when we want to know “is there a profitable cultured meat factory?” responds with “conditioning on profitability > 0, this is the set of possible designs.” And then when I ask “is there a profitable cultured meat factory using less than 1% of the platinum available on Earth?” says “sorry, that query is too difficult; I calculated the set of possible designs conditioned on profitability, calculated the set of possible designs conditioned on using less than 1% of the platinum available on Earth, and then <multiplied sets together> to give you this approximate answer.”
But of course that’s not what you’re doing, because the boundedness prevents you from considering all the different possible factory designs. So instead you have, like, clusters of factory designs in your map? But what are those objects, and how do they work, and why don’t they have the problem of not noticing inconsistencies because they didn’t fully populate the details? [Or if they did fully populate the details for some limited number of considered objects, how do you back out the implied probability distribution over the non-considered objects in a way that isn’t subject to this?]
Re: cultured meat example: If you give me examples in which you know the features are actually inconsistent, my method is going to look optimistic when it doesn’t know about that inconsistency. So yeah, assuming your description of the cultured meat example is correct, my toy model would reproduce that problem.
To give a different example, consider OpenAI Five. One would think that to beat Dota, you need to have an algorithm that allows you to do hierarchical planning, state estimation from partial observability, coordination with team members, understanding of causality, compression of the giant action space, etc. Everyone looked at this giant list of necessary features and thought “it’s highly improbable for an algorithm to demonstrate all of these features”. My understanding is that even OpenAI, the most optimistic of everyone, thought they would need to do some sort of hierarchical RL to get this to work. In the end, it turned out that vanilla PPO with reward shaping and domain randomization was enough. It turns out that all of these many different capabilities / features were very consistent with each other and easier to achieve simultaneously than we thought.
so the product isn’t an unbiased estimator of the joint
Tbc, I don’t want to claim “unbiased estimator” in the mathematical sense of the phrase. To even make such a claim you need to choose some underlying probability distribution which gives rise to our features, which we don’t have. I’m more saying that the direction of the bias depends on whether your features are positively vs. negatively correlated with each other and so a priori I don’t expect the bias to be in a predictable direction.
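A minimal way to see the direction-of-bias point (a two-feature construction of my own): hold both marginals fixed at 1/2 and slide the correlation; the independence product stays at 1/4, so its error flips sign with the correlation rather than pointing in a fixed direction.

```python
from fractions import Fraction

# Two binary features, both with marginal probability 1/2. The parameter
# c shifts their correlation without changing either marginal; valid
# distributions need |c| <= 1/4.

def joint_table(c):
    # Returns P(A=a, B=b) for all four outcomes.
    q = Fraction(1, 4)
    return {(1, 1): q + c, (1, 0): q - c, (0, 1): q - c, (0, 0): q + c}

for c in [Fraction(-1, 8), Fraction(0), Fraction(1, 8)]:
    t = joint_table(c)
    p_a = t[(1, 1)] + t[(1, 0)]   # marginal of A: always 1/2
    p_b = t[(1, 1)] + t[(0, 1)]   # marginal of B: always 1/2
    product = p_a * p_b           # always 1/4
    joint = t[(1, 1)]             # 1/4 + c
    print(c, joint - product)     # error equals c: its sign tracks the correlation
```

Positive coupling makes the product an underestimate, negative coupling an overestimate, with no a priori way to know which case you are in.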
But what are those objects, and how do they work, and why don’t they have the problem of not noticing inconsistencies because they didn’t fully populate the details?
They definitely have that problem. I’m not sure how you don’t have that problem; you’re always going to have some amount of abstraction and some amount of inconsistency; the future is hard to predict for bounded humans, and you can’t “fully populate the details” as an embedded agent.
If you’re asking how you notice any inconsistencies at all (rather than all of the inconsistencies), then my answer is that you do in fact try to populate details sometimes, and that can demonstrate inconsistencies (and consistencies).
I can sketch out a more concrete, imagined-in-hindsight-and-therefore-false story of what’s happening.
Most of the “objects” are questions about the future to which there are multiple possible answers, which you have a probability distribution over (you can think of this as a factor in a Finite Factored Set, with an associated probability distribution over the answers). For example, you could imagine a question for “number of AGI orgs with a shot at time X”, “fraction of people who agree alignment is a problem”, “amount of optimization pressure needed to avoid deception”, etc. If you provide answers to some subset of questions, that gives you an incomplete possible world (which you could imagine as an implicitly-represented set of possible worlds if you want). Given an incomplete possible world, to answer a new question quickly you reason abstractly from the answers you are conditioning on to get an answer to the new question.
When you have lots of time, you can improve your reasoning in many different ways:
You can find other factors that seem important, add them in, subdividing worlds even further.
You can take two factors, and think about how compatible they are with each other, building intuitions about their joint (rather than just their marginal probabilities, which is what you have by default).
You can take some incomplete possible world, sketch out lots of additional concrete details, and see if you can spot inconsistencies.
You can refactor your “main factors” to be more independent of each other. For example, maybe you notice that all of your reasoning about things like “<metric> at time X” depends a lot on timelines, and so you instead replace them with factors like “<metric> at X years before crunch time”, where they are more independent of timelines.
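A deliberately crude sketch of the data structure the steps above imply (every name and number here is hypothetical and mine): factors with default marginal probabilities, plus a store of pairwise joints that overrides the independence default for the pairs you have actually built intuitions about. Adding factors, adding entries to the joint store, and re-choosing the factorization then correspond to the refinement steps listed above.

```python
# Hypothetical factors and probabilities, purely for illustration.
marginals = {
    "many_agi_orgs": 0.5,
    "alignment_consensus": 0.3,
    "short_timelines": 0.4,
}

# Pairwise joints we have actually thought about (refinement step 2);
# every other pair falls back to the independence default.
joints = {
    frozenset({"many_agi_orgs", "alignment_consensus"}): 0.10,
}

def p_both(f1, f2):
    """P(both factors hold): learned joint if available, else the product."""
    return joints.get(frozenset({f1, f2}), marginals[f1] * marginals[f2])

print(p_both("many_agi_orgs", "alignment_consensus"))  # 0.1 (learned joint)
print(p_both("many_agi_orgs", "short_timelines"))      # 0.5 * 0.4 = 0.2 (default)
```

An "incomplete possible world" is then just a partial assignment to these factors, and sketching out concrete details (step 3) is how you discover that some pair deserves an entry in the joint store rather than the independence default.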