I don’t think so, or if it is, then to a version of “security mindset” in the sense of Eliezer Yudkowsky, not the version described by Bruce Schneier.
Very roughly speaking, security mindset is about the difference between probabilities like 99.99% and 1-10^(-16). From a mathematical perspective, the difference between 1-10^(-16) and 1 is still more similar to the difference between 1-10^(-4) and 1.
One notable thing anybody who seriously studies security learns quickly is that it is in practice impossible to prove the security of anything useful except the one-time pad (OTP). The whole rest of security usually reduces to physics and economics.
Note: I am not saying that we don’t need mathematicians. We absolutely should try to get to that level of precision.
At the same time, the mathematical way of thinking is in some sense fragile: a proof is either correct or not. A proof which is “almost correct” is not worth very much.
When Scott says “mathematician mindset can be useful for AI alignment”, I take it that your interpretation is “we should try to make sure that when we build AGI, we can prove that our system is safe/robust/secure”, whereas I think the intended interpretation is “we should try to make sure that when we build AGI, we have a deep formal understanding of how this kind of system works at all so that we’re not flying blind”. Similar to how we understand the mathematics of how rockets work in principle, and if we found a way to build a rocket without that understanding, it’s very unlikely we’d be able to achieve much confidence in the system’s behavior.
I think the end of this excerpt from a 2000 Bruce Schneier piece is assuming something like this, though I don’t know that Schneier would agree with Eliezer and Scott fully:
Complexity is the worst enemy of security. [...]
The first reason is the number of security bugs. All software contains bugs. And as the complexity of the software goes up, the number of bugs goes up. And a percentage of these bugs will affect security.
The second reason is the modularity of complex systems. [...I]ncreased modularity means increased security flaws, because security often fails where two modules interact. [...]
The third reason is the increased testing requirements for complex systems. [...]
The fourth reason is that the more complex a system is, the harder it is to understand. There are all sorts of vulnerability points — human-computer interface, system interactions — that become much larger when you can’t keep the entire system in your head.
The fifth (and final) reason is the difficulty of analysis. The more complex a system is, the harder it is to do this kind of analysis. Everything is more complicated: the specification, the design, the implementation, the use. And as we’ve seen again and again, everything is relevant to security analysis.
“Adding conceptual clarity” is a key motivation, but formal verification isn’t a key motivation.
The point of things like logical induction isn’t “we can use the logical induction criterion to verify that the system isn’t making reasoning errors”; as I understand it, it’s more “logical induction helps move us toward a better understanding of what good reasoning is, with a goal of ensuring developers aren’t flying blind when they’re actually building good reasoners”.
Daniel Dewey’s summary of the motivation behind HRAD is:
“2) If we fundamentally ‘don’t know what we’re doing’ because we don’t have a satisfying description of how an AI system should reason and make decisions, then we will probably make lots of mistakes in the design of an advanced AI system.
“3) Even minor mistakes in an advanced AI system’s design are likely to cause catastrophic misalignment.”
“I think this is a decent summary of why we prioritize HRAD research. I would rephrase 3 as ‘There are many intuitively small mistakes one can make early in the design process that cause resultant systems to be extremely difficult to align with operators’ intentions.’ I’d compare these mistakes to the ‘small’ decision in the early 1970s to use null-terminated instead of length-prefixed strings in the C programming language, which continues to be a major source of software vulnerabilities decades later.
“I’d also clarify that I expect any large software product to exhibit plenty of actually-trivial flaws, and that I don’t expect that AGI code needs to be literally bug-free or literally proven-safe in order to be worth running.”
The position of the AI community is something like the position researchers would be in if they wanted to build a space rocket, but hadn’t developed calculus or orbital mechanics yet. Maybe with enough trial and error (and explosives) you’ll eventually be able to get a payload off the planet that way, but if you want things to actually work correctly on the first go, you’ll need to do some basic research to cover core gaps in what you know.
To say that calculus or orbital mechanics help you “formally verify” that the system’s parts are going to work correctly is missing where the main benefit lies, which is in knowing what you’re doing at all, not in being able to machine-verify everything you’d like to.
Scott can correct me if I’m misunderstanding his post (e.g., rounding it off too much to what’s already in my head).
I think “what should be done” is generally a different question than “what kinds of mindsets there are”, and I would prefer to disentangle them.
My claims about mindsets are, roughly:
there is an important and meaningful distinction between “security mindset” and “mathematical mindset” (as there is between 1-10^(-16) and 1)
also between “mathematical mindset” and e.g. “physics mindset”
the security mindset may actually be closer to some sort of scientific mindset
the way of reasoning common in maths is fragile in some sense
As I understand it (correct me if I’m wrong), your main claim is roughly “we should have a deep understanding of how these systems work at all”.
I don’t think there is much disagreement on that.
But please note that Scott’s post in several places makes an explicit distinction between the kind of understanding achieved in mathematics and the kind achieved in science. The understanding we have of how rockets work is pretty much on the physics side of this—e.g. we know we can disregard gravitational waves, radiation pressure, and violations of CP symmetry.
To me, this seems different from mathematics, where it would be somewhat strange to say something like “we basically understand what functions and derivatives are … you can just disregard cases like the Weierstrass function”.
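(For reference, the standard pathological example here is the Weierstrass function, which is continuous everywhere yet differentiable nowhere:

W(x) = sum_{n=0}^∞ a^n cos(b^n π x),  with 0 < a < 1, b a positive odd integer, and ab > 1 + 3π/2

so even “we understand continuous functions” turns out to have corners a mathematician cannot simply wave away.)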
(comment to mods: I would actually enjoy a setting allowing me not to see the karma system at all; the feedback it is giving me is “write things which people would upvote” vs. “write things which are most useful—where I’m unsure, see some flaws, ...”.)
I agree. When I think about the “mathematician mindset” I think largely about the overwhelming interest in the presence or absence, in some space of interest, of “pathological” entities like the Weierstrass function. The truth or falsehood of “for all / there exists” statements tends to turn on these pathologies or their absence.
How does this relate to optimization? Optimization can make pathological entities more relevant, if
(1) they happen to be optimal solutions, or
(2) an algorithm that ignores them will be, for that reason, insecure / exploitable.
But this is not a general argument about optimization; it’s a contingent claim that is only true for some problems of interest, and in a way that depends on the details of those problems.
And one can make a separate argument that, when conditions like 1-2 do not hold, a focus on pathological cases is unhelpful: if a statement “fails in practice but works in theory” (say by holding except on a set of sufficiently small measure as to always be dominated by other contributions to a decision problem, or only for decisions that would be ruled out anyway for some other reason, or over the finite range relevant for some calculation but not in the long or short limit), optimization will exploit its “effective truth” whether or not you have noticed it. And statements about “effective truth” tend to be mathematically pretty uninteresting; try getting an audience of mathematicians to care about a derivation that rocket engineers can afford to ignore gravitational waves, for example.
I think this is similar to Security Mindset, so you might want to think about this post in relation to that.