I’m not sure that scientific talent is the relevant variable here. More talented folk are more likely to achieve both positive and negative outcomes.
Let’s assume that all the other variables are already optimized to minimize the risk of creating a UFAI. It seems to me that the relationship between the ability level of the FAI team and the probabilities of the possible outcomes must then look something like this:
This chart isn’t meant to communicate my actual estimates of the probabilities and crossover points, but just the overall shapes of the curves. Do you disagree with them? (If you want to draw your own version, click here and then click on “Modify This Chart”.)
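For concreteness, here is a short sketch that generates curves of the general shape I have in mind. The functional forms and numbers are invented purely for illustration (they are not my estimates of anything); only the rough shapes matter:

```python
# Purely illustrative: invented functional forms, not probability estimates.
import numpy as np
import matplotlib.pyplot as plt

ability = np.linspace(0, 10, 200)      # arbitrary "FAI team ability" scale

p_fai = 0.9 / (1 + np.exp(-(ability - 7)))        # rises with ability
p_ufai = 0.5 * np.exp(-0.5 * (ability - 4) ** 2)  # peaks at middling ability
p_null = 1.0 - p_fai - p_ufai                     # no AGI gets built at all

plt.plot(ability, p_fai, label="FAI")
plt.plot(ability, p_ufai, label="UFAI")
plt.plot(ability, p_null, label="null")
plt.xlabel("FAI team ability")
plt.ylabel("probability of outcome")
plt.legend()
plt.show()
```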
Folk do try to estimate and reduce the risks you mentioned, and to investigate alternative non-FAI interventions.
Has anyone posted SIAI’s estimates of those risks?
I would also worry that signalling issues with a diverse external audience can hinder accurate discussion of important topics.
That seems reasonable, and given that I’m more interested in the “strategic” as opposed to “tactical” reasoning within SIAI, I’d be happy for it to be communicated through some other means.
If we condition on having all other variables optimized, I’d expect a team to adopt very high standards of proof, and recognize limits to its own capabilities, biases, etc. One of the primary purposes of organizing a small FAI team is to create a team that can actually stop and abandon a line of research/design (Eliezer calls this “halt, melt, and catch fire”) that cannot be shown to be safe (given limited human ability, incentives and bias). If that works (and it’s a separate target in team construction rather than a guarantee, but you specified optimized non-talent variables) then I would expect a big shift of probability from “UFAI” to “null.”
What I’m afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or that the formalization of the notion of “safety” used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace “safety” with “security”. These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I’m sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and we do not have decades of time to refine the proof techniques and formalizations. There’s a good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I’m coming from.
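To give a toy illustration of the kind of formalization gap I mean (a deliberately simplified sketch, not a description of any particular real-world break): a MAC verifier can satisfy the standard unforgeability definition and still leak information through a channel the definition never models, such as how long the comparison takes. The key below is made up.

```python
# Toy sketch: the standard unforgeability game says nothing about running
# time, so a verifier can be "provably secure" on paper while its timing
# leaks how long a correct prefix an attacker has guessed.
import hmac
import hashlib

KEY = b"made-up example key"  # hypothetical key, for illustration only


def tag(message: bytes) -> bytes:
    return hmac.new(KEY, message, hashlib.sha256).digest()


def verify_naive(message: bytes, candidate: bytes) -> bool:
    """Byte-by-byte comparison with early exit -- the timing side channel."""
    expected = tag(message)
    if len(candidate) != len(expected):
        return False
    for a, b in zip(expected, candidate):
        if a != b:
            return False  # exits earlier the sooner a byte mismatches
    return True


def verify_better(message: bytes, candidate: bytes) -> bool:
    """Comparison whose duration does not depend on where the bytes differ."""
    return hmac.compare_digest(tag(message), candidate)
```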
Your comment has finally convinced me to study some practical crypto because it seems to have fruitful analogies to FAI. It’s especially awesome that one of the references in the linked article is “An Attack Against SSH2 Protocol” by W. Dai.
From where I’m sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and we do not have decades of time to refine the proof techniques and formalizations.
Correct me if I’m wrong, but it doesn’t seem as though “proofs” of algorithm correctness fail as frequently as “proofs” of cryptosystem unbreakableness.
Where does your intuition that friendliness proofs are on the order of reliability of cryptosystem proofs come from?
Interesting question. I guess proofs of algorithm correctness fail less often because:
It’s easier to empirically test algorithms to weed out the incorrect ones, so there are fewer efforts to prove conjectures of correctness that are actually false.
It’s easier to formalize what it means for an algorithm to be correct than for a cryptosystem to be secure.
In both respects, proving Friendliness seems even worse than proving security.
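As a concrete, FAI-unrelated example of what a formalization mismatch looks like even in the easier algorithm-correctness setting: the textbook binary search is easy to prove correct over idealized unbounded integers, but the usual (lo + hi) / 2 midpoint overflows once the integers are 32 bits wide, a detail the standard correctness argument never models. A minimal sketch that simulates 32-bit arithmetic:

```python
# Toy illustration: binary search's midpoint is "provably correct" over
# idealized integers, but the same expression breaks once integers are
# 32 bits wide -- a detail the usual correctness proof never models.

def to_int32(x: int) -> int:
    """Wrap x the way signed 32-bit arithmetic (as in C or Java) would."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

lo, hi = 1_500_000_000, 2_000_000_000   # both are valid 32-bit index values

naive_mid = to_int32(lo + hi) // 2      # the textbook (lo + hi) / 2
safe_mid = lo + (hi - lo) // 2          # the overflow-free variant

print(naive_mid)  # negative, so the "proved correct" search would crash
print(safe_mid)   # 1750000000
```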
What I’m afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or that the formalization of the notion of “safety” used by the proof is wrong.
I can’t count myself “world class” on the raw ability axis, but I’m pretty sure that the probability of a team of people like me producing UFAI is very low (in absolute value), as I know when I understand something and when I don’t yet, and I think this property would be even more reliable if I had better raw ability. That is a much more relevant safety factor than ability (though it seems harder to test), and it changes the shape of the UFAI curve. A couple of levels below myself, I wouldn’t trust someone’s ability to disbelieve wrong things, so the maximum of that curve should probably be in this range, not centered on “world class” in particular.
Could you elaborate on the ability axis? Could you name some people whom you perceive to be of world-class ability in their field? Could you further explain whether you believe there are people who are sufficiently above that class?
For example, what about Terence Tao? What about the current SIAI team?
This chart isn’t meant to communicate my actual estimates of the probabilities and crossover points, but just the overall shapes of the curves.
I like this chart.
Your comment has finally convinced me to study some practical crypto because it seems to have fruitful analogies to FAI.
More than fruitful analogies, I’d say: http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/
In both respects, proving Friendliness seems even worse than proving security.
Thanks for clarifying.
I agree.