Very interesting definitions! I like the way they’re used here to compare different scenarios.
Proposal: Iterated Distillation and Amplification: [...] I currently think of this scheme as decently date-competitive but not as cost-competitive or performance-competitive.
I think IDA’s date-competitiveness will depend on how much progress we make on inner alignment (or on our willingness to bet that inner alignment problems won’t occur, and whether that bet turns out to be correct). Also, I don’t see why we should expect IDA not to be very performance-competitive (if I understand correctly, the hope is to get a system that can do anything useful that a human with a lot of time can do).
Generally, when using these definitions to compare alignment approaches (rather than scenarios), I suspect we’ll end up talking a lot about “the combination of date- and performance-competitiveness”, because I expect the performance-competitiveness of most approaches to depend on how much research effort is invested in them.
Good point about inner alignment problems being a blocker to date-competitiveness for IDA… but aren’t they also a blocker to date-competitiveness for pretty much every other alignment scheme? What alignment schemes don’t suffer from this problem?
I’m thinking a system that can “do anything useful that a human with a lot of time can do” is going to be substantially less capable than full-blown superintelligent AGI. However, that’s OK, because we can use IDA as a stepping-stone to that: IDA gets us an aligned system substantially more capable than a human, and we use that system to solve the alignment problem and build something even better.
It’s interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness. I think it’s fine to just talk about “competitiveness” full stop, and only bother to specify what we mean more precisely when needed. Sometimes we’ll mean one of the three, sometimes two of the three, sometimes all three.
Good point about inner alignment problems being a blocker to date-competitiveness for IDA… but aren’t they also a blocker to date-competitiveness for pretty much every other alignment scheme?
I think every alignment approach (other than interpretability-as-a-standalone-approach) that involves contemporary ML (i.e. training large neural networks) may have its date-competitiveness affected by inner alignment.
What alignment schemes don’t suffer from this problem?
Most alignment approaches may have their date-competitiveness affected by inner alignment. (It seems theoretically possible to use whole brain emulation without inner-alignment-related risks, but, as you mentioned elsewhere, someone may build a neuromorphic AGI before we get there.)
I’m thinking a system that can “do anything useful that a human with a lot of time can do” is going to be substantially less capable than full-blown superintelligent AGI.
I agree. Even a “narrow AI” system that is just very good at predicting stock prices may outperform “a human with a lot of time” (by leveraging very-hard-to-find causal relations).
Instead of saying we should expect IDA to be performance-competitive, I should have said something like the following: If at some point in the future we get to a situation where trillions of safe AGI systems are deployed—and each system can “only” do anything that a human-with-a-lot-of-time can do—and we manage to not catastrophically screw up until that point, I think humanity will probably be out of the woods. (All of humanity’s regular problems will probably get resolved very quickly, including the lack of coordination.)
It’s interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness.
Also, I advocated merging cost- and date-competitiveness (into training competitiveness), so we have every combination covered.
Oh right, how could I forget! This makes me very happy. :D