Good point about inner alignment problems being a blocker to date-competitiveness for IDA… but aren’t they also a blocker to date-competitiveness for pretty much every other alignment scheme? What alignment schemes don’t suffer from this problem?
I’m thinking “Do anything useful that a human with a lot of time can do” is going to be substantially less capable than full-blown superintelligent AGI. However, that’s OK, because we can use IDA as a stepping-stone to that. IDA gets us an aligned system substantially more capable than a human, and we use that system to solve the alignment problem and build something even better.
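For concreteness, here is a minimal sketch of the amplify-and-distill loop that “IDA” refers to. Everything in it (the names amplify, distill, ida, the callable-based interfaces, and the lookup-table “distillation”) is an illustrative assumption for this comment, not an actual alignment implementation:

```python
from typing import Callable, Dict, List

Question = str
Answer = str
Model = Callable[[Question], Answer]

def amplify(human: Callable[[Question, Model], Answer],
            model: Model,
            question: Question) -> Answer:
    """Amplification: the human answers by consulting the current (weaker) model
    on sub-questions, yielding a system more capable than either alone, but slow."""
    return human(question, model)

def distill(examples: Dict[Question, Answer]) -> Model:
    """Distillation stand-in: 'train' a fast model to imitate the slow amplified
    system (a lookup table here, in place of an actual ML training run)."""
    return lambda q: examples.get(q, "I don't know")

def ida(human: Callable[[Question, Model], Answer],
        initial_model: Model,
        questions: List[Question],
        rounds: int = 3) -> Model:
    """Iterate amplification and distillation; the hope is that alignment is
    preserved at each step while capability grows."""
    model = initial_model
    for _ in range(rounds):
        examples = {q: amplify(human, model, q) for q in questions}
        model = distill(examples)
    return model
```

The stepping-stone claim above is then that a few such rounds get you something safely more capable than a human, which you can point at the rest of the alignment problem.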
It’s interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness. I think it’s fine to just talk about “competitiveness” full stop, and only bother to specify what we mean more precisely when needed. Sometimes we’ll mean one of the three, sometimes two of the three, sometimes all three.
Good point about inner alignment problems being a blocker to date-competitiveness for IDA… but aren’t they also a blocker to date-competitiveness for pretty much every other alignment scheme?
I think every alignment approach (other than interpretability-as-a-standalone-approach) that involves contemporary ML (i.e. training large neural networks) may have its date-competitiveness affected by inner alignment.
What alignment schemes don’t suffer from this problem?
Most alignment approaches may have their date-competitiveness affected by inner alignment. (It seems theoretically possible to use whole brain emulation without inner-alignment-related risks, but as you mentioned elsewhere, someone may build a neuromorphic AGI before we get there.)
I’m thinking “Do anything useful that a human with a lot of time can do” is going to be substantially less capable than full-blown superintelligent AGI.
I agree. Even a “narrow AI” system that is just very good at predicting stock prices may outperform “a human with a lot of time” (by leveraging very-hard-to-find causal relations).
Instead of saying we should expect IDA to be performance-competitive, I should have said something like the following: If at some point in the future we get to a situation where trillions of safe AGI systems are deployed—and each system can “only” do anything that a human-with-a-lot-of-time can do—and we manage to not catastrophically screw up until that point, I think humanity will probably be out of the woods. (All of humanity’s regular problems will probably get resolved very quickly, including the lack of coordination.)
Also, I advocated merging cost- and date-competitiveness (into training competitiveness), so we have every combination covered.
Oh right, how could I forget! This makes me very happy. :D