strong disagree. i would be highly surprised if there were multiple essentially different algorithms to achieve general intelligence*.
I also agree with Daniel Murfet's quote. There is a difference between a disjunction before you see the data and a disjunction after you see the data. I agree AI development is disjunctive before you see the data—but in hindsight, all the things that work are really minor variants on a single thing that works.
*of course “essentially different” is doing a lot of work here. some of the conceptual foundations of intelligence haven’t been worked out enough (or Vanessa has and I don’t understand it yet) for me to make a formal statement here.
Re different algorithms, I actually agree with both you and Daniel Murfet in that, conditional on non-reversible computers, there are at most 1-3 algorithms to achieve intelligence that can scale arbitrarily large, and I'm closer to 1 than 3 here.
But once reversible computers/superconducting wires are allowed, all bets are off on how many algorithms are possible, because you can do far, far more computation while releasing far, far less waste heat, and a lot of the design of computers is driven by heat constraints.
Reversible computing and superconducting wires seem like hardware innovations. You are saying that this will actually materially change the nature of the algorithm you’d want to run?
I'd bet against. I'd be surprised if this were the case. As far as I can tell, everything we have seen so far points to a common simple core of a general intelligence algorithm (basically an open-loop RL algorithm on top of a pre-trained transformer). I'd be surprised if there were materially different ways to do this. One of the main takeaways of the last decade of deep learning progress is just how little architecture matters—it's almost all data and compute (plus, I claim, one extra ingredient: open-loop RL that is efficient on long horizons and in sparse-data novel domains).
I don't know for certain, of course. But when I look at theoretical CS, the universality of computation makes me skeptical of radically different algorithms.