How does intelligence scale with processing power
A default position is that exponentially more processing power is needed for a constant increase in intelligence.
To start, let's assume a guided-intuition + search model of intelligence, like Chess or Go, where you have an evaluation module and a search module. In simple situations an exponential increase in processing power usually gives a linear increase in lookahead ability, and hence in rating/Elo for games measured that way.
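As a toy illustration of that baseline, here is a minimal sketch of a fixed-branching-factor search (the branching factor of 35 is just the rough figure often quoted for chess, used here as an assumption): each extra ply multiplies the work by the branching factor, so linear gains in depth cost exponential compute.

```python
# Toy model: nodes visited by an exhaustive game-tree search.
# With a fixed branching factor b, searching d plies visits ~b^d
# positions, so each extra ply costs b times more compute.

def nodes_searched(branching: int, depth: int) -> int:
    """Positions visited by exhaustive search to a fixed depth."""
    return branching ** depth

b = 35  # rough branching factor often quoted for chess (assumption)
for d in range(1, 7):
    print(f"depth {d}: ~{nodes_searched(b, d):,} positions")
# Work grows 35x per ply: linear lookahead (and roughly linear Elo)
# requires exponential compute.
```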
However, does this match reality?
What if, the longer the time horizon, the bigger the board became, or the more complexity was introduced? For board games there is usually a constant number of possibilities to search at every ply of lookahead depth. In reality, though, I think you can argue the search space should grow with time or lookahead steps: as you look further ahead, possibilities you didn't have to consider before enter the search.
For a real-world example, consider predicting the price of a house. As the timeframe goes from under 5 years to over 5 years, new factors come into play: changing government policy, unexpected shifts in transport patterns (a new rail line nearby or in a competing suburb), demographic changes.
In situations like these, the processing required for a constant increase in ability could go up faster than exponentially. For example, looking 2 steps ahead with 2 possibilities at each step costs 2^2 = 4; but at 4 steps ahead, if there are now 3 things that can affect the result at each step, the cost becomes 3^4 = 81.
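A small sketch makes the contrast concrete (the rule that one extra possibility enters the search every couple of steps is an assumption, chosen only to show the shape of the growth): with a fixed branching factor the cost is b^d, while a branching factor that widens with depth outgrows any fixed exponential.

```python
import math

def cost_fixed(branching: int, depth: int) -> int:
    # Ordinary exponential: the same b possibilities at every step.
    return branching ** depth

def cost_growing(depth: int, base: int = 2, widen_every: int = 2) -> int:
    # Assumption for illustration: one extra possibility enters the
    # search every `widen_every` steps, so b(t) = base + t // widen_every
    # and the total cost is the product b(1) * b(2) * ... * b(d).
    return math.prod(base + t // widen_every for t in range(1, depth + 1))

print(cost_fixed(2, 2), cost_fixed(3, 4))  # 4 and 81, as in the text
for d in (2, 4, 8, 16):
    print(d, cost_fixed(2, d), cost_growing(d))
```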
How does this affect engineering of new systems
If this applies to engineering, then actual physical data will be very valuable for shrinking the search space. (That holds even if the cost only grows exponentially.) If you can measure the desired situation, or the new device's state, at step 10 of a 20-stage process, you can hugely reduce the search space by eliminating many possibilities. Zero-shot engineering is hard unless you can keep the system in situations where no additional effects come into play.
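A back-of-the-envelope sketch of why one mid-process measurement helps so much (the branching factor of 3 and the 20 stages are made-up numbers): observing the true state at stage 10 factorizes one 20-stage search into two independent 10-stage searches, turning a product of costs into a sum.

```python
# If you can observe the true state at stage k of an n-stage process,
# the search factorizes: stages 1..k and k+1..n are solved independently.

b, n, k = 3, 20, 10                   # branching and stage counts (assumptions)

blind = b ** n                        # all 20 stages searched jointly
checkpointed = b ** k + b ** (n - k)  # two independent 10-stage searches

print(f"no measurement:    {blind:,} paths")         # 3,486,784,401
print(f"checkpoint at {k}:  {checkpointed:,} paths")  # 118,098
print(f"reduction: ~{blind // checkpointed:,}x")      # ~29,524x
```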
AI models, regulations, deployments, expectations
For a simple evaluation/search model of intelligence, with just one model doing the evaluation, improvements can be made by continually improving the evaluation model (same size, better performance; or same performance, smaller size). Models that produce fewer bad "candidate ideas" can be selected, with the search itself providing feedback on which ideas had potential. In this model there is no take-off or overhang to speak of.
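That loop can be put in toy form (every component here is a stand-in, not a real training API): search filters the evaluator's candidate ideas, and the survivors become the feedback that shapes the next evaluator.

```python
import random

def run_search(candidates, score, keep=3):
    """Search step (stand-in): keep the ideas the evaluator rates highest."""
    return sorted(candidates, key=score, reverse=True)[:keep]

def make_evaluator(good_ideas):
    """Hypothetical evaluator: rates an idea by closeness to known-good ones."""
    return lambda idea: -min(abs(idea - g) for g in good_ideas)

def improve(score, propose, rounds=5):
    for _ in range(rounds):
        survivors = run_search(propose(), score)
        # Feedback: the ideas that survived search retrain the evaluator,
        # so the next round proposes fewer bad candidates.
        score = make_evaluator(survivors)
    return score

propose = lambda: [random.uniform(0, 100) for _ in range(20)]
evaluator = improve(make_evaluator([50.0]), propose)
```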
However, I expect a TAI (transformative AI) system to be more complicated.
I can imagine an overseer model that decides which more specialist models to use; there is a real difficulty in knowing which model, or field of expertise, fits a given goal. Existing regulations don't really cover these systems: the setup where you train a model, fine-tune, test, then release doesn't strictly apply here. You release a set of models, and they continually improve themselves. This is much more like people, who learn continually.
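A minimal sketch of that overseer setup (the specialist fields and the keyword routing are invented stand-ins): the overseer's whole job is the hard step named above, mapping a goal to a field of expertise.

```python
# Hypothetical overseer that routes a goal to a specialist model.
# The specialists and the routing rule are illustrative stand-ins.

specialists = {
    "chemistry":  lambda goal: f"[chemistry model] plan for: {goal}",
    "mechanical": lambda goal: f"[mechanical model] plan for: {goal}",
    "software":   lambda goal: f"[software model] plan for: {goal}",
}

keywords = {
    "chemistry":  ("reaction", "catalyst", "compound"),
    "mechanical": ("gear", "stress", "beam"),
    "software":   ("code", "api", "bug"),
}

def overseer(goal: str) -> str:
    # The hard step: deciding which field of expertise a goal belongs to.
    # A crude keyword match stands in for that judgment here.
    for field, words in keywords.items():
        if any(w in goal.lower() for w in words):
            return specialists[field](goal)
    return "[overseer] no confident match; decompose the goal or escalate"

print(overseer("find a cheaper catalyst for ammonia synthesis"))
```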
Overhang
In this situation you get take-off or overhang when a new model architecture is introduced, rather than from the steady improvement of deployed systems of models. It's clear to me that current model architectures, and hence current scaling laws, are not near the theoretical maximum. For example, Tesla Autopilot needs roughly 10,000x more training data than a human and still isn't superhuman. In terms of risk, it's new model architectures (and evidence of very different scaling laws), rather than training FLOPs, that would matter.
I think an often overlooked facet of this is that high fluid intelligence leads to higher crystallized intelligence.
I.e., the more and better you think, the more and better crystallized algorithms you can learn; and, unlike the short-term benefits of fluid intelligence, the long-term benefits of crystallized intelligence compound.
To find a new, better strategy linearly faster, you need an exponential increase in processing power, but each strategy found and memorized saves you an exponential expenditure of processing power in the future.
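The trade can be shown in toy form (the branching factor and depth are invented numbers): discovering a strategy costs one exponential search, but recalling a memorized one afterwards is nearly free, so the amortized cost collapses with reuse.

```python
# Toy model of fluid vs crystallized intelligence: finding a strategy
# is an exponential search paid once; recalling it afterwards is ~free.

crystallized = set()  # strategies already found and memorized

def solve(problem: str, depth: int = 12, branching: int = 3) -> int:
    """Return compute spent on this problem."""
    if problem in crystallized:
        return 1                  # recall a memorized strategy
    crystallized.add(problem)
    return branching ** depth     # one-off exponential search

total = sum(solve("route-planning") for _ in range(100))
print(f"first use: {3**12:,}; amortized over 100 uses: {total / 100:,.0f}")
```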