Assumptions:
A. It is possible to construct a benchmark to measure whether a machine is a general ASI. This would be a very large number of tasks, many simulated, though some may be robotic tasks in isolated labs. A general ASI benchmark would have to include tasks that humans do not know how to do, but for which we know how to measure success.
B. We have enough computational resources to train many ASI-level systems from scratch, so that thousands of attempts are possible. Most attempts would reuse pretrained components in a different architecture.
C. We recursively task the best-performing AGIs, as measured by the above benchmark or one meant for weaker systems, to design architectures that perform well on (A).
Currently the best we can do is use RL to design better neural networks by searching for better network architectures and activation functions. Swish was found this way; I’m not sure how much of transformer network design came from this type of recursion.
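For reference, Swish is usually written as x * sigmoid(beta * x). Here is a minimal sketch of it in plain Python. The function name, the default beta of 1 and the sample inputs are my own choices for illustration, not anything taken from the search that found it.

```python
import math


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x).

    Written as x / (1 + exp(-beta * x)), which is the same expression.
    Note: very negative beta * x (below roughly -700) would overflow
    math.exp; a production version would guard against that.
    """
    return x / (1.0 + math.exp(-beta * x))


if __name__ == "__main__":
    # Unlike ReLU, the output dips slightly below zero for negative inputs.
    for value in (-2.0, 0.0, 2.0):
        print(value, round(swish(value), 4))
```

The point is only that the function is simple enough that a search over small expression trees can plausibly stumble on it.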
Main idea: the AGI systems exploring possible network architectures are cognitively able to take into account all published research and all past experimental runs, and the ones “in charge” are the ones that demonstrated the most measurable merit at designing prior AGIs, because they produced the highest-performing models on the benchmark.
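To make the loop concrete, here is a toy sketch of what I mean. Everything in it is a hypothetical stand-in: an "architecture" is a single number, `benchmark` is a made-up scoring function that peaks at 1.0, and a designer's "skill" is just the score of the model it is built from. It only shows the control flow (propose, score on the benchmark, promote the best scorer to be the next designer), not how a real system would work.

```python
import random
from dataclasses import dataclass


@dataclass
class Designer:
    name: str
    skill: float  # hypothetical proxy: benchmark score of this system


def benchmark(architecture: float) -> float:
    """Toy stand-in for the benchmark in (A); best possible score is 1.0."""
    return 1.0 - abs(1.0 - architecture)


def run(generations: int = 20, proposals_per_gen: int = 8, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    in_charge = Designer("seed", skill=0.1)
    history: list[tuple[float, float]] = []  # every (architecture, score) so far
    best_scores = []
    for gen in range(generations):
        # The designer in charge sees all past runs and starts from the best one.
        best_arch = max(history, key=lambda run_: run_[1])[0] if history else 0.0
        candidates = []
        for i in range(proposals_per_gen):
            # Better designers take larger useful steps (an assumption of this toy).
            arch = best_arch + rng.gauss(0.0, 0.05) + 0.05 * in_charge.skill
            score = benchmark(arch)
            history.append((arch, score))
            candidates.append(Designer(f"g{gen}-{i}", skill=score))
        champion = max(candidates, key=lambda d: d.skill)
        # Merit rule: whoever produced the highest-scoring model takes over.
        if champion.skill > in_charge.skill:
            in_charge = champion
        best_scores.append(in_charge.skill)
    return best_scores


if __name__ == "__main__":
    print([round(s, 3) for s in run()])
```

In this toy the score in charge can never go down, and it flattens out as it nears the ceiling the benchmark allows.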
I think if you think about it you’ll realize that if compute were limitless, this AGI to ASI transition you mention could happen instantly. A science fiction story would have it happen in hours. In reality, training a subhuman system currently takes about 10k GPUs for about 10 days, and an AGI will take more (Sam Altman has estimated the compute bill will be close to 100 billion), so compute is the limiting factor. You might be right and we stay “stuck” at AGI for years until the resources to discover ASI become available.
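A back-of-the-envelope version of that, using only the 10k GPUs and 10 days figures above. The 1,000 attempts is my reading of "thousands" as a lower bound, and pricing every attempt at one full subhuman-scale run is an assumption: an AGI-scale run would cost more, while reusing pretrained components would cost less.

```python
def gpu_hours(gpus: int, days: float) -> float:
    """Total GPU-hours for a run that keeps `gpus` busy for `days` days."""
    return gpus * days * 24


# Figures from the comment above: about 10k GPUs for about 10 days.
single_run = gpu_hours(gpus=10_000, days=10)

# Assumption: "thousands of attempts" taken as at least 1,000 full runs.
attempts = 1_000
search_budget = single_run * attempts

if __name__ == "__main__":
    print(f"one run:        {single_run:,.0f} GPU-hours")
    print(f"{attempts:,} attempts: {search_budget:,.0f} GPU-hours")
```

So even at today's subhuman scale, a search with a thousand from-scratch attempts is on the order of billions of GPU-hours.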
I mean, this sounds like a brute-force attack on the problem, something that ought not to be very efficient. If our AGI is roughly as smart as the 75th percentile of human engineers, it might still just bang its head against a sufficiently hard problem, even in parallel, especially if we give it the wrong prompt by assuming that the solution will be an extension of current approaches rather than a new one that requires going back before you can go forward.
You’re correct. In the narrow domain of designing AI architectures you need the system to be at least 1.01 times as good as a human. You want more gain than that because there is a cost to running the system.
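To put a number on why 1.01 is too thin a margin: if each generation of designers were only 1.01 times as good as the last, and the gains compounded cleanly (an assumption of mine, as is the helper name below), this is how many generations it would take to double design ability.

```python
import math


def generations_to_multiply(gain_per_generation: float, target_factor: float) -> int:
    """Smallest n such that gain_per_generation ** n >= target_factor.

    Assumes each generation multiplies ability by the same constant factor,
    which is a simplification for illustration only.
    """
    if gain_per_generation <= 1.0:
        raise ValueError("no progress unless gain per generation exceeds 1.0")
    return math.ceil(math.log(target_factor) / math.log(gain_per_generation))


if __name__ == "__main__":
    for gain in (1.01, 1.10):
        n = generations_to_multiply(gain, 2.0)
        print(f"gain {gain}: {n} generations to double")
```

With every generation costing a full training run, about 70 generations to double at 1.01 versus 8 at 1.10 is a big difference.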
Getting gain seems to be trivially easy, at least for the types of AI design tasks this has been tried on. Humans are bad at designing network architectures and activation functions.
I theorize that a machine could study snapshots of the data flows inside an AI architecture as it attempts tasks on the AGI/ASI gym, and use that information, along with all previous results, to design better architectures.
The last bit is where I expect enormous gain, because the training data set will exceed the amount of data a human can take in over a lifetime, and you would obviously have many smaller “training exercises” in designing small systems to build up a general ability. (Enormous early gain, that is. Eventually architectures will approach the limits allowed by the underlying compute and datasets.)