This is with humans doing the research. Humans invent new algorithms more slowly than new chips are made, so it makes sense to adjust the algorithm to the chip. If an AI can do software research far faster than any human, adjusting the software to the hardware (an approach humans already use throughout most of computing) becomes an even better idea.
Note that if you are seeking an improved network architecture and you need it to work on a limited family of chips, this is constraining your search. You may not be able to find a meaningful improvement over the SOTA with that constraint in place, regardless of your intelligence level. Something like sparsity, in-memory compute, or a neural architecture (where the chip is structured like the network it is modeling, with dedicated hardware) can offer colossal speedups.
this is constraining your search. You may not be able to find a meaningful improvement over the SOTA with that constraint in place, regardless of your intelligence level.
I mean, the space of algorithms that can run on an existing chip is pretty huge. Yes, it is a constraint. And it’s theoretically possible that the search could return no solutions, if the SOTA was achieved with much better chips, or was near optimal already, or the agent doing the search wasn’t much smarter than us.
For example, there are techniques that approximate a matrix using only its largest eigenvectors (or singular vectors, for a non-square matrix), which can work well without needing sparse hardware.
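To make that concrete, here is a rough sketch of one such technique, a truncated SVD, which keeps only the largest singular directions of a weight matrix. This is my own illustration, not anything specific from the thread: the function name `low_rank_factors`, the matrix sizes, and the synthetic "nearly low-rank" weight matrix are all assumptions chosen so the effect is visible. Real network weights may be much less compressible.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Return A (m x rank) and B (rank x n) such that A @ B approximates W.

    Uses a truncated SVD, keeping the `rank` largest singular values.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # scale each kept column by its singular value
    B = Vt[:rank, :]
    return A, B

# Hypothetical setup: a 256 x 512 matrix that is rank 8 plus a little noise.
rng = np.random.default_rng(0)
m, n, rank = 256, 512, 8
W = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
W += 0.01 * rng.standard_normal((m, n))

A, B = low_rank_factors(W, rank)

# Both paths use only dense matrix-vector products, so no sparse hardware is needed.
x = rng.standard_normal(n)
y_dense = W @ x
y_low_rank = A @ (B @ x)

rel_error = np.linalg.norm(y_dense - y_low_rank) / np.linalg.norm(y_dense)
dense_mults = m * n                # multiplications for W @ x
low_rank_mults = rank * (m + n)    # multiplications for A @ (B @ x)

print(f"relative error: {rel_error:.4f}")
print(f"multiplications: {dense_mults} dense vs {low_rank_mults} low-rank")
```

The point is only that the saving comes from ordinary dense matrix multiplies on smaller matrices, which existing chips already do well. How much accuracy you lose depends entirely on how quickly the singular values of the real matrix fall off.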