My guess is that it’s pretty likely that all of:
- There aren’t really any non-extremely-leaky abstractions in big NNs on top of something like a “directions and simple functions on these directions” layer. (I originally heard this take from Buck)
- It’s very hard to piece together understanding of NNs from these low-level components[1]
- It’s even worse if your understanding of low-level components is poor (only a small fraction of the training compute is explained)
That said, I also think it’s plausible that understanding the low level could help a lot with understanding the high level, even if there is a bunch of other needed work.
This will depend on the way in which you understand or operate on low-level components, of course. If you could predict behavior perfectly just from a short text description for all low-level components, then you’d be fine. But this is obviously impossible, in the same way it’s impossible for transistors. You’ll have to make reference to other concepts, etc., and then you’ll probably have a hard time.
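As a rough illustration of what the “directions and simple functions on these directions” layer could mean concretely, here is a toy sketch: a “feature” is the projection of an activation vector onto a unit direction, passed through a simple function like a thresholded ReLU. All names and numbers below are made up for illustration; nothing here is tied to a particular model or interpretability method.

```python
# Toy illustration (all names and numbers made up): the "directions and simple
# functions on these directions" picture, where a "feature" is the projection of
# an activation vector onto a unit direction, passed through a simple function.
import numpy as np

rng = np.random.default_rng(0)

d_model = 64        # width of the activation space (made up)
n_features = 8      # number of candidate feature directions (made up)

# Unit-norm feature directions (in practice these might come from probes,
# sparse autoencoders, etc. -- here they're just random).
directions = rng.normal(size=(n_features, d_model))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

def feature_readout(activation: np.ndarray) -> np.ndarray:
    """Project an activation onto each direction, then apply a simple function
    (here a thresholded ReLU) to get feature values."""
    threshold = 0.5                        # arbitrary, for the toy example
    projections = directions @ activation  # shape: (n_features,)
    return np.maximum(projections - threshold, 0.0)

activation = rng.normal(size=d_model)      # stand-in for a real activation vector
print(feature_readout(activation))
```

The claim being debated is that anything you try to stack on top of this level of description ends up being extremely leaky.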
There aren’t really any non-extremely-leaky abstractions in big NNs on top of something like a “directions and simple functions on these directions” layer. (I originally heard this take from Buck)
Of course this depends on what it’s trained to do? And it’s false for humans and animals and corporations and markets: we have pretty good abstractions that allow us to predict and sometimes modify the behavior of these entities.
I’d be pretty shocked if this statement were true for AGI.
I think this is going to depend on exactly what you mean by non-extremely-leaky abstractions.
For the notion I was thinking of, humans, animals, corporations, and markets don’t seem to have this.
I’m thinking of something like “some decomposition or guide which lets you accurately predict all behavior”. And then the question is how good the best abstractions in such a decomposition are.
There are obviously less complete abstractions.
(Tbc, there are abstractions on top of “atoms” in humans and abstractions on top of chemicals. But I’m not sure if there are very good abstractions on top of neurons which let you really understand everything that is going on.)
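One way to make “some decomposition or guide which lets you accurately predict all behavior” concrete is to score an abstraction by how much of the full model’s behavior it reproduces. Below is a purely illustrative sketch: the “full model” is a tiny random network, the “abstraction” is a deliberately lossy low-rank summary of it, and the score is the fraction of output variance recovered (a toy stand-in for things like fraction of loss or training compute explained).

```python
# Toy sketch (illustrative, made-up setup): score an "abstraction" of a model by
# how much of the full model's behavior it predicts, e.g. the fraction of output
# variance it recovers relative to a trivial predict-the-mean baseline.
import numpy as np

rng = np.random.default_rng(0)

# "Full model": a small random two-layer network standing in for the real thing.
W1 = rng.normal(size=(128, 32)) / np.sqrt(32)
W2 = rng.normal(size=(1, 128)) / np.sqrt(128)

def full_model(x):
    return (W2 @ np.maximum(W1 @ x, 0.0)).item()

# "Abstraction": keep only the top-k directions of W1 (a deliberately lossy summary).
k = 8
U, S, Vt = np.linalg.svd(W1, full_matrices=False)
W1_abs = (U[:, :k] * S[:k]) @ Vt[:k]

def abstracted_model(x):
    return (W2 @ np.maximum(W1_abs @ x, 0.0)).item()

xs = rng.normal(size=(1000, 32))
y_full = np.array([full_model(x) for x in xs])
y_abs = np.array([abstracted_model(x) for x in xs])

# Fraction of the full model's output variance the abstraction recovers.
baseline_mse = np.var(y_full)
abs_mse = np.mean((y_full - y_abs) ** 2)
print(f"fraction of variance recovered: {1 - abs_mse / baseline_mse:.3f}")
```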
Ah, I see: I was referring to less complete abstractions. The “accurately predict all behavior” definition is fine, but it comes with a scale of how accurate the prediction is. “Directions and simple functions on these directions” probably misses some tiny details like floating-point errors, and if you wanted a human to understand it you’d have to use approximations that lose way more accuracy. I’m happy to lose accuracy in exchange for better predictions about behavior in previously-unobserved situations. In particular, it’s important to be able to work out what sort of previously-unobserved situation might lead to danger. We can do this with humans, animals, etc.; we can’t do it with “directions and simple functions on these directions”.
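The “previously-unobserved situations” point above can be made concrete in the same toy style: an abstraction fit to match the model on inputs like those you’ve already seen can look accurate there while leaking badly off-distribution. Again, everything below is made up; it’s a cartoon of the failure mode, not a claim about any real model.

```python
# Made-up illustration: an abstraction fit to match the model on familiar inputs
# can look accurate there and still leak badly on a shifted input distribution.
import numpy as np

rng = np.random.default_rng(1)

# Toy nonlinear "model" standing in for the real network.
W1 = rng.normal(size=(64, 16)) / np.sqrt(16)
w2 = rng.normal(size=64) / np.sqrt(64)

def model(X):
    return np.maximum(X @ W1.T, 0.0) @ w2

# "Abstraction": the best purely linear summary of the model, fit on
# in-distribution inputs only.
X_train = rng.normal(size=(2000, 16))
coef, *_ = np.linalg.lstsq(X_train, model(X_train), rcond=None)

def mse(X):
    return np.mean((model(X) - X @ coef) ** 2)

X_in = rng.normal(size=(2000, 16))               # same distribution as training
X_out = rng.normal(size=(2000, 16)) + 3.0        # shifted, "previously unobserved"
print(f"error in-distribution:     {mse(X_in):.3f}")
print(f"error out-of-distribution: {mse(X_out):.3f}")
```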