Of course—but all of your examples are not just conceptually equivalent—they are functionally equivalent (they can emulate each other). They are all computational foundations for constructing UTMs—although not all foundations are truly practical and efficient. Likewise, there are many routes to implementing a ULM—biology is one example, modern digital computers are another.
Universal computers are equivalent in the sense that any two can simulate each other in polynomial time. ULMs should probably be equivalent in the sense that each can efficiently learn to behave like the other. But it doesn’t imply the software architectures have to be similar. For example I see no reason to assume any ULM should be anything like a neural net.
Well I said “most everything”, and I stressed several times in the article that much of the innate complexity budget is spent on encoding the value/reward system and the learning machinery (which are closely intertwined).
Any value hard coded in humans will have to be transferred to the AI in a way different from universal learning. And another thing: teaching an AI values by placing it in a human environment and counting on reinforcement learning can fail spectacularly if the AI’s intelligence grows much faster than that of a human child.
Rather, the key is that knowing one is in a sim, and then knowing how to escape, should be difficult enough to allow sufficient time to evaluate the agent’s morality, worth/utility to society, and potential future impact.
This is an assumption which might or might not be correct. I would definitely not bet our survival on this assumption without much further evidence.
Introspection and verbalization of introspective insights are specific complex computations that require circuitry—they are not somehow innate to a ULM, because nothing is.
OK, but a ULM is supposed to be able to learn anything. A human brain is never going to learn to rearrange its low level circuitry to efficiently perform operations like numerical calculation.
Here is a useful analogy: a simple abstract Turing machine is to a modern GPU as a simple abstract ULM is to the brain. There is a huge engineering gap between the simplest early version of an idea and a subsequent scaled-up, complex, practical, efficient version.
The difference is that we have a solid mathematical theory of Turing machines whereas ULMs, as far as I can see, are only an informal idea so far.
But it doesn’t imply the software architectures have to be similar. For example I see no reason to assume any ULM should be anything like a neural net.
Sure—any general model can simulate any other. Neural networks have strong practical advantages. Their operator base consists of fmads (fused multiply-adds), which is a good match for modern computers. They allow explicit search of program space in terms of the execution graph, which is extremely powerful because it allows one to a priori exclude all programs which don’t halt—you can constrain the search to focus on programs with exact known computational requirements.
Neural nets make deep factoring easy, and deep factoring is the single most important gain in any general optimization/learning system: it allows for exponential (albeit limited) speedup.
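To make the first point concrete, here is a minimal sketch (my toy example, using plain numpy, not anything from the article): for a neural net, the execution graph is a fixed, finite sequence of fmads whose cost you can read off the layer shapes, so the “program” always halts and its computational requirements are known in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [8, 16, 4]  # input -> hidden -> output

# Random weights stand in for whatever the learning procedure would actually find.
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x, weights):
    """One pass through the execution graph: a fixed, finite sequence of fmads."""
    for W in weights:
        x = np.maximum(x @ W, 0.0)  # matrix multiply (fused multiply-adds) + ReLU
    return x

# The computational cost is known a priori from the architecture alone:
fmad_count = sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
print("fmads per forward pass:", fmad_count)      # 8*16 + 16*4 = 192
print(forward(rng.standard_normal(8), weights))   # always halts, by construction
```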
And another thing: teaching an AI values by placing it in a human environment and counting on reinforcement learning can fail spectacularly if the AI’s intelligence grows much faster than that of a human child.
Yes. There are pitfalls, and in general much more research to do on value learning before we get to useful AGI, let alone safe AGI.
A human brain is never going to learn to rearrange its low level circuitry to efficiently perform operations like numerical calculation.
This is arguably a misconception. The brain has a 100 Hz clock rate at most. For general operations that involve memory, it’s more like 10 Hz. Most people can do basic arithmetic in less than a second, which roughly maps to a dozen clock cycles or so, maybe less. That is actually comparable to many computers—for example, on the current Maxwell GPU architecture (Nvidia’s latest and greatest), even the simpler instructions have a latency of about 6 cycles.
Now, obviously the arithmetic ops that most humans can do in less than a second are very limited—it’s like a minimal 3-bit machine. But some atypical humans can do larger-scale arithmetic at the same speed.
Point is, you need to compare everything adjusted for the 6-order-of-magnitude speed difference.
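A quick back-of-the-envelope version of that comparison, using the rough figures above (all of them order-of-magnitude assumptions, not measurements):

```python
brain_effective_hz = 10.0        # ~10 Hz for operations that involve memory (figure above)
human_arith_time_s = 1.0         # ~1 s for a basic arithmetic fact
gpu_instr_latency_cycles = 6     # latency of a simple instruction on a Maxwell-era GPU

human_cycles = brain_effective_hz * human_arith_time_s   # ~10 "clock cycles"
print(f"human: ~{human_cycles:.0f} cycles per basic arithmetic op")
print(f"gpu:   ~{gpu_instr_latency_cycles} cycles per simple instruction")
# Measured in cycles rather than wall-clock time the two are comparable; the huge
# difference in raw speed comes almost entirely from the clock rate.
```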
...They allow explicit search of program space in terms of the execution graph, which is extremely powerful because it allows one to a priori exclude all programs which don’t halt—you can constrain the search to focus on programs with exact known computational requirements.
Right. So Boolean circuits are a better analogy than Turing machines.
Neural nets make deep factoring easy, and deep factoring is the single most important gain in any general optimization/learning system: it allows for exponential (albeit limited) speedup.
I’m sorry, what is deep factoring? A reference perhaps?
There are pitfalls, and in general much more research to do on value learning before we get to useful AGI, let alone safe AGI.
I completely agree.
This is arguably a misconception. The brain has a 100 Hz clock rate at most. For general operations that involve memory, it’s more like 10 Hz...
Good point! Nevertheless, it seems very dubious to me that the human brain can learn to do anything within the limits of its computing power. For example, why can’t I learn to look at a page full of arithmetic exercises and solve all of them in parallel?
Right. So Boolean circuits are a better analogy than Turing machines.
They are of course equivalent in theory, but in practice directly searching through a Boolean circuit space is much wiser than searching through a program space. Searching through analog/algebraic circuit space is even better, because you can take advantage of fmads instead of having to spend enormous circuit complexity emulating them. Neural nets are even better than that, because they enforce a mostly continuous/differentiable energy landscape which helps inference/optimization.
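A toy illustration of the last point (my sketch, assuming nothing beyond plain numpy): with a continuous/differentiable parameterization, the gradient hands you a local direction of improvement, so you can descend the energy landscape instead of enumerating discrete circuits or programs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((256, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                       # the target function the "circuit" should learn

w = np.zeros(3)                      # parameters of a tiny analog/algebraic circuit
lr = 0.1
for _ in range(200):
    err = X @ w - y                  # forward pass (fmads)
    grad = X.T @ err / len(X)        # gradient of the mean squared error
    w -= lr * grad                   # follow the energy landscape downhill

print("recovered weights:", np.round(w, 3))   # close to [2.0, -1.0, 0.5]
```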
I’m sorry, what is deep factoring? A reference perhaps?
It’s the general idea that you can reuse subcomputations amongst models and layers. Solomonoff induction is hopelessly impractical for a number of reasons, but one is this: it treats every function/model as entirely distinct. So if you have, say, one high-level model which has developed a good cat detector, that isn’t shared amongst the other models. Deep nets (of various forms) automatically share submodel components AND subcomputations/subexpressions amongst those submodels. That massively speeds up the search. That is deep factoring.
All the successful multi-layer models use deep factoring to some degree. This paper on Sum-Product Networks explains the general idea pretty well.
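For a concrete picture of the sharing, here is a minimal sketch (hypothetical weights, plain numpy): two higher-level models reuse the same lower-level feature computation, so the expensive sub-expression is evaluated once rather than re-learned and re-evaluated by each model.

```python
import numpy as np

rng = np.random.default_rng(2)
W_shared = rng.standard_normal((64, 32))   # shared trunk (e.g. a learned "cat detector" feature bank)
W_task_a = rng.standard_normal((32, 1))    # head for higher-level model A
W_task_b = rng.standard_normal((32, 1))    # head for higher-level model B

def shared_features(x):
    return np.maximum(x @ W_shared, 0.0)   # the shared subcomputation

x = rng.standard_normal(64)
h = shared_features(x)                     # evaluated once per input...
pred_a = h @ W_task_a                      # ...reused by model A
pred_b = h @ W_task_b                      # ...and reused by model B
print(pred_a.item(), pred_b.item())
```

The same structure is why one convolutional trunk can serve many output heads: the cost of the shared features is paid once, no matter how many submodels consume them.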
Good point! Nevertheless, it seems very dubious to me that the human brain can learn to do anything within the limits of its computing power. For example, why can’t I learn to look at a page full of arithmetic exercises and solve all of them in parallel?
There are a lot of reasons. First, due to nonlinear foveation your visual system can only read/parse a couple of words/symbols during each saccade—only those right in the narrow center of the visual cone, the fovea. So it takes a number of clock cycles or steps to scan the entire page, and your brain only has limited working memory to put stuff in.
Secondly, the bigger problem is that even if you already know how to solve a math problem, just parsing many math problems requires a number of steps, and then actually solving them—even if you know the ideal algorithm that requires the minimal number of steps—that minimal number can still be quite large.
Many interesting problems still require a number of serial steps to solve, even with an infinite parallel machine. Sorting is one simple example.
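To put a rough number on that (my illustration, not a claim from the article): even with unboundedly many parallel comparators, a bitonic sorting network still needs about log2(n)·(log2(n)+1)/2 serial rounds of comparisons, and no comparison-based sorter can get below Ω(log n) depth.

```python
import math

for n in (16, 1024, 1_000_000):
    k = math.ceil(math.log2(n))
    depth = k * (k + 1) // 2   # parallel comparison rounds in a bitonic sorting network
    print(f"n = {n:>9,}: ~{depth} serial rounds, no matter how many comparators run at once")
```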
...Neural nets are even better than that, because they enforce a mostly continuous/differentiable energy landscape which helps inference/optimization.
I wonder whether this is a general property, or whether the success of continuous methods is limited to problems with natural continuous models like vision.
Deep nets (of various forms) automatically share submodel components AND subcomputations/subexpressions amongst those submodels.
Yes, this is probably important.
First, due to nonlinear foveation your visual system can only read/parse a couple of words/symbols during each saccade—only those right in the narrow center of the visual cone, the fovea. So it takes a number of clock cycles or steps to scan the entire page, and your brain only has limited working memory to put stuff in.
Scanning the page is clearly not the bottleneck: I can read the page much faster than I can solve the exercises. “Limited working memory” sounds like a claim that higher cognition has much less computing resources than low-level tasks. Clearly visual processing requires much more “working memory” than solving a couple of dozen exercises in arithmetic. But if we accept this constraint, does the brain still qualify as a ULM? It seems to me that if there is a deficiency of the brain’s architecture that prevents higher cognition from enjoying the brain’s full power, fixing this deficiency definitely counts as an “architectural innovation”.
This is arguably a misconception. The brain has a 100 Hz clock rate at most. For general operations that involve memory, it’s more like 10 Hz.
Mechanical calculators were slower than that, and yet they were far better at numeric computation than most humans, which made them incredibly useful.
Now, obviously the arithmetic ops that most humans can do in less than a second are very limited—it’s like a minimal 3-bit machine. But some atypical humans can do larger-scale arithmetic at the same speed.
Indeed these are very rare people. The vast majority of people, even if they worked for decades in accounting, can’t learn to do numeric computation as fast and accurately as a mechanical calculator does.
The vast majority of people, even if they worked for decades in accounting, can’t learn to do numeric computation as fast and accurately as a mechanical calculator does.
The problems aren’t even remotely comparable. A human is solving a much more complex problem—the inputs are in the form of visual or auditory signals which first need to be recognized and processed into symbolic numbers. The actual computation step is trivial and probably only involves a handful of cycles, or even a single one.
I admit that I somewhat let you walk into this trap by not mentioning it earlier… this example shows that the brain can learn near-optimal (in terms of circuit depth or cycles) solutions for these simple arithmetic problems. The main limitation is that the brain’s hardware is strongly suited to approximate inference problems, not exact solutions, so any exact operators require memoization. This is actually a good thing, and any practical AGI will need to have a similar prior.
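Loosely, the memoization point looks like this (my sketch, not anything from the article): exact operators get handled by memorizing a small table of facts up front and recalling them in a single step, the way people memorize times tables, rather than by running a general arithmetic circuit at use time.

```python
# A memorized table of exact facts, learned once up front (like times tables).
times_table = {(a, b): a * b for a in range(10) for b in range(10)}

def multiply_digits(a: int, b: int) -> int:
    """Constant-depth 'recall' of a memorized fact; no arithmetic at use time."""
    return times_table[(a, b)]

print(multiply_digits(7, 8))   # 56, retrieved from memory rather than recomputed
```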
Thank you for replying!