I would readily accept a statistical-modeling-heavy answer to the question of “but how do you build multi-level world-models from percepts, in principle?”; and indeed, I’d be astonished if you could avoid statistical modeling.
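To make “statistical-modeling-heavy” slightly less hand-wavy, here is a deliberately tiny sketch of the kind of thing I mean: a two-level generative model, with a discrete high-level cause generating a mid-level state generating noisy percepts, inverted by Bayes. Every distribution and number in it is made up for the illustration; it shows the shape of an answer, not a proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal two-level generative model: a discrete high-level cause z generates a
# continuous mid-level state h, which generates the noisy percept x. "Building a
# multi-level world-model from percepts" then just means inverting this by Bayes.
PRIOR_Z = np.array([0.5, 0.3, 0.2])   # made-up prior over high-level causes
MU_H = np.array([-2.0, 0.0, 3.0])     # each cause's typical mid-level state
SIGMA_H, SIGMA_X = 0.5, 1.0           # made-up noise scales

def sample_percept():
    """Run the model forwards: cause -> mid-level state -> percept."""
    z = rng.choice(3, p=PRIOR_Z)
    h = rng.normal(MU_H[z], SIGMA_H)
    x = rng.normal(h, SIGMA_X)
    return z, x

def posterior_over_causes(x):
    """Run it backwards: p(z | x), with h integrated out analytically
    (x | z is Gaussian with variance sigma_h^2 + sigma_x^2)."""
    var = SIGMA_H**2 + SIGMA_X**2
    likelihood = np.exp(-(x - MU_H) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    post = PRIOR_Z * likelihood
    return post / post.sum()

true_z, x = sample_percept()
print("percept:", round(x, 2), "| true cause:", true_z)
print("posterior over causes:", np.round(posterior_over_causes(x), 3))
```

The point is only that “multi-level” falls out of stacking latent variables and “from percepts” falls out of posterior inference; scaling that schema up, and learning the structure rather than hand-coding it, is where the real problem lives.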
Hmm. If you have lots and lots of computing power, you can always just… not use it. It’s not clear to me how additional computing power can make the problem harder—at worst, it can make the problem no easier. I agree, though, that algorithms for modeling the world from the inside can’t just extrapolate arbitrarily, on pain of exponential complexity; so whatever it takes to build and use multi-level world-models, it can’t be that.
Well, as jacob_cannell pointed out, feeding more compute power to a bounded-rational agent ought to make it enlarge its models in terms of theory-depth, theory-preorder-connectedness, variance explained, and time-horizon. In short: the branching factor and the hypothesis class both get larger, which makes learning harder, if we’re thinking in statistical-learning-theory terms.
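To put one hedged number on that intuition: the textbook PAC bound for a finite hypothesis class says the training data you need grows with log|H|, and if hypotheses are programs of up to b bits then |H| ≈ 2^b, so the requirement grows with how much model the agent can afford to represent. A toy calculation, with the accuracy and confidence parameters chosen arbitrarily:

```python
import math

def pac_sample_bound(hypothesis_count, epsilon=0.05, delta=0.05):
    """Sufficient sample size for a finite hypothesis class in the realizable
    PAC setting: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# As extra compute lets the agent entertain hypotheses described by more bits,
# |H| ~ 2^bits, and the data needed before it can be confident grows with it.
for bits in (10, 40, 80, 160):
    print(f"{bits:>3}-bit hypotheses  ->  m >= {pac_sample_bound(2 ** bits)}")
```

The dependence is only logarithmic in |H|, i.e. roughly linear in description length, but the direction is the one that matters here: the more model the extra compute lets you express, the more training you need before you can trust what you learned.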
There’s also the specific issue of assuming Turing-machine-level compute power, i.e., treating “available compute steps” and “available memory” as unbounded but finite natural numbers. Since you never bound those numbers, they’re effectively infinite, which means that two agents, each “programmed” as a Turing machine with Turing-machine resources rather than strictly finite resources, can’t reason about each other: either one would need ordinal numbers to think about what the other (or itself) can do, but actually using ordinal numbers in that analysis would be necessarily wrong, in that neither agent actually possesses a Turing Oracle (which is equivalent to having ω₀ steps of computation).
So you get a bunch of paradox theorems making your job a lot harder.
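Not anything from the theorems themselves, just a toy picture of the regress: if “thinking about the other agent” means simulating them, then with unbounded budgets the simulation tower never bottoms out, while a strictly finite step budget forces it to terminate in a default answer. The game and both agents here are entirely made up.

```python
def best_response(me, opponent, budget):
    """Toy mutual reasoning by simulation. Each level of 'thinking about the other
    agent' spends part of a strictly finite step budget, so the regress bottoms out
    instead of requiring ordinal-valued bookkeeping about who can compute what."""
    if budget <= 0:
        return me["default_action"]          # out of compute: fall back to a default
    # Simulate the opponent with whatever budget remains after this level's thinking.
    their_action = best_response(opponent, me, budget - 1)
    return me["reply_to"].get(their_action, me["default_action"])

# Hypothetical two-action game: one agent wants to match, the other to anti-match.
alice = {"default_action": "A", "reply_to": {"A": "A", "B": "B"}}   # matcher
bob   = {"default_action": "B", "reply_to": {"A": "B", "B": "A"}}   # anti-matcher

print(best_response(alice, bob, budget=10))   # terminates: the finite budget forces an answer
# With an unbounded budget the recursion never bottoms out -- that's the regress
# the ordinal bookkeeping above was gesturing at.
```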
In contrast, starting from the assumption of strictly finite computing power is like E.T. Jaynes starting from the assumption of finite sample data, finite log-odds, countable hypotheses, and so on: we begin by assuming what must necessarily be true in reality, and then treat the infinite case as the limit of the finite one. Pascal’s Mugging, for instance, is solvable this way with ordinary computational Bayesian techniques, provided we assume we can sample outcomes from our hypothesis distribution.
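Here’s a cartoon of that last move, with entirely made-up numbers: estimate expected utility by Monte Carlo sampling rather than by summing over every hypothesis. A bounded agent can only afford finitely many samples, so a hypothesis whose probability sits astronomically far below one-over-the-sample-budget simply never shows up in the estimate, and the mugger’s offer washes out.

```python
import random

def sampled_expected_utility(outcome_sampler, n_samples=100_000):
    """Estimate expected utility by sampling outcomes instead of summing over every
    hypothesis. A bounded agent affords only finitely many samples, so hypotheses
    with probability far below 1/n_samples essentially never appear in the estimate."""
    return sum(outcome_sampler() for _ in range(n_samples)) / n_samples

def pay_the_mugger():
    # Made-up numbers: paying costs 5 utility; with probability 1e-20 the mugger
    # really delivers an astronomical payoff (capped at 1e30 so floats behave --
    # the cap never matters, because the sample never lands there).
    return 1e30 if random.random() < 1e-20 else -5.0

def refuse_the_mugger():
    return 0.0

print("pay:   ", sampled_expected_utility(pay_the_mugger))
print("refuse:", sampled_expected_utility(refuse_the_mugger))
# Any feasible sample budget puts "pay" at about -5 and "refuse" at 0, so the
# bounded reasoner walks away; the infinite-precision expectation is never computed.
```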
Ohoho! Well, actually Nate, I personally subscribe to the bounded-rationality school of thinking, and I do think this has implications for AI safety. Specifically: as the agent acquires more resources (speed and memory), it can handle larger problems and enlarge its impact on the world, so to make a bounded-rational agent safe we should, hypothetically, be able to state safety properties explicitly in terms of how much cognitive stuff (philosophically, it all adds up to the different ingredients of that magic word “intelligence”) the agent has.
With some kind of framework like that, we’d be able to state and prove safety theorems of the form, “This design will grow increasingly uncertain about its value function as its cognitive resources grow, and will act more cautiously until it receives more training, and here is an analytic bound on exactly how fast that fall-off in confident action happens.” I can even imagine it running along the simple lines of, “As the agent’s model of the world grows more complicated, the entropy/Kolmogorov complexity of that model penalizes hypotheses about the learned value function, causing the agent to grow increasingly passive and wait for value training as it learns and grows.”
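Purely to make the shape of that hoped-for theorem concrete, here is a toy in which the space of expressible value functions grows with the world-model’s complexity while the value-training data stays fixed, so the agent’s posterior entropy over values grows and it flips from acting to deferring. Every quantity, rule, and threshold below is invented for the illustration.

```python
def value_entropy_bits(model_complexity_bits, value_training_examples):
    """Cartoon: a world-model of k bits can express ~2^k candidate value functions;
    pretend each training example rules out half of the survivors. The entropy of a
    uniform posterior over what's left is then just the remaining bit count."""
    return max(model_complexity_bits - value_training_examples, 0)

def choose_action(model_complexity_bits, value_training_examples, caution_threshold_bits=2):
    """Act only when value-uncertainty is small; otherwise defer and wait for training."""
    if value_entropy_bits(model_complexity_bits, value_training_examples) > caution_threshold_bits:
        return "defer and request value training"
    return "act"

# Hold the value-training data fixed while the world-model grows with compute:
for k in (4, 16, 64, 256):
    print(f"world-model of {k:>3} bits -> {choose_action(k, value_training_examples=10)}")
```

A real result would have to replace “each example rules out half the survivors” with an actual prior over value functions and an actual update rule; the toy only shows the qualitative direction.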
This requires a framework for normative uncertainty that formalizes acting cautiously when under value-uncertainty, but didn’t someone publish a thesis on that at Oxford a year or two ago?
Can I laugh maniacally at least a little bit now?