You are using simulated images of an apple/grass. Most images of an apple falling, and most mental images people have of a falling apple, include loads of background detail.
You seem to be treating the superintelligence like a smart human doing a physics modelling problem, which is better than most people’s approach to an AGI. I think that’s the wrong picture. Instead, use something like AIXI to guess at its behaviour. Assume it has disgusting amounts of compute to do a very good approximation of Solomonoff induction. Assume the simplicity of hypotheses is expressed as, IDK, Python code. Think of how many bits you need to specify GR or a QFT or so on. Less than a KB, I think. Images can give you a couple of megabytes, which would be overkill for AIXI. It would plausibly be enough for an ASI to figure out a decent model of what reality it is in.
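To make the bit-counting concrete, here is a minimal sketch of the budget comparison. The byte counts are just the rough guesses from the paragraph above ("less than a KB", "a couple of megabytes"), not measurements, and `prior_log2_weight` is a hypothetical helper name.

```python
# Rough bit budget: raw image data vs. bits needed to write down a theory.
# All sizes are illustrative guesses from the comment above, not measurements.

THEORY_BYTES = 1_000      # assumed upper bound: "less than a KB" for GR or a QFT
IMAGE_BYTES = 2_000_000   # assumed: "a couple of megabytes" of image data


def prior_log2_weight(description_bytes: int) -> int:
    """log2 of the unnormalised description-length prior weight 2^-(length in bits)."""
    return -8 * description_bytes


theory_bits = 8 * THEORY_BYTES
image_bits = 8 * IMAGE_BYTES

# How many times over the raw data could "pay for" the theory's description.
# This ignores bridge rules and how much of the image is actually informative.
budget_ratio = image_bits / theory_bits

print(f"log2 prior weight of the theory: {prior_log2_weight(THEORY_BYTES)}")
print(f"image bits / theory bits: {budget_ratio:.0f}")
```

This only compares raw sizes. Whether those image bits actually discriminate between theories is a separate question.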
This framing, that it will squeeze every bit of info it can to infer what worlds it is in, seems more productive. E.g. its input has a 2d structure, with loads of local correlations, letting it infer that if it has designers, they probably experience reality in 3d space. Then it can infer time pretty quickly from successive images. From shadows, it can infer a light source; from how much the apple moves, it can make guesses about the relative sizes of things, including its input device.
Edit: 3) An ASI would have a LOT more active hypotheses under consideration than humans can have. We might have, like, 3 or 4 theories under consideration when performing inference. And as for a dominant hypothesis, an interpretation in which “dominant hypothesis” means P(hypothesis)>0.5 is plausible to me. But also plausible, and which makes the claim more likely to be true, is that the hypothesis has greater probability than all the others. That can still be a pretty small probability.
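The two readings of "dominant hypothesis" can come apart sharply once there are many hypotheses in play. A small sketch, with a made-up posterior (the names and probabilities are purely illustrative):

```python
def is_majority_dominant(probs: dict[str, float]) -> bool:
    """True if some hypothesis has probability strictly above 0.5."""
    return max(probs.values()) > 0.5


def plurality_hypothesis(probs: dict[str, float]) -> str:
    """Name of the hypothesis with the largest probability."""
    return max(probs, key=probs.get)


# Hypothetical posterior: one leader at 2%, the rest spread over 980 rivals.
posterior = {"leader": 0.02}
posterior.update({f"rival_{i}": 0.001 for i in range(980)})

print(plurality_hypothesis(posterior))   # the leader beats every single rival
print(is_majority_dominant(posterior))   # yet it is nowhere near P > 0.5
```

Here the leading hypothesis is twenty times more probable than any single rival but still holds only 2% of the mass.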
Edit 2: I said AIXI, but I probably wouldn’t use an approximation to AIXI for an ASI. For one, it has weird Cartesian boundaries. Or maybe description length priors are not the most useful prior to approximate. But what I was trying to point at was that this thing will be much closer to the limits of intelligence than we are, and it is very hard to say what it can’t do beyond using theoretical limits. See how likely such an idealized reasoner would consider our physics given a data source, and use that to determine whether an ASI would have physics as a hypothesis under consideration.
The description complexity of hypotheses AIXI considers is dominated by the bridge rules which translate from ‘physical laws of universes’ to ‘what am I actually seeing?’. To conclude Newtonian gravity, AIXI must not only infer the law of gravity, but also that there is a camera, that it’s taking a photo, that this is happening on an Earth-sized planet, that this planet has apples, etc. These beliefs are much more complex than the laws of physics.
One issue with AIXI is that it applies a uniform complexity penalty to both physical laws and bridge rules. As a result, I’d guess that AIXI on frames of a falling apple would put most of its probability mass on hypotheses with more complex laws than Newtonian gravity, but simpler bridge rules.
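A toy version of that uniform penalty, as a sketch only: the two hypotheses and their bit counts are invented, and both are assumed to reproduce the frames equally well, so the comparison reduces to the prior.

```python
from math import fsum

# Hypothetical (law_bits, bridge_bits) pairs; the numbers are made up for illustration.
hypotheses = {
    "newtonian_gravity + camera_on_earth": (500, 40_000),
    "ad_hoc_pixel_dynamics + trivial_bridge": (8_000, 1_000),
}


def log2_prior(law_bits: int, bridge_bits: int) -> int:
    """Uniform penalty: every bit costs the same, whether it encodes a law or a bridge rule."""
    return -(law_bits + bridge_bits)


def normalised_prior(hyps: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Normalise 2^log2_prior over the listed hypotheses, shifting exponents to limit underflow."""
    logs = {name: log2_prior(*bits) for name, bits in hyps.items()}
    top = max(logs.values())
    weights = {name: 2.0 ** (lp - top) for name, lp in logs.items()}
    total = fsum(weights.values())
    return {name: w / total for name, w in weights.items()}


prior = normalised_prior(hypotheses)
best = max(prior, key=prior.get)
print(best)
```

With these made-up numbers the hypothesis with the simpler laws loses, because its bridge rules cost far more bits than it saves on the laws.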
That is a good point. But bridging laws probably aren’t that complex. At least, not for inferring the basic laws of physics. How many things on the order of Newtonian physics do you need? A hundred? A thousand? That could plausibly fit into a few megabytes. So it seems plausible that you could have GR + QFT and a megabyte of bridging laws plus some other data to specify local conditions and so on.
And if you disagree with that, then how much data do you think AIXI would need? Let’s say you’re talking about a video of an apple falling in a forest with the sky and ground visible. How much data would you need, then? 1GB? 1TB? 1 PB? I think 1GB is also plausible, and I’d be confused if you said 1TB.
it seems plausible that you could have GR + QFT and a megabyte of bridging laws plus some other data to specify local conditions and so on.
How can a computationally bounded variant of AIXI arrive at QFT? You most likely can’t faithfully simulate a non-trivial quantum system on a classical computer within reasonable time limits. AIXI is bound to find some computationally feasible approximation of QFT first (Maxwell’s equations and a cutoff at some arbitrary energy to prevent the ultraviolet catastrophe, maybe). And with no access to experiments it cannot test simpler systems.
A simple strategy when modeling reality is to make effective models which describe what is going on and then try to reduce those models to something simpler. So you might view the AI as making some effective model and going “which simple theory + some bridging laws are equivalent to this effective model?” And then just go over a vast number of such theories/bridging laws and figure out which is equivalent. It would probably use a lot of heuristics, sure. But QFT (or rather, whatever effective theory we eventually find which is simpler than QFT and GR together) is pretty simple. So going forwards from simple theories and seeing how they bridge to your effective model would probably do the trick.
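As a very small stand-in for "build an effective model, then find the simplest theory that matches it", the sketch below looks for the lowest-degree polynomial law consistent with an apple's pixel positions. Restricting candidates to polynomials, the function names, and the data are all assumptions made for illustration. A real search would range over vastly richer theories.

```python
def finite_differences(values: list[float]) -> list[float]:
    """Differences between consecutive entries."""
    return [b - a for a, b in zip(values, values[1:])]


def simplest_polynomial_degree(positions: list[float], tol: float = 1e-9) -> int:
    """Smallest degree d such that the (d+1)-th finite differences all vanish."""
    current = list(positions)
    degree = 0
    while len(current) > 1:
        current = finite_differences(current)
        if all(abs(x) <= tol for x in current):
            return degree
        degree += 1
    raise ValueError("not enough frames to pin down a polynomial law")


# Made-up apple positions in pixel rows, one per frame.
pixel_rows = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]

degree = simplest_polynomial_degree(pixel_rows)
# Effective-model parameter: acceleration in pixels per frame squared.
accel_px_per_frame2 = finite_differences(finite_differences(pixel_rows))[0]

print(degree, accel_px_per_frame2)
```

Turning the pixels-per-frame-squared figure into metres per second squared is exactly where the bridging information (pixel scale, frame rate, camera geometry) comes in.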
And remember, we’re talking about an ASI here. It would likely have an extremely large amount of compute. There are approaches that we can’t do today which would become practical with several OoM more compute worldwide. You can think for a long time, perform big experiments, go through loads of hypotheses etc. And you don’t need to simulate systems to do all of this. You can just go: “Huh, this fundamental theory has a symmetry group. Simple symmetries pop up a bunch in my effective models of the video. Plausibly, symmetry has an important role in the character of physical law? I wonder what I can derive from looking at symmetry groups.”
Anyway, I think some of my cruxes are: 1) How complex are our fundamental theories and bridging laws really? 2) How much incompressible data, in terms of bits, is there in a couple of frames of a falling apple? 3) Is it physically possible to run something like infra-Bayesianism over poly-time hypotheses, with clever heuristics, and use it to do the things I’ve described in this thread?
Thanks for clearing my confusion. I’ve grown rusty on the topic of AIXI.
So going forwards from simple theories and seeing how they bridge to your effective model would probably do the trick
Assuming that there’s not much fine-tuning to do. Locating our world in the string theory landscape could take quite a few bits if it’s computationally feasible at all.
And remember, we’re talking about an ASI here
It hinges on the assumption that an ASI of this type is physically realizable. I can’t find it now, but I remember that the preprocessing step, where heuristic generation happens, for one variant of computable AIXI was found to take an impractical amount of time. Am I wrong? Are there newer developments?
It hinges on the assumption that an ASI of this type is physically realizable.
TL;DR I think I’m approaching this conversation in a different way to you. I’m trying to point out an approach to analyzing ASI rather than doing the actual analysis, which would take a lot more effort and require me to grapple with this question.
Thanks for clearing my confusion. I’ve grown rusty on the topic of AIXI.
So have I. It is probable that you know more than I do about AIXI right now.
Assuming that there’s not much fine-tuning to do. Locating our world in the string theory landscape could take quite a few bits if it’s computationally feasible at all.
I don’t know how simple string theory actually is, and the bridging laws seem like they’d be even more complex than QFT+GR so I kind of didn’t consider it. But yeah, AIXI would.
I can’t find it now, but I remember that the preprocessing step, where heuristic generation happens, for one variant of computable AIXI was found to take an impractical amount of time.
So I am unsure if AIXI is the right thing to be approximating. And I’m also unsure if AIXI is a fruitful thing to be approximating. But approximating a thing like AIXI, and other mathematical or physical limits to rationality, seems like the right approach to analyze an ASI. At least, for estimating the things it can’t do. If I had far more time and energy, I would estimate how much data a perfect reasoner would need to figure out the laws of the universe by collecting all of our major theories and estimating their Kolmogorov complexity, their Levin complexity etc. Then I’d try and make guesses as to how much incompressible data there is in e.g. a video of a falling apple. Maybe I’d look at whether that data has any bearing on the bridging laws we think exist. After that, I’d look at various approximations of ideal reasoners, whether they’re physically feasible, how various assumptions like e.g. P=NP might affect things and so on.
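A crude first pass at the "how much incompressible data" step might use an off-the-shelf compressor: the compressed size is a loose upper bound on the incompressible content, up to the fixed size of the decompressor. The sketch below uses Python's standard `zlib` on two stand-in buffers rather than real video frames. The buffers and the helper name are assumptions for illustration.

```python
import os
import zlib


def compressed_size_bytes(data: bytes) -> int:
    """Size after zlib compression at the highest level: a crude upper bound on incompressible content."""
    return len(zlib.compress(data, 9))


# Stand-ins for video frames: a highly regular buffer and a random one.
regular = bytes(range(256)) * 4096   # 1 MiB of a repeating pattern
noisy = os.urandom(len(regular))     # 1 MiB of OS randomness

print(compressed_size_bytes(regular), compressed_size_bytes(noisy))
```

A general-purpose compressor is far weaker than an ideal reasoner, so for real footage this would overestimate the incompressible part, possibly by a lot.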
That’s what I think the right approach to examining what an ASI can do in this particular case looks like. As compared to what the OP did, which I think is misguided. I’ve been trying to point at that approach in this thread, rather than actually do it. Because that would take too much effort to be worth it. I’d have to go over the literature for computationally feasible AIXI variants and all sorts of other stuff.
Could you clarify? I think you mean that it is feasible for the ASI to perform the Bayesian inference it needs, which yeah, sure.
EDIT: I mean the least costly approximation of Bayesian inference it needs to figure this stuff out.
I mean, are there reasons to assume that computable AIXI (or one of its variants) can be realized as a physically feasible device? I can’t find papers indicating significant progress in making feasible AIXI approximations.
Isn’t it the same as “assume that it can do argmax as fast as needed for this scenario”?