Alternatively, UDT-style reasoning keeps getting defected on because it does a bad job of predicting similar agents. This is another part of the original point that runs into trouble when it meets complicated reality: the principles that work depend not only on your OWN ontology, but also on the ontology of your community. There are stable states that work when most people share a specific ontology, ones that work no matter how many people are operating under which ontology, and ones that work in many circumstances but for which you can construct tournaments, with specific mixes of behaviors, where they fail spectacularly.
UDT is one of those that works really well when many other people are also operating under UDT, AND actually have similar enough source code that they can predict each other. However, there are many societies and times when that's not true.
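As a toy illustration of that population-dependence (a sketch only: the "cooperate iff the opponent's source is byte-identical to mine" rule, sometimes nicknamed CliqueBot, is a crude stand-in for UDT-style mutual prediction, and all the strategy names and payoffs here are invented):

```python
import itertools

# One-shot prisoner's dilemma with "source code" transparency.
# PAYOFF[(my_move, their_move)] -> my payoff (standard PD values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def clique_bot(opponent_source):
    # Cooperate only with exact copies of itself.
    return "C" if opponent_source == "clique_bot" else "D"

def clique_bot_v2(opponent_source):
    # Functionally identical to clique_bot, but not byte-identical, so the
    # two fail to recognize each other and mutually defect: the "does a
    # bad job of predicting similar agents" failure mode.
    return "C" if opponent_source == "clique_bot_v2" else "D"

def defect_bot(opponent_source):
    return "D"

def tournament(population):
    """Round-robin pairings; returns average payoff per strategy name."""
    totals, counts = {}, {}
    for i, j in itertools.combinations(range(len(population)), 2):
        a, b = population[i], population[j]
        move_a, move_b = a(b.__name__), b(a.__name__)
        for f, payoff in ((a, PAYOFF[(move_a, move_b)]),
                          (b, PAYOFF[(move_b, move_a)])):
            totals[f.__name__] = totals.get(f.__name__, 0) + payoff
            counts[f.__name__] = counts.get(f.__name__, 0) + 1
    return {name: round(totals[name] / counts[name], 2) for name in totals}

# Mostly-clones world vs. similar-but-not-identical world vs. lone agent.
print(tournament([clique_bot] * 9 + [defect_bot]))
print(tournament([clique_bot] * 5 + [clique_bot_v2] * 5))
print(tournament([clique_bot] + [defect_bot] * 9))
```

In the first population the clone-recognizers average near the cooperation payoff; in the second, two strains that are similar in spirit but not byte-identical defect on each other and lose most of the benefit; in the third, the lone agent gains nothing over the defectors.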
> There are stable states that work when most people share a specific ontology, and ones that work no matter how many people are operating under which ontology,
But part of my point is that if your stable state only works when everyone shares a particular ontology, this only matters if your stable state includes a mechanism to maintain, or achieve, everyone having that particular ontology (either by being very persuasive, or by obtaining power, or some such).
There exist moral ontologies that I'd describe as self-defeating, because they don't have any way of contending with a broader universe.
> There exist moral ontologies that I'd describe as self-defeating, because they don't have any way of contending with a broader universe.
Agreed 100%. I think the reverse statement, though: "There exist ontologies that are both human-compatible and can contend with all existing/possible configurations of the universe," is also false.
The central idea behind being a robust agent, I think, is seeing how close we can get to this, and I think it's actually a really interesting and fruitful research direction, and an interesting ontology all on its own. However, I tend to be skeptical of its usefulness on actual human hardware, at least if "elegance" or "simplicity" is considered a desirable property of the resulting meta-ontology.
ETA: I expect the resulting meta-ontology for humans to look much more like "based on a bunch of hard-to-pin-down heuristics, this is the set of overlapping ontologies that I'm using for this specific scenario."
I have a few different answers for this:
There is some fact of the matter about "which ontologies are possible to run on real physics [in this universe] or in hypothetical physics [somewhere off in mathematical Tegmark IV land]."
Sticking to "real physics as we understand it" for now, I think it is possible to grade ontologies on how well they perform in the domains that they care about (where some ontologies get good scores by caring about less, and others get good scores by being robust; see the toy scoring sketch after this list).
There is some fact of the matter about what the actual laws of physics and game theory are, even if no one can compute them.
Meta-ontologies are still ontologies. I think ontologies that are flexible will (long-term) outcompete ontologies that are not.
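Here's the toy scoring sketch promised in the second point. Everything in it is invented for illustration: the domains, the accuracy numbers, and the within-scope averaging rule are placeholders, not a claim about how to actually score ontologies.

```python
# Toy "grade ontologies on the domains they care about" scorer.
# All domains and accuracy numbers below are placeholders.

DOMAINS = ["physics", "trade", "ethics", "politics", "acausal_trade"]

ontologies = {
    # each ontology maps the domains it cares about -> performance there
    "narrow_but_precise": {"trade": 0.95, "ethics": 0.90},
    "broad_and_robust":   {d: 0.70 for d in DOMAINS},
}

def within_scope_score(ontology):
    """Mean performance over only the domains the ontology cares about."""
    return sum(ontology.values()) / len(ontology)

for name, ont in ontologies.items():
    coverage = len(ont) / len(DOMAINS)
    print(f"{name}: score={within_scope_score(ont):.2f}, "
          f"coverage={coverage:.0%}")
```

The narrow ontology wins on within-scope score by caring about less; the robust one wins on coverage. Both numbers matter, which is why I'd grade them separately.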
There are multiple ways to be flexible, which include:
"I have lots of tools available, with some hard-to-pin-down heuristics for which tools to use."
"I want to understand the laws of the universe as deeply as possible, and since I have bounded compute, I want to cache those laws into heuristics that are as simple as possible while cleaving as accurately as possible to the true underlying law, with tools specifically to tell me when to zoom into the map." (sketched in code below)
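The first frame resists clean code almost by definition (the heuristics are hard to pin down), but the structure of the second can be sketched. A minimal sketch on a toy domain: the cached rule, the threshold, and the fallback are all stand-ins I chose for illustration.

```python
import math

SMALL_ANGLE = 0.3  # radians; rough validity bound for the cached rule

def sin_cached(x):
    """Cheap cached heuristic: sin(x) ~ x, simple and nearly true near 0."""
    return x

def sin_zoomed(x):
    """Zooming into the map: pay full price for the underlying law."""
    return math.sin(x)

def sin_robust(x):
    # The meta-rule of the second frame: use the simple cached law inside
    # its known-good region, with an explicit trigger for when to drop
    # back down to the deeper model.
    return sin_cached(x) if abs(x) < SMALL_ANGLE else sin_zoomed(x)

for x in (0.1, 0.5, 1.5):
    print(f"x={x}: robust={sin_robust(x):.4f}, true={math.sin(x):.4f}")
```

The load-bearing part isn't the trig; it's that the cached heuristic comes bundled with a statement of where it breaks.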
I expect that over the next 10-100 years, the first frame will outcompete the second in terms of "number of people using that frame to be reasonably successful." But in the long run and deep future, I expect the second frame to outcompete the first. I *might* expect this whether or not we switch from human hardware to silicon uploads. But I definitely expect it once uploads exist.