I might think in terms of causal decision theory, or updateless, or functional. But… I think for most people these are more like descriptors of their True Ontology than the actual driver. If I’m currently doing causal decision theory and notice that other people are making more money from Omega than I am, I can stop and think “hmm, this seems sorta stupid.”
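For concreteness, the standard Newcomb payoffs make “more money from Omega” literal arithmetic. A toy calculation (the box amounts are the conventional ones; the predictor’s accuracy is just a stand-in):

```python
# Toy expected-value check for Newcomb's problem (standard setup: Omega puts
# $1,000,000 in the opaque box iff it predicts you one-box; the transparent
# box always holds $1,000). Predictor accuracy is a stand-in value.

def expected_payoff(one_box, predictor_accuracy=0.99):
    # The opaque box holds $1M with probability equal to how often Omega
    # correctly predicted your actual choice.
    p_million = predictor_accuracy if one_box else 1 - predictor_accuracy
    return p_million * 1_000_000 + (0 if one_box else 1_000)

print(expected_payoff(one_box=True))   # ~990,000
print(expected_payoff(one_box=False))  # ~11,000
```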
I don’t think Omega actually exists though? Like, instead what I’d expect to see is some people cooperating in the prisoner’s dilemma, and some people defecting, and each group going “yes, this is the best outcome given my ontology.”
But the people who are defecting eventually notice that they’re getting outcompeted by the people who cooperate.
There are some consistent worldviews where the thing to do is “stick to your principles” even if that strategy will eventually eradicate itself (e.g. “it’s always good to defect against people who want the wrong things,” or the Shakers deciding not to have children). Nonetheless, there is a fact of the matter about which sorts of principles can survive iterated game theory and which can’t.
I claim most people who defect unreflectively because of CDT-type reasons are just making a mistake (relative to their own goal structure), rather than actually getting a good outcome given their ontology.
Alternatively, the UDT-style reasoner keeps getting defected on because it does a bad job of predicting which agents are actually similar to it. This is another part of the original point that has trouble when it mixes with complicated reality: the principles that work depend not only on your OWN ontology, but also on the ontology of your community. There are stable states that work when most people share a specific ontology, ones that work no matter how many people are operating under which ontology, and ones that work in many circumstances but fail spectacularly in tournaments constructed around specific mixes of behaviors.
UDT is one of those that works really well when many other people are also operating under UDT, AND actually have similar enough source code that they can predict each other. However, there are many societies/times when that’s not true.
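A toy sketch of that dependence (the strategies and payoffs here are purely illustrative, not anyone’s actual decision theory): a one-shot prisoner’s dilemma round-robin where one rule only cooperates with agents it predicts are running its own procedure. Whether defection or the UDT-ish rule comes out ahead depends entirely on who else is in the population:

```python
# Toy sketch: a one-shot prisoner's dilemma round-robin where each strategy
# can "read" the opponent's source (here, just its function name).
from itertools import combinations

# Payoffs (mine, theirs) for (my_move, their_move); C = cooperate, D = defect.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_cooperate(opponent_name):
    return "C"

def always_defect(opponent_name):
    return "D"

def cooperate_with_copies(opponent_name):
    # UDT-flavored toy rule: cooperate only if I predict the opponent is
    # running my own decision procedure; otherwise defect.
    return "C" if opponent_name == "cooperate_with_copies" else "D"

def tournament(population):
    """Total score of each agent after playing everyone else once."""
    scores = {i: 0 for i in range(len(population))}
    for i, j in combinations(range(len(population)), 2):
        move_i = population[i](population[j].__name__)
        move_j = population[j](population[i].__name__)
        pay_i, pay_j = PAYOFFS[(move_i, move_j)]
        scores[i] += pay_i
        scores[j] += pay_j
    return scores

def average_by_strategy(population):
    scores = tournament(population)
    totals, counts = {}, {}
    for idx, agent in enumerate(population):
        name = agent.__name__
        totals[name] = totals.get(name, 0) + scores[idx]
        counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}

# A lone defector among naive cooperators vs. among self-predicting agents:
print(average_by_strategy([always_cooperate] * 9 + [always_defect]))
print(average_by_strategy([cooperate_with_copies] * 9 + [always_defect]))
```

In the first population the lone defector cleans up (45 per agent vs 24); in the second it gets frozen out (9 vs 25), even though neither strategy changed.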
But part of my point is that if your stable state only works when everyone is in a particular ontology, this only matters if your stable state includes a mechanism to maintain, or achieve, everyone having that particular ontology (either by being very persuasive, or by obtaining power, or some such).
There exist moral ontologies that I’d describe as self-defeating, because they didn’t have any way of contending with a broader universe.
Agreed 100%. I think the reverse statement, though (“There exist ontologies that are both human-compatible and can contend with all existing/possible configurations of the universe”), is also false.
The central idea behind being a robust agent, I think, is seeing how close we can get to this, and I think it’s actually a really interesting and fruitful research direction, and an interesting ontology all on its own. However, I tend to be skeptical of its usefulness on actual human hardware, at least if “elegance” or “simplicity” is considered a desirable property of the resulting meta-ontology.
ETA: I expect the resultant meta-ontology for humans to look much more like “based on a bunch of hard-to-pin-down heuristics, this is the set of overlapping ontologies that I’m using for this specific scenario.”
I have a few different answers for this:

There is some fact-of-the-matter about “which ontologies are possible to run on real physics [in this universe] or in hypothetical physics [somewhere off in mathematical Tegmark IV land].”
Sticking to ‘real physics as we understand it’ for now, I think it is possible to grade ontologies on how well they perform in the domains they care about (where some ontologies get good scores by caring about less, and others get good scores by being robust).
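As a toy way of formalizing that grading (the domains, scores, and ontology names are made up purely for illustration): score an ontology only over the domains it cares about, so a narrow ontology can do well by caring about less and a robust one by doing okay everywhere:

```python
# Toy sketch: grade an "ontology" only on the domains it claims to care about.
# All domains, performance numbers, and names are invented for illustration.

def grade(cares_about, performance):
    """Average performance over the domains the ontology cares about."""
    return sum(performance[d] for d in cares_about) / len(cares_about)

performance = {
    "narrow_monastic":   {"inner_life": 0.9, "trade": 0.2, "war": 0.1, "science": 0.2},
    "robust_generalist": {"inner_life": 0.6, "trade": 0.7, "war": 0.6, "science": 0.7},
}

# The narrow ontology scores well by only caring about one domain...
print(grade({"inner_life"}, performance["narrow_monastic"]))           # 0.9
# ...the robust one scores well by caring about everything and doing okay at it.
print(grade({"inner_life", "trade", "war", "science"},
            performance["robust_generalist"]))                          # 0.65
```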
There is some fact of the matter about what the actual laws of physics and game theory are, even if no one can compute them.
Meta-ontologies are still ontologies. I think ontologies that are flexible will (long-term) outcompete ontologies that are not.
There are multiple ways to be flexible, which include:
“I have lots of tools available, with some hard-to-pin-down heuristics for which tools to use”
“I want to understand the laws of the universe as deeply as possible, and since I have bounded compute, I want to cache those laws into heuristics that are as simple as possible while cleaving as accurately as possible to the true underlying law, with varying tools specifically to tell me when to zoom into the map.”
I expect that in the next 10-100 years, the first frame will outcompete the second in terms of “number of people using that frame to be reasonably successful.” But in the long run and deep future, I expect the second frame to outcompete the first. I *might* expect this whether or not we switch from human hardware to silicon uploads, but I definitely expect it once uploads exist.
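A sketch of what the second frame looks like in miniature (the “law,” the heuristic, and the validity bound are all stand-ins I picked for illustration): cache an expensive model into a cheap approximation, and keep a tool whose only job is telling you when you’ve left the cache’s domain and need to zoom back into the fuller model:

```python
import math

# Toy sketch of the second frame: cache a "true law" into a cheap heuristic,
# plus a rule for when to zoom back into the fuller model.

def true_law(x):
    # Stand-in for an expensive, accurate model.
    return math.sin(x)

def cached_heuristic(x):
    # Cheap cached approximation: sin(x) ~= x for small x.
    return x

def heuristic_is_valid(x, tolerance=0.01):
    # "Tool that tells me when to zoom into the map": the small-angle error
    # grows roughly like x**3 / 6, so only trust the cache while that is tiny.
    return abs(x) ** 3 / 6 < tolerance

def predict(x):
    return cached_heuristic(x) if heuristic_is_valid(x) else true_law(x)

for x in (0.05, 0.3, 1.5):
    print(x, predict(x), true_law(x))
```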