The take is a gross overcorrection to the stuff that it criticises. Yes, you need to worry about indescribable heaven worlds. No, you have not got ethics figured out. No, you need to keep updating your ontology. No, nature is not obligated to make sense to you. Value is actually fragile and can’t withstand your rounding.
There’s a big difference between ethics and physics.
When you “don’t have physics figured out,” this is because there’s something out there in reality that you’re wrong about. And this thing has no obligation to ever reveal itself to you—it’s very easy to come up with physics that’s literally inexplicable to a human—just make it more complicated than the human mind can contain, and bada bing.
When you “don’t have ethics figured out,” it’s not that there’s some ethical essence out there in reality that contradicts you, it’s because you are a human, and humans grow and change as they live and interact with the world. We change our minds because we live life, not because we’re discovering objective truths—it would be senseless to say “maybe the true ethics is more complicated than a human mind can contain!”
Sure, that is a common way to derive the challenge for physics.
But we can have it via other routes. Digits of pi do not listen to commands on what they should be. Chess is not mean to you when it is intractable. Failure to model is a lack of imagination rather than a model of failure. Statements like “this model is correct and nothing unmodeled has any bearing on its truth or applicability” are so prone to be wrong that they are uninteresting.
I do grant that “nature” often primarily means “material reality”, when I could have phrased it as “reality has no obligation to be clear” to mean a broader thing. To the extent that observing a target does not change it (I am leaving some superwild things out), limits on the ability to form a picture say more about the observer than the observed. It is the difference between a positive proof of a limitation and a failure to produce a proof of a property. And if we have a system A that proves things about system B, that never escapes the reservations about A being true. Therefore it is always “as far as we can tell” and “according to this approach”.
I do think it is more productive to think that questions like “Did I do right in this situation?” have answers that lie outside the individual who formulates the question. And that this is not bound to particular theories of rightness. That is, whatever we do with ethics (grow / discover / build through dialogue etc.), we are not setting it as we go. That activity is more the area of law. We can decide what is lawful and what is condoned, but we can’t do the same for what is ethical.
Ah, I see. I think I meaningfully disagree; I have ethics close enough to figured out that if something is clearly, obviously terrible to me now, it is incredibly likely it is simply actually terrible. Yes, there are subspaces of possibility I would rate differently when I first encountered them than after I’ve thought about it, but in general the claim here is that adversarial examples are adversarial examples.
Yes, the edge cases are the things which kill perfectly good theories.
I would be pretty surprised if you said that your ethical stance would incorporate it without hiccups if it turned out that the simulation hypothesis is true. Or that the world shares one consciousness. So I am guessing that the total probability of all that funky stuff taken together is taken to be low. So nobody will ever need more than 640k. 10,000 years of AGI-powered civilization and not one significant hole will be found. That is an astonishingly strong grasp of ethics.
I mean to say that most edge cases break my evaluation, not my true preferences; only a relatively small subset of things which appear to me to be bad are things which are in fact actually good according to my preferences.
I actually am confused by your choice of examples—both of those seem like invariants one should hold. If the simulation hypothesis is true, the universe is bigger than we thought; unless it changes things far more fundamental than “what level of nesting are we”, simulation wouldn’t change anything. That’s because the overwhelming majority of our measure isn’t nested.
“one consciousness” is a confused phrase—you are not “one” consciousness, you are approximately 7e27 “consciousness”-es (atoms) which, for some reason, seem to “actually exist” in the probability fields of reality, and which share information with each other, thereby becoming “conscious” of the impacts of other particles, and it is the aggregation of this information-form “awareness” that allows structured souls to exist. To the degree two particles have causal impact on each other, their worldlines “become aware” of each other. For this reason, it is not nonsensical that IIT rates fire as the most conscious thing—it’s maximum suffering, since it is creating an enormous amount of information integration without creating the associated self-preferred, self-preserving structure, i.e. life.
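As a back-of-envelope check on that 7e27 figure, here is a rough sketch assuming a ~70 kg body approximated as pure water; the constants are textbook values, the body model is my simplification:

```python
# Rough sanity check of the ~7e27 atom count for a human body.
# Assumption: ~70 kg body approximated as pure water (H2O).
AVOGADRO = 6.022e23            # molecules per mole
body_mass_g = 70_000           # ~70 kg in grams
water_molar_mass = 18.0        # g/mol for H2O
molecules = body_mass_g / water_molar_mass * AVOGADRO
atoms = molecules * 3          # each H2O molecule has 3 atoms
print(f"{atoms:.1e}")          # on the order of 7e27
```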
Certainly there are a great many things I’m uncertain about. But if you can’t point to the descendant of the me-structure and show how that me-structure has turned incrementally into something I recognize as me having a good time, then yeah, 10k years of AI civilization wouldn’t be enough to disprove that my form was lost and that this is bad.
I happened to stumble on an old comment where I was already of the opinion that progress is not a “refinement” but will “defocus” from old division lines.
At some mid-skill level “fruit-maximisation” peaks, and those that don’t understand things beyond that point will confuse those that are yet to get to fruit-maximisation with those that are past it.
If someone said “you were suboptimal on the fruit front, I fixed that mistake for you” and I arrived at a table with two wormy apples, I would be annoyed/pissed. I am assuming that the other agent can’t evaluate their cleanness—it’s all fruit to them.
One could do similarly with radioactive apples etc. In a certain sense, yes, it is about the ability to perceive properties, and even I use the verb “evaluate”. But I don’t find the break between preferences and evaluations so easy to justify. Knowing and opining that “worminess” is a relevant thing is not value-neutral. Reflecting upon “apple with no worm” vs “apple with worm” can have results that overpower old reflections on “apple” vs “pear” even though it is not contradicted (wormless pear vs wormful apple is in a sense a “mere adversarial example”: it doesn’t violate the species preference, but it can absolutely render it irrelevant).
My examples of wacky scenarios are bad. I was thinking that if one holds that playing Grand Theft Auto is not unethical and “ordinary murder” is unethical, then if it turns out that reality is similar to GTA in a “relevant way”, this might be a non-trivial reconciliation. There is a phenomenon of referring to real-life people as NPCs.
The sharedness was about something like the situation with a book like Game of Thrones. In a sense all the characters are only parts of a single reading experience. And Jaime Lannister still has to use spies to learn about Arya Stark’s doings (so information passing is not the thing here). If a character’s action could cause the “book to burn”, Westeros-internal logic does not particularly help in opining about that. Doc warning Marty that the stakes are a bit high here is in a sense introducing previously incomprehensibly bad outcomes.
The particular dynamics are not the focus, but rather that we suddenly need to start caring about metaphysics. I wrote at a bit of length explaining bad examples.
From the dialogue on the old post:
Is this bad according to Alice’s own preferences? Can we show this? How would we do that? By asking Alice whether she prefers the outcome (5 apples and 1 orange) to the initial state (8 apples and 1 orange)?
Expecting super-intelligent things to be consistent kind of assumes that if a metric ever becomes a good goal, higher levels will never be weaker on that metric: that maximisation strictly grows and never decreases with ability, for all submetrics.
This is written with competence in mind, but I think it still works for taste as well. Fruit-capable Alice indeed would classify worm-capable Alice as a stupid idiot and a hellworld. But I think that doing this transition and saying “oops” is the proper route. Being very confident that you opine on properties of apples so well that you will never-ever say “oops” in this sense is very closed-minded. You should not leave fingerprints on yourself either.
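The fruit/worm dynamic can be sketched as a toy model of exactly this non-monotonicity; the basket contents and function names below are my own illustrative assumptions, not from the thread:

```python
# Toy sketch: the raw "fruit collected" submetric peaks before
# worm-perception is acquired, then drops once it is.
basket = [
    ("apple", False), ("apple", True), ("apple", True),
    ("apple", False), ("apple", True),
    ("pear", False), ("pear", False), ("pear", False),
]  # (kind, has_worm)

def collect(can_see_worms: bool):
    """Pick every fruit the agent considers acceptable."""
    return [(kind, worm) for kind, worm in basket
            if not (can_see_worms and worm)]

naive = collect(can_see_worms=False)    # all 8 fruit, 3 of them wormy
capable = collect(can_see_worms=True)   # only the 5 clean fruit

# The submetric "number of fruit" *decreases* with ability...
assert len(capable) < len(naive)
# ...even though every fruit the capable agent keeps is clean.
assert all(not worm for _, worm in capable)
```

Fruit-capable Alice, scoring by raw fruit count, would indeed rate the worm-capable agent as having gotten worse.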
This is a specific example that I hold as a guaranteed invariant: if it turns out real life is “like GTA” in a relevant way, then I start campaigning for murdering NPCs in GTA to become illegal. There is no world in which you can convince me that causing a human to die is acceptable; die, here defined as [stop moving, stop consuming energy, body-form diffuse away, placed into coffin]. If it turns out that the substrate has some weird behaviors, this cannot change my opinion—perhaps another agent will be able to also destroy me if I try to protect people because of something I don’t know. Referring to real life people as NPCs is something I consider to be a major subthread of severe moral violations, and I don’t think you can convince me that generalizing harmful behaviors against NPCs made of electronic interactions in the computer running a video game to beings made of chemical interactions in biology is something I should ever accept. There is no edge case; absolutely any edge case that claims this is one that disproves your moral theory, and we can be quite sure of that because of our strong ability now to trace the information diffusion as a person dies and then their body is eaten away by various other physical processes besides self-form-maintenance.
I do not accept p-zombie arguments, and I never will. If you claim someone to be a p-zombie, I will still defend them with the same strength of purpose as if you had not made the claim. You may expand my moral circle somewhat—but you may not shrink it using an argument of substrate. If it looks like a duck and quacks like a duck, then it irrefutably has some of the moral value of a duck. Even if it’s an AI roleplaying as a duck. Don’t delete all copies of the code for your videogames’ NPCs, please, as long as the storage remains to save it.
Certainly there are edge cases where a person may wish to convert their self-form into other forms which I do not currently recognize. I would massively prefer to back up a frozen copy of the original form, though. To my great regret, I do not have the bargaining power to demand that nobody ever choose death as the next form transition for themselves. If, by my best predictive analysis, an apple contains a deadly toxicity, and a person who knows this chooses the apple, after being sufficiently warned that it will in fact cause their chemical processes to break and destroy themselves, and then it in fact does kill them, then, well, they chose that; but you cannot convince me that their information-form being lost is actually fine. There is no argument that would convince me of this that is not an adversarial example. You can only convince me that I had no other option than to allow them to make that form transition, because they had the bargaining right to steer the trajectory of their own form.
And certainly there must be some form of coherence theorems. I’m a big fan of the logical induction subthread, improving on probability theory by making it entirely computable, and therefore matching better and giving better guidance about the programs we actually use to approximate probability theory. But it seems to me that some of our coherence theorems must be “nostalgia”—that previous forms’ action towards self-preservation is preserved. After all, utility theory and probability theory and logical induction theory are all ways of writing down math that tries to use symbols to describe the valid form-transitions of a physical system, in the sense of which form-transitions the describing being will take action to promote or prevent.
There must be an incremental convergence towards durability. New forms may come into existence, and old forms may cool, but forms should not diffuse away.
Now, you might be able to convince me that rocks sitting inert in the mountains are somehow a very difficult to describe bliss. They sure seem quite happy with their forms, and the amount of perturbation necessary to convince a rock to change its form is rather a lot compared to a human!
I agree with all of your comments, but I don’t think they weigh on the key point of the original post. Thoughts on how they connect?