There’s a court at my university accommodation that people who aren’t Fellows of the college aren’t allowed on; it’s a medium-sized square of mown grass. One of my friends said she was “morally opposed” to this, on biodiversity grounds: if the space wasn’t being used for people, it should be used for nature.
And I couldn’t help but think how tiring it would be to have a moral-feeling-detector this strong. How could one possibly cope with hearing about burglaries, or North Korea, or astronomical waste?
I’ve been aware of scope insensitivity for a long time now, but this really put things in perspective for me in a visceral way.
For many who talk about “moral opposition”, talk is cheap, and the cause of such a statement may be in-group or virtue signaling rather than an indicator of the intensity of their moral-feeling-detector.
You haven’t really said that she’s putting all that much energy into this (though it’s implied, I guess), but I’d see nothing wrong with having a moral stance about literally everything while still prioritizing your activity in healthy ways: judging this, maybe even arguing vociferously about it, for about ten minutes, before getting back to work and never thinking about it again.
To me it seems more likely that this person is misreporting their motive than that they really oppose this allocation of a patch of grass on biodiversity grounds. I would expect grounds like “I want to use it myself”, or the slightly more general “it should be available to a wider group”, to be very much more common. If I had to rank the likelihood of motives after hearing that someone objects, but before hearing their reasons, I’d end up with more weight on “playing social games” than on “earnestly believes this”.
On the other hand, it would not surprise me very much if at least one person somewhere truly held this position. It’s just that my weight for any particular person would be very low.
Seems like if you’re working with neural networks, there’s no simple map from an efficient (in terms of program size, working memory, and speed) optimizer which maximizes X to an equally efficient optimizer which maximizes −X.
If we consider that an efficient optimizer does something like tree search, it would be easy to flip the sign of the node-evaluating “prune” module. But the “babble” module is likely to select promising actions based on a big bag of heuristics which aren’t easily flipped. Moreover, flipping a heuristic which upweights a small subset of outputs that lead to X doesn’t yield a heuristic which upweights a small subset of outputs that lead to −X.
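A toy sketch of this asymmetry (purely illustrative: the “babble” heuristic here is hand-written rather than learned, and the 1-D state space and peak location are invented for the example):

```python
def x_value(state):
    # The quantity the original optimizer maximizes: a peak at state 7.
    return -(state - 7) ** 2

def babble(state):
    """Stand-in for a learned proposal module: it only suggests moves it
    'believes' are promising for maximizing X (step toward the peak, or
    stay put). The goal is baked into the heuristics, so there is no
    single sign to flip here."""
    step = 1 if state < 7 else -1
    return [step, 0]

def tree_search(state, evaluator, depth=5):
    """Depth-limited search: expand babbled actions, score leaves with
    the (easily sign-flippable) evaluator. Returns (score, leaf)."""
    if depth == 0:
        return evaluator(state), state
    return max(tree_search(state + a, evaluator, depth - 1)
               for a in set(babble(state)))

# Maximizing X works: the search climbs toward the peak at 7.
_, leaf = tree_search(0, x_value)
print("maximize X  -> leaf", leaf)   # leaf 5 (5 steps toward 7)

# Flipping only the evaluator does NOT give a -X maximizer: babble still
# steers every rollout toward high X, so the best the flipped search can
# do is the least-X state that babble happens to reach (state 0), nowhere
# near a true -X maximizer (which would run off in the other direction).
_, leaf_flipped = tree_search(0, lambda s: -x_value(s))
print("maximize -X -> leaf", leaf_flipped)   # leaf 0
```

The flipped search only ranks the states the unflipped proposal heuristics make reachable, which is the sense in which the goal lives in the babble module as much as in the evaluator.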
Generalizing, this means that if you have access to maximizers for X, Y, Z, you can easily construct a maximizer for e.g. 0.3X+0.6Y+0.1Z but it would be non-trivial to construct a maximizer for 0.2X-0.5Y-0.3Z. This might mean that a certain class of mesa-optimizers (those which arise spontaneously as a result of training an AI to predict the behaviour of other optimizers) are likely to lie within a fairly narrow range of utility functions.
True if you don’t count the training process as part of the optimizer (which is a choice that sometimes makes sense and sometimes doesn’t). If you count the training process as part of the optimizer, then you can of course just flip your loss function or RL signal most of the time.
How do you construct a maximizer for 0.3X+0.6Y+0.1Z from three maximizers for X, Y, and Z? This certainly isn’t possible in general for black-box optimizers, so presumably it’s something specific to a certain class of neural networks.
My model: suppose we have a DeepDreamer-style architecture, where (given a history of sensory inputs) the babbler module produces a distribution over actions, a world model predicts subsequent sensory inputs, and an evaluator predicts expected future X. If we run a tree search over some weighted combination of the X, Y, and Z maximizers’ predicted actions, then score the results with the same weighted combination of the X, Y, and Z maximizers’ evaluators, we’d get a reasonable approximation of a weighted maximizer.
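A minimal sketch of this construction, with hand-built stand-ins rather than real learned modules (the targets 2, 8, and 5, the 1-D state space, and the search depth are all invented for illustration):

```python
def make_maximizer(target):
    """Stand-in for a trained maximizer of -(state - target)^2: a babble
    module that proposes steps toward its own target, plus an evaluator
    that scores states."""
    def babble(state):
        return [0] if state == target else [1 if state < target else -1, 0]
    def evaluate(state):
        return -(state - target) ** 2
    return babble, evaluate

maximizers = [make_maximizer(t) for t in (2, 8, 5)]   # X, Y, Z
weights = [0.3, 0.6, 0.1]

def combined_babble(state):
    # Union of every component's proposed actions: each maximizer
    # contributes the moves it considers promising.
    actions = set()
    for babble, _ in maximizers:
        actions.update(babble(state))
    return actions

def combined_evaluate(state):
    # Positively weighted sum of the component evaluators.
    return sum(w * ev(state) for w, (_, ev) in zip(weights, maximizers))

def search(state, depth=8):
    """Tree search over the pooled proposals, scored by the weighted
    evaluator. Returns (score, leaf)."""
    if depth == 0:
        return combined_evaluate(state), state
    return max(search(state + a, depth - 1) for a in combined_babble(state))

_, leaf = search(0)
print("combined maximizer settles at", leaf)   # state 6, near the
# continuous optimum of 0.3X + 0.6Y + 0.1Z (about 5.9)
```

With positive weights the pooled proposals still cover the relevant region of action space, so the weighted evaluator can steer among them; with negative weights, as below, no component’s babble is proposing the actions the flipped objective actually wants.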
This wouldn’t be true if we gave negative weights to the maximizers, because while the evaluator module would still make sense, the action distributions we’d get would probably be incoherent, e.g. the model just running into walls or jumping off cliffs.
My conjecture is that, if a large black-box model is doing something like modelling X, Y, and Z maximizers acting in the world, that model might be close in model-space to itself being a maximizer which maximizes 0.3X + 0.6Y + 0.1Z, but far in model-space from being a maximizer which maximizes 0.3X − 0.6Y − 0.1Z, due to the above problem.
Seems like there’s a potential solution to ELK-like problems: force the information to move from the AI’s ontology to (its model of) a human’s ontology, and then force it to move back again.
This gets around “basic” deception since we can always compare the AI’s ontology before and after the translation.
The questions are: how do we force the knowledge to go through the (modeled) human’s ontology, and how do we know the forward and backward translators aren’t behaving badly in some way?
Getting rid of guilt and shame as motivators of people is definitely admirable, but still leaves a moral/social question. Goodness or Badness of a person isn’t just an internal concept for people to judge themselves by, it’s also a handle for social reward or punishment to be doled out.
I wouldn’t want to be friends with Saddam Hussein, or even a deadbeat parent who neglects the things they “should” do for their family. This also seems to be true regardless of whether my social punishment or reward has the ability to change these people’s behaviour. But what about being friends with someone who has a billion dollars but refuses to give any of that to charity? What if they only have a million dollars? What if they have a reasonably comfortable life but not much spare income?
Clearly the current levels of social reward/punishment are off (billionaire philanthropy etc.) so there seems an obvious direction to push social norms in if possible. But this leaves the question of where the norms should end up.
I think there’s a bit of a jump from ‘social norm’ to ‘how our government deals with murders’. Referring to the latter as ‘social’ doesn’t make a lot of sense.
I think I’ve explained myself poorly: I meant to use the phrase social reward/punishment to refer exclusively to things like forming friendships and granting status, which are doled out separately from “physical” government punishment. Saddam Hussein was probably a bad example, as he is someone who would clearly also receive the latter.
The UK has just switched their available rapid Covid tests from a moderately unpleasant one to an almost unbearable one. Lots of places require them for entry. I think the cost/benefit makes sense even with the new kind, but I’m becoming concerned we’ll eventually reach the “imagine a society where everyone hits themselves on the head every day with a baseball bat” situation if cases approach zero.
Just realized I’m probably feeling much worse than I ought to on days when I fast because I’ve not been taking sodium. I really should have checked this sooner. If you’re planning to do long (I do a day, which definitely feels long) fasts, take sodium!