Richard_Ngo comments on A simple case for extreme inner misalignment

Richard_Ngo Jul 13, 2024, 10:49 PM
3 points
−4
I think the claim is along the lines of “highly compressed representations imply simple goals”, but the connection between compressed representations and simple goals has not been argued, unless I missed it. There’s also a chance that I simply misunderstand your post entirely.
Hmm, maybe I should spell it out more explicitly. But basically, by “simple goals” I mean “goals which are simple to represent”, i.e. goals which have highly compressed representations; and if all representations are becoming simpler, then the goal representations (as a subset of all representations) are also becoming simpler. (Note that I’ll elaborate further on the relationship between goal representations and other representations in my next post.)
Actually, deep CNNs are an example of what you describe in argument 1: The features in later layers of CNNs are highly compressed, and may tell you binary information such as “is there a dog”, but they apply to large parts of the input image.
This is largely my fault since I haven’t really defined “representation” very clearly, but I would say that the representation of the concept of a dog should be considered to include e.g. the neurons representing “fur”, “mouth”, “nose”, “barks”, etc. Otherwise if we just count “dog” as being encoded in a single neuron, then every concept encoded in any neuron is equally simple, which doesn’t seem like a useful definition.
(To put it another way: the representation is the information you need to actually do stuff with the concept.)
c. I think this leaves the confusion why philosophers end up favoring the analog of squiggles when they become hedonic utilitarians. I’d argue that the premise may be false since it’s unclear to me how what philosophers say they care about (“hedonium”) connects with what they actually care about (e.g., maybe they still listen to complex music, build a family, build status through philosophical argumentation, etc.)
I agree that most people who say they are hedonic utilitarians are not 100% committed to hedonic utilitarianism. But I still think it’s very striking that they at least somewhat care about making hedonium. I claim this provides an intuition pump for how AIs might care about squiggles too.
- Leon Lang Jul 14, 2024, 10:53 AM
  3 points
  0
  Thanks for the answer!
  But basically, by “simple goals” I mean “goals which are simple to represent”, i.e. goals which have highly compressed representations
  It seems to me you are using “compressed” with two very different meanings in parts 1 and 2. Or, to be fairer, I interpret the meanings very differently.
  Let me try to make my view of things more concrete:
  Compressed representations: A representation is a function $f : O \to R$ from observations of the world state $O$ (or sequences of such observations) into a representation space $R$ of “features”. That this is “compressed” means (a) that in $R$, only a small number of features are active at any given time and (b) that this small number of features is still sufficient to predict/act in the world.
  Goals building on compressed representations: A goal is a (maybe linear) function $U : R \to \mathbb{R}$ from the representation space into the real numbers. The goal “likes” some features and “dislikes” others. (Or if it is not entirely linear, then it may like/dislike some simple combinations/compositions of features.)
  It seems to me that in part 2 of your post, you view goals as compositions $U \circ f : O \to \mathbb{R}$. Part 1 says that $f$ is highly compressed. But it’s totally unclear to me why the composition $U \circ f$ should then have the simplicity properties you claim in part 2, which in my mind don’t connect with the compression properties of $f$ as I just defined them.
  A few more thoughts:
  - The notion of “simplicity” in part 2 seems to be about how easy it is to represent a function, i.e., the space of parameters with which the function $U \circ f$ is represented is simple in your part 2.
  - The notion of “compression” in part 1 seems to be about how easy it is to represent an input, i.e., is there a small number of features such that their activation tells you the important things about the input?
  - These notions of simplicity and compression are very different. Indeed, if you have a highly compressed representation $f$ as in part 1, I’d guess that $f$ necessarily lives in a highly complex space of possible functions with many parameters, thus the opposite of what seems to be going on in part 2.
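  To make the contrast above concrete, here is a minimal toy sketch in plain Python. Everything in it is an illustrative assumption rather than anything from the post: the random linear encoder, the top-k sparsification rule, and the names `make_encoder`, `make_goal`, and `likes` are all hypothetical. It only models property (a) of the definition (few active features per input), not property (b) (usefulness for prediction).

```python
import random

def make_encoder(n_obs, n_features, k, seed=0):
    """Toy representation f: O -> R that keeps only the k strongest features.

    Returns the function f and the number of parameters used to define it.
    """
    rng = random.Random(seed)
    # One weight per (feature, observation dimension) pair.
    weights = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_features)]

    def f(observation):
        pre = [sum(w * x for w, x in zip(row, observation)) for row in weights]
        # Keep the k largest-magnitude activations and zero out the rest.
        ranked = sorted(range(n_features), key=lambda i: abs(pre[i]), reverse=True)
        keep = set(ranked[:k])
        return [pre[i] if i in keep else 0.0 for i in range(n_features)]

    return f, n_obs * n_features

def make_goal(likes):
    """Toy linear goal U: R -> real numbers.

    `likes` maps a feature index to a weight (positive = liked, negative = disliked).
    Returns the function U and the number of parameters used to define it.
    """
    def U(features):
        return sum(w * features[i] for i, w in likes.items())

    return U, len(likes)

N_OBS, N_FEATURES, K = 64, 256, 4
f, f_params = make_encoder(N_OBS, N_FEATURES, K)
U, U_params = make_goal({3: 1.0, 17: -2.0})

rng = random.Random(1)
observation = [rng.gauss(0, 1) for _ in range(N_OBS)]
features = f(observation)

active = sum(1 for v in features if v != 0.0)  # how many features fire on this input
utility = U(features)                          # value of the composition U(f(o))

print("active features:", active, "of", N_FEATURES)
print("parameters in f:", f_params, "| parameters in U:", U_params)
```

  In this sketch the output of `f` is sparse for every input, yet `f` itself takes thousands of parameters to specify, while `U` takes two. Sparsity of the output and simplicity of the function are separate quantities here, which is the distinction the bullets above are pointing at.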
  This is largely my fault since I haven’t really defined “representation” very clearly, but I would say that the representation of the concept of a dog should be considered to include e.g. the neurons representing “fur”, “mouth”, “nose”, “barks”, etc. Otherwise if we just count “dog” as being encoded in a single neuron, then every concept encoded in any neuron is equally simple, which doesn’t seem like a useful definition.
  (To put it another way: the representation is the information you need to actually do stuff with the concept.)
  I’m confused. Most of the time, when seeing a dog, most of what I need is actually just to know that it is a “dog”, so this is totally sufficient to do something with the concept. E.g., if I walk on the street and wonder “will this thing bark?”, then knowing “my dog neuron activates” is almost enough.
  I’m confused for a second reason: It seems like here you want to claim that the “dog” representation is NOT simple (since it contains “fur”, “mouth”, etc.). However, the “dog” representation needs lots of intelligence and should thus come along with compression, and if you equate compression and simplicity, then it seems to me like you’re not consistent. (I feel a bit awkward saying “you’re not consistent”, but I think it’s probably good if I state my honest state of mind at this moment).
  To clarify my own position, in line with my definition of compression further above: I think that whether a representation is simple/compressed is NOT a property of a single input-output relation (like “pixels of a dog get mapped to the dog-neuron being activated”), but instead a property of the whole FUNCTION that maps inputs to representations. This function is compressed if for any given input, only a small number of neurons in the last layer activate, and if these can be used (ideally in a linear way) for further predictions and for evaluating goal-achievement.
  I agree that most people who say they are hedonic utilitarians are not 100% committed to hedonic utilitarianism. But I still think it’s very striking that they at least somewhat care about making hedonium. I claim this provides an intuition pump for how AIs might care about squiggles too.
  Okay, I agree with this, fwiw. :) (Though I may not necessarily agree with claims about how this connects to the rest of the post)
- Seth Herd Jul 20, 2024, 9:24 PM
  2 points
  0
  Making representations simpler even when that makes them worse at their job is not more intelligent. Yes, on the mammalian scale smarter minds compress more in many ways. That doesn’t mean even smarter minds will keep compressing more when it makes them worse at achieving their goals, and it is not necessary, since they have adequate storage to keep the less-compressed and therefore more accurate and useful representations.
  
  This is a noble project, but I’m afraid the premise is simply false.