I second Rob’s unanswered question at 40:12: how is it that we ever accomplish anything in practice, if the search space is vast and things that both work and look like they work are exponentially rare?
How is the “the genome is small, therefore generators of human values (that can’t be learned from the environment) are no more complex than tens or hundreds of things on the order of a fuzzy face detector” argument compatible with the complexity of value thesis, or does it contradict it?
how is it that we ever accomplish anything in practice, if the search space is vast and things that both work and look like they work are exponentially rare?
This question needs a whole essay (or several) on its own. If I don’t get around to leaving a longer answer in the next few days, ping me.
Meanwhile, if you want to think it through for yourself, the general question is: where the hell do humans get all their bits-of-search from?
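To make “bits-of-search” concrete, here’s a toy calculation (the numbers are made up for illustration, not part of the original discussion): locating one target in a space of N candidates takes log2(N) bits, and a clean yes/no experiment supplies at most one bit.

```python
import math

# Assumed toy numbers: a design space of 2**100 candidate solutions,
# of which only one "both works and looks like it works".
space_size = 2 ** 100

# Bits needed to single out one candidate from the space.
bits_needed = math.log2(space_size)  # 100.0

# If each experiment is a clean yes/no test that rules out half of
# the remaining candidates, k experiments supply at most k bits.
experiments = 60
remaining = space_size / 2 ** experiments

print(f"bits needed: {bits_needed:.0f}")
print(f"candidates left after {experiments} binary experiments: {remaining:.3g}")
```

So the question becomes: what process hands humans those ~hundred bits per hard problem?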
How is the “the genome is small, therefore generators of human values (that can’t be learned from the environment) are no more complex than tens or hundreds of things on the order of a fuzzy face detector” argument compatible with the complexity of value thesis, or does it contradict it?
The key distinction is between “human values” and “generators of human values”. The complexity of value thesis (as articulated on that Arbital page) says that human values are not algorithmically simple, and I do agree with that. But that still allows for simple generators of human values, which (conceptually) take in lots of data from the real world and spit out values. Everything except those generators is learned from the environment.
In principle, if we can figure out those relatively-simple generators, then we can feed an AI data similar to the data from which humans’ value-generators generate their values, and the AI should be able to reconstruct human values (up to within ordinary between-humans-within-similar-environments variation).
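As a toy illustration of the simple-generator-complex-values shape (nothing like the real generators, just the shape of the claim): the generator below is a few fixed lines of code, yet the value table it outputs inherits all the complexity of the experience data fed into it.

```python
from collections import defaultdict

def value_generator(experience):
    """Toy 'generator of values': a simple, fixed rule that assigns
    value to stimuli in proportion to how often they co-occurred with
    reward. The rule itself is tiny; the resulting values inherit all
    the complexity of the input data."""
    values = defaultdict(float)
    for stimulus, reward in experience:
        values[stimulus] += reward
    return dict(values)

# Two "people" with the same generator but different environments
# end up with different (and arbitrarily complex) learned values.
alice = value_generator([("music", 1.0), ("music", 0.5), ("cold", -1.0)])
bob = value_generator([("solitude", 0.8), ("music", -0.2)])
print(alice)  # {'music': 1.5, 'cold': -1.0}
print(bob)    # {'solitude': 0.8, 'music': -0.2}
```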
Meanwhile, if you want to think it through for yourself, the general question is: where the hell do humans get all their bits-of-search from?
Cultural accumulation and Google, but that’s mimicking someone who’s already figured it out. What about the person who first figured out, e.g., crop growing? It could be the scientific method, but also just random luck which then caught on.
Additionally, sometimes it’s just applying the same hammers to different nails or finding new nails, which means that there are general patterns (hammers) that can be applied to many different situations. There are bits of information both in the patterns themselves and in when to apply them, though I feel confused trying to connect these ideas here.
People also have inner simulations built from things they have lots of experience with (i.e., you can imagine what it’d look like to drop a bowling ball off a building even if you’ve never seen it), which is a way of applying familiar patterns to new situations.
Did you get around to writing a longer answer to the question, “How do humans do anything in practice if the search space is vast?” I’d be curious to see your thoughts.
My answer to this question is that:

(a) Most day-to-day problems can be solved from far away using a low-dimensional space containing natural abstractions. For example, a manager at a company can give their team verbal instructions without describing the detailed sequence of muscle movements needed.

(b) For unsolved problems in science, we get many tries at the problem. So, we can use the scientific method to design many experiments which give us enough bits to locate the solution. For example, a drug discovery team can try thousands of compounds in their search for a new drug. The drug discovery team gets to test each compound on the condition they’re trying to treat, so they can get many bits about which compounds could be effective.
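As a rough sanity check on (b), with made-up numbers: brute-force screening gets there eventually because we have many tries, while well-designed experiments that each split the candidate set in half get there exponentially faster.

```python
import math

# Assumed toy setup: 4096 candidate compounds, exactly one effective.
candidates = 4096

# A well-designed experiment that rules out half the candidate set
# yields a full bit, so locating the winner needs only log2(N) such
# experiments in the best case.
best_case_experiments = math.log2(candidates)  # 12.0

# Brute-force screening (one candidate per test) needs up to N tests,
# but still succeeds because we get many tries at the problem,
# which is exactly the point of (b).
worst_case_tests = candidates

print(best_case_experiments, worst_case_tests)  # 12.0 4096
```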
Very interesting! Zack’s two questions were also the top two questions that came to mind for me. I’m not sure if you got around to writing this up in more detail, John, but I’ll jot down the way I tentatively view this differently. Of course I’ve given this vastly less thought than you have, so many grains of salt.
On “If this is so hard, how do humans and other agents arguably do it so easily all the time?”, how meaningful is the notion of extra parameters if most agents are able to find uses for any parameters, even just through redundancy or error-correction (e.g., a redundant base pair can absorb a mutation harmlessly or later be exapted)? In alignment, why assume that all aligned AIs “look like they work”? Why assume that these are binaries? Etc. In general, there seem to be many realistic additions to your model that mitigate this exponential-increase-in-possibilities challenge and more closely fit successful real-world agents. I don’t see as many such additions that would make the optimization even more challenging.
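To illustrate the redundancy point with a toy Monte Carlo (my own construction, not from the thread): if a system only needs a majority of redundant components to function, the fraction of random parameter settings that “work” can be enormously larger than if every parameter must be exactly right.

```python
import random

random.seed(0)

def works_all(bits):
    # Fragile spec: every component must be set correctly.
    return all(bits)

def works_majority(bits):
    # Redundant spec: a majority vote over components suffices.
    return sum(bits) > len(bits) / 2

n, trials = 15, 100_000
samples = [[random.randint(0, 1) for _ in range(n)] for _ in range(trials)]

frac_fragile = sum(works_all(s) for s in samples) / trials
frac_redundant = sum(works_majority(s) for s in samples) / trials

print(f"fragile: {frac_fragile:.5f}")      # ~2**-15, i.e. about 0.00003
print(f"redundant: {frac_redundant:.2f}")  # ~0.50
```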
On generators, why should we carve such a clear and small circle around genes as the generators? Rob mentioned the common thought experiment of alien worlds in which genes produce babies who grow up in isolation from human civilization, and I would push on that further. Even on Earth, we have Stone Age values versus modern values, and if you draw the line more widely (either by calling more things generators or including non-generators), this notion of “generators of human values” starts to seem very narrow and much less meaningful for alignment or a general understanding of agency, which I think most people would say requires learning more values than what is in our genes. I don’t think “feed an AI data” gets around this: AIs already have easy access to genes and to humans of all ages. There is an advantage to telling the AI “these are the genes that matter,” but could it really just take those genes or their mapping onto some value space and raise virtual value-children in a useful way? How would it know it isn’t leaving out the important differentiators between Stone Age and modern values, genetic or otherwise? How would it adjudicate between all the variation in values from all of these sources? How could we map them onto trade-offs suitable for coherence conditions? Etc.