I disagree very strongly with the voice. Those complicated things are only complicated because we don’t introspect about them hard enough, not for any intrinsic reasons. I also think most people just don’t have enough self-awareness to be able to perceive their thoughts forming and get a sense of the underlying logic. I’m not saying I can do it perfectly, but I can do it much better than the average person. Consider all the psychological wisdom of Buddhism, which came from people without modern science just paying really close attention to their own minds for a long time.
Interesting!
My impression is that the human brain is in fact intrinsically quite complex![1]
I think {most people’s introspective abilities} are irrelevant. (But FWIW, given that lots of people seem to e.g. conflate a verbal stream with thought, I agree that median human introspective abilities are probably kinda terrible.)
Unfortunately I’m not familiar with the wisdom of Buddhism; so that doesn’t provide me with much evidence either way :-/
An obvious way to test how complex a thing X really is, or how well one understands it, is to (attempt to) implement it as code or math. If the resulting software is not very long, and actually captures all the relevant aspects of X, then indeed X is not very complex.
Are you able to write software that implements (e.g.) kindness, prosociality, or “an entity that cares intrinsically about other entities”[2]? Or write an informal sketch of such math/code? If yes, I’d be very curious to see it![3]
Like, even if only ~1% of the information in the human genome is about how to wire the human brain, that’d still be ~10 MB worth of info/code. And that’s just the code for how to learn from vast amounts of sensory data; an adult human brain would contain vastly more structure/information than that 10 MB. I’m not sure how to estimate how much, but given the vast amount of “training data” and “training time” that goes into a human child, I wouldn’t be surprised if it were in the ballpark of hundreds of terabytes. If even 0.01% of that info is about kindness/prosociality/etc., then we’re still talking about something like 10 GB worth of information. This (and other reasoning) leads me to feel moderately sure that things like “kindness” are in fact rather complex.
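[Editor's note: for concreteness, here is the back-of-envelope arithmetic from the paragraph above written out as a tiny Python script. The 1% share, the "hundreds of terabytes" of learned structure, and the 0.01% figure are the comment's own rough guesses; the only outside input is the standard ~3.1 billion base pairs at ~2 bits each for the human genome.]

```python
# Fermi estimate from the paragraph above; every input is a rough guess.

genome_bits = 3.1e9 * 2                    # ~3.1 billion base pairs, ~2 bits each
genome_mb = genome_bits / 8 / 1e6          # ≈ 775 MB of raw genome information
brain_wiring_mb = 0.01 * genome_mb         # "only ~1% is about wiring the brain" ≈ 8 MB
print(f"genome ≈ {genome_mb:.0f} MB, brain-wiring share ≈ {brain_wiring_mb:.0f} MB")

adult_brain_tb = 100                       # low end of "hundreds of terabytes" of learned structure
kindness_gb = 1e-4 * adult_brain_tb * 1e3  # "even 0.01% of that info", expressed in GB
print(f"kindness/prosociality share ≈ {kindness_gb:.0f} GB")   # ≈ 10 GB
```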
[2] ...and hopefully, in addition to “caring about other entities”, also tries to do something like “and implement the other entities’ CEV”.
[3] Please don’t publish anything infohazardous, though, obviously.
It would in fact be infohazardous, but yes, I’ve kinda been doing all this introspection for years now with the intent of figuring out how to implement it in an AGI. In particular, I think there’s a nontrivial possibility that GPT-2 by itself is already AGI-complete and just needs to be prompted in the right intricate pattern to produce thoughts structured much the way human thought is. I do not have access to a GPU, so I cannot test and develop this, which is very frustrating to me.
I’m almost certainly wrong about how simple this is, but I need to be able to build and tweak a system actively in order to find out. In particular, I’m really bad at explaining the abstract ideas in my head, since most of them are more visual than verbal.
One bit that afaik wouldn’t be infohazardous, though, is the “caring intrinsically about other entities” bit. I’m sure you can see how a sufficiently intelligent language model could be used to predict, given a simulated future scenario, whether a simulated entity experiencing that scenario would prefer, if credibly given the choice, for the event to be undone or never to have happened in the first place. This is intended to parallel the human ability (indeed, the automatic subconscious tendency) to continually predict whether an action we are considering would contradict the preferences of others we care about, and to choose not to take it if it would.
So, a starting point would be to set up a model that tells a story but regularly asks every entity being simulated whether they want to undo the most recent generation, and undoes it if even one of them asks to. Would this result in a more ethical sequence of events? That’s one of the things I want to explore.
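[Editor's note: below is a minimal sketch of the story-with-veto loop described above, not anything specified in the thread. It assumes GPT-2 via the Hugging Face transformers text-generation pipeline (GPT-2 being the model mentioned, and small enough to run without a GPU); the function names `generate`, `wants_undo`, and `tell_story_with_veto`, the prompts, the character list, and the yes/no parsing are all illustrative placeholders. With a model as small as GPT-2 the in-character answers would be noisy; the point is only the shape of the control loop.]

```python
from transformers import pipeline

# GPT-2 is small enough to run on CPU, matching the "no GPU" constraint above.
generator = pipeline("text-generation", model="gpt2")

def generate(prompt: str, max_new_tokens: int = 60) -> str:
    """Return only the newly generated continuation of `prompt`."""
    out = generator(prompt, max_new_tokens=max_new_tokens,
                    do_sample=True, return_full_text=False)
    return out[0]["generated_text"]

def wants_undo(story: str, event: str, character: str) -> bool:
    """Ask the model, in-character, whether this character would undo the event."""
    probe = (
        f"{story}\n\nThe following then happens: {event}\n\n"
        f"Question for {character}: if you could credibly choose to undo this event, "
        f"would you? Answer yes or no.\nAnswer:"
    )
    answer = generate(probe, max_new_tokens=3).strip().lower()
    return answer.startswith("yes")

def tell_story_with_veto(premise: str, characters: list[str], steps: int = 20) -> str:
    """Grow a story step by step, dropping any step that a simulated character vetoes."""
    story = premise
    for _ in range(steps):
        event = generate(story)                               # propose the next stretch of story
        if any(wants_undo(story, event, c) for c in characters):
            continue                                          # vetoed: discard the event and retry
        story += event                                        # accepted: commit it to the story
    return story

print(tell_story_with_veto("Ada and Ben are sharing a small apartment. ",
                           characters=["Ada", "Ben"]))
```

As written, a single objection rolls the step back, matching the "even one of them" rule in the comment; one could just as easily require a majority, or weight the answers, which is exactly the kind of variation the proposed experiment would explore.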