I think it’s possible that human values depend on life history too, but that seems to add complexity and make alignment harder. If the effects of life history strongly dominate those of evolutionary history, then maybe neglecting evolutionary history would be more acceptable, making the problem easier.
But I don’t think default AGI would be especially path-dependent on humanity’s collective life history. Human society changes over time as new generations supersede old cultures (see the section on subversion). AGI would be a much bigger shift than these normal societal shifts and so would drift from human culture more rapidly, partly due to a different conceptual ontology and so on. Legacy human concepts would be a pretty inefficient system for AGIs to keep using, much as scientists are no longer alchemists, but a bigger shift than that.
(Note: LLMs still rely heavily on human concepts rather than having an independent ontology and agency, so this is more about future AI systems.)
If people now don’t have strong views about exactly what they want the world to look like in 1000 years, but people in 1000 years do have strong views, then I think we should defer to future people to evaluate the “human utility” of future states. You seem to be suggesting that we should take the views of people today, although I might be misunderstanding.
Edit: or maybe you’re saying that the AGI trajectory will be ~random from the point of view of the human trajectory due to a different ontology. Maybe, but different ontology → different conclusions is less obvious to me than different data → different conclusions. If there’s almost no mutual information between the different data, then the conclusions have to differ; but sometimes you could reach the same conclusions under different ontologies with data from the same process.
To the extent that people now don’t care about the long-term future, there isn’t much to do in terms of long-term alignment. People right now who care about what happens 2000 years from now probably have roughly similar preferences to people 1000 years from now who aren’t significantly biologically changed or cognitively enhanced, because some component of what people care about is biological.
I’m not saying it would be random so much as not very dependent on the original history of the humans used to train early AGI iterations. It would have a different data history, partly because of different measurements, e.g. different scientific measuring tools. A different ontology means that value-laden things people might care about, like “having good relationships with other humans,” are not meaningful to future AIs in terms of their world model; they wouldn’t care much about those things by default (they aren’t even modeling the world in those terms), and it would be hard to encode a utility function such that they care about them despite the ontological difference.