Again, we’re assuming for the sake of argument that there’s an AI which completely understands an adult human’s current preferences (which are somewhat inconsistent etc.), and how those preferences would change under different circumstances. We need a specification for what this AI should do right now.
If you’re arguing that there is such a specification which is not messy, can you write down exactly what that specification is? If you already said it, I missed it. Can you put it in italics or something? :)
(Your comment said that the AI “should” or “would” do this or that a bunch of times, but I’m not sure if you’re listing various different consequences of a single simple specification that you have in mind, or if you’re listing different desiderata that must be met by a yet-to-be-determined specification.)
(Again, in my book, Stuart Armstrong research agenda v0.9 counts as rather messy.)
I think out loud a lot. Assume that nearly everything I say in conversations like this consists of desiderata I’m listing off the top of my head, with no prior planning. I’m really not good at the kind of rigorous think-before-you-speak that is normative on LessWrong.
A really bad starting point for a specification, which almost certainly has tons of holes in it: have the AI predict what I would do over a given length of time in the future if it did not exist, and from there make small modifications to construct a variety of different timelines of similar things I might have done instead.
For each such timeline, predict how much I-now and I-after would approve of that sequence of actions, and maximize the minimum of those two. Stop after a certain number of timelines have been considered and tell me the results. Then update its predictions of me-now based on how I respond, and, if I ask it to, run the simulation again with this new data and a new set of randomly deviating future timelines.
This would produce a relatively myopic (doesn’t look too far into the future) and satisficing (doesn’t consider too many options) advice-giving AI, which would have no agency of its own but would only help me find courses of action that I like better than whatever I would have done without its advice.
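To make that concrete, here is a minimal toy sketch of the loop I have in mind, in Python. Everything in it (the timeline representation, the perturbation step, the two approval scores, the horizon and timeline-count constants) is a made-up stand-in for capabilities we’re assuming the AI already has; it’s only meant to show the maximize-the-minimum-of-the-two-approvals structure, not to be an actual implementation.

```python
import random

# Toy stand-in constants: the "myopic" horizon and the "satisficing" cap
# on how many candidate timelines get considered.
HORIZON = 5
MAX_TIMELINES = 20

def predict_default_timeline():
    """Stand-in: what I would do over the horizon if the AI did not exist."""
    return [f"default_action_{t}" for t in range(HORIZON)]

def perturb(timeline):
    """Stand-in: a small modification at one step, giving a nearby timeline."""
    t = random.randrange(len(timeline))
    return timeline[:t] + [f"alternative_action_{t}"] + timeline[t + 1:]

def approval_now(timeline):
    """Stand-in: how much me-now would approve of this sequence of actions."""
    return random.random()

def approval_after(timeline):
    """Stand-in: how much me-after-living-through-it would approve of it."""
    return random.random()

def advise():
    baseline = predict_default_timeline()
    candidates = [baseline] + [perturb(baseline) for _ in range(MAX_TIMELINES)]
    # Score each candidate by the minimum of the two approvals and report the
    # best one; the AI only offers this as advice, it never acts on its own.
    return max(candidates, key=lambda tl: min(approval_now(tl), approval_after(tl)))

if __name__ == "__main__":
    print(advise())
```

The “update on my response and rerun if asked” step would just wrap this in a loop that feeds my feedback back into whatever stands behind predict_default_timeline and the two approval functions.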
There are almost certainly tons of failure modes here, such as a timeline where my actions seem reasonable at first but turn me into a different person, one who also thinks the actions were reasonable but who otherwise wildly differs from me in a way that is invisible to the me-now receiving the advice. But it’s a zeroth draft anyway.
(That whole thing there was another example of me thinking out loud in response to what you said, rather than anything preconceived. It’s very hard for me to do otherwise. I just get writer’s block and anxiety if I try to.)
Gotcha, thanks :) [ETA—this was in response to just the first paragraph]
I also edit my previous comments a lot after I realize there was more I ought to have said. It’s a very bad habit. Please look back at the comment you just replied to; I edited it before realizing you’d already read it! I really need to stop doing that...
Oh, it’s fine; plenty of people edit their comments after posting, including me. I should be mindful of that by not replying immediately :-P As for the rest of your comment:
I think your comment bears a slight resemblance to Vanessa Kosoy’s “Hippocratic Timeline-Driven Learning” (Section 4.1 here), in case you haven’t already heard of it.
My suspicion is that, if you were to sort out all the details, including things like the AI-human communication protocol, such that it really works and is powerful and has no failure modes, you would wind up with something that’s at least “rather messy” (where, again, “rather messy” means “in the same messiness ballpark as Stuart Armstrong research agenda v0.9”, and where “powerful” rules out literal Hippocratic Timeline-Driven Learning, IMO).