Again, we’re assuming for the sake of argument that there’s an AI which completely understands an adult human’s current preferences (which are somewhat inconsistent etc.), and how those preferences would change under different circumstances. We need a specification for what this AI should do right now.
If you’re arguing that there is such a specification which is not messy, can you write down exactly what that specification is? If you already said it, I missed it. Can you put it in italics or something? :)
(Your comment said that the AI “should” or “would” do this or that a bunch of times, but I’m not sure if you’re listing various different consequences of a single simple specification that you have in mind, or if you’re listing different desiderata that must be met by a yet-to-be-determined specification.)
(Again, in my book, Stuart Armstrong research agenda v0.9 counts as rather messy.)
I think out loud a lot. Assume that nearly everything I say in conversations like this consists of desiderata I’m listing off the top of my head, with no prior planning. I’m really not good at the kind of rigorous think-before-you-speak that is normative on LessWrong.
A really bad starting point for a specification, which almost certainly has tons of holes in it: have the AI predict what I would do over a given length of time in the future if it did not exist, and from there make small modifications to construct a variety of different timelines of similar things I might have done instead.
For each such timeline, predict how much I-now and I-after would approve of that sequence of actions, and maximize the minimum of those two. Stop after a certain number of timelines have been considered and tell me the results. Then update its predictions of me-now based on how I respond, and, if I ask it to, run the simulation again with this new data and a new set of randomly deviating future timelines.
This would produce a relatively myopic (doesn’t look too far into the future) and satisficing (doesn’t consider too many options) advice-giving AI, which would have no agency of its own but would only help me find courses of action that I like better than whatever I would have done without its advice.
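To make that concrete, here is a minimal toy sketch of the loop I have in mind, in Python. Everything in it (the timeline representation, the perturbation step, the two approval scores, the horizon and timeline-count constants) is a made-up stand-in for capabilities we’re assuming the AI already has; it’s only meant to show the maximize-the-minimum-of-the-two-approvals structure, not to be an actual implementation.

```python
import random

# Toy stand-in constants: the "myopic" horizon and the "satisficing" cap
# on how many candidate timelines get considered.
HORIZON = 5
MAX_TIMELINES = 20

def predict_default_timeline():
    """Stand-in: what I would do over the horizon if the AI did not exist."""
    return [f"default_action_{t}" for t in range(HORIZON)]

def perturb(timeline):
    """Stand-in: a small modification at one step, giving a nearby timeline."""
    t = random.randrange(len(timeline))
    return timeline[:t] + [f"alternative_action_{t}"] + timeline[t + 1:]

def approval_now(timeline):
    """Stand-in: how much me-now would approve of this sequence of actions."""
    return random.random()

def approval_after(timeline):
    """Stand-in: how much me-after-living-through-it would approve of it."""
    return random.random()

def advise():
    baseline = predict_default_timeline()
    candidates = [baseline] + [perturb(baseline) for _ in range(MAX_TIMELINES)]
    # Score each candidate by the minimum of the two approvals and report the
    # best one; the AI only offers this as advice, it never acts on its own.
    return max(candidates, key=lambda tl: min(approval_now(tl), approval_after(tl)))

if __name__ == "__main__":
    print(advise())
```

The “update on my response and rerun if asked” step would just wrap this in a loop that feeds my feedback back into whatever stands behind predict_default_timeline and the two approval functions.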
There are almost certainly tons of failure modes here, such as a timeline where my actions seem reasonable at first but turn me into a different person, one who also thinks the actions were reasonable but who otherwise wildly differs from me in a way that is invisible to the me-now receiving the advice. But it’s a zeroth draft anyway.
(That whole thing there was another example of me thinking out loud in response to what you said, rather than anything preconceived. It’s very hard for me to do otherwise. I just get writer’s block and anxiety if I try to.)
Gotcha, thanks :) [ETA—this was in response to just the first paragraph]
I also edit my previous comments a lot after I realize there was more I ought to have said. It’s a very bad habit. Please look back at the comment you just replied to; I edited it before realizing you’d already read it! I really need to stop doing that...
Oh, it’s fine; plenty of people edit their comments after posting, including me. I should be mindful of that by not replying immediately :-P As for the rest of your comment:
I think your comment bears a slight resemblance to Vanessa Kosoy’s “Hippocratic Timeline-Driven Learning” (Section 4.1 here), in case you haven’t already heard of it.
My suspicion is that, if you were to sort out all the details, including things like the AI-human communication protocol, such that it really works and is powerful and has no failure modes, you would wind up with something that’s at least “rather messy” (where, again, “rather messy” means “in the same messiness ballpark as Stuart Armstrong research agenda v0.9”, and where “powerful” rules out literal Hippocratic Timeline-Driven Learning, IMO).