I think out loud a lot. Assume nearly everything I say in conversations like this is desiderata I’m listing off the top of my head with no prior planning. I’m really not good at the kind of rigorous think-before-you-speak that is normative on LessWrong.
A really bad starting point for a specification which almost certainly has tons of holes in it: have the AI predict what I would do up to a given length of time in the future if it did not exist, and from there make small modifications to construct a variety of different timelines for similar things I might instead have done.
In each such timeline, predict how much I-now and I-after would approve of that sequence of actions, and maximize the minimum of those two. Stop after a certain number of timelines have been considered and tell me the results. The AI then updates its predictions of me-now based on how I respond and, if I ask it to, runs the simulation again with this new data and a new set of randomly deviating future timelines.
This would produce a relatively myopic (doesn’t look too far into the future) and satisficing (doesn’t consider too many options) advice-giving AI, which would have no agency of its own and would only help me find courses of action that I like better than whatever I would have done without its advice.
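A minimal sketch of the loop described above, under heavy assumptions: a timeline is just a tuple of action labels, and `approve_now` and `approve_after` are hypothetical stand-ins for the AI's predictions of how much I-now and I-after would approve. All names here (`perturb`, `advise`, `max_changes`, `n_timelines`) are made up for illustration, and the step where the predictor is updated from my response is left out.

```python
import random
from typing import Callable, Sequence, Tuple

Timeline = Tuple[str, ...]  # assumption: a timeline is a sequence of action labels


def perturb(baseline: Sequence[str], alternatives: Sequence[str],
            rng: random.Random, max_changes: int = 1) -> Timeline:
    """Copy the baseline and swap out at most `max_changes` actions."""
    timeline = list(baseline)
    k = min(max_changes, len(timeline))
    for i in rng.sample(range(len(timeline)), k=k):
        timeline[i] = rng.choice(alternatives)
    return tuple(timeline)


def advise(baseline: Sequence[str], alternatives: Sequence[str],
           approve_now: Callable[[Timeline], float],
           approve_after: Callable[[Timeline], float],
           n_timelines: int = 20, max_changes: int = 1,
           seed: int = 0) -> Tuple[Timeline, float]:
    """Return the candidate timeline with the best worst-case approval."""
    rng = random.Random(seed)

    # The baseline itself is a candidate (an assumption of this sketch),
    # so the advice never scores below "what I would have done anyway".
    candidates = [tuple(baseline)]
    # Satisficing: stop after a fixed number of small deviations.
    candidates += [perturb(baseline, alternatives, rng, max_changes)
                   for _ in range(n_timelines)]

    def score(t: Timeline) -> float:
        # Maximize the minimum of I-now's and I-after's predicted approval.
        return min(approve_now(t), approve_after(t))

    best = max(candidates, key=score)
    return best, score(best)
```

The horizon (myopia) is the length of `baseline`, and the search budget (satisficing) is `n_timelines`. Nothing in this sketch addresses how the approval predictors would be built, which is where most of the holes are.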
There are almost certainly tons of failure modes here, such as a timeline where my actions seem reasonable at first, but turn me into a different person who also thinks the actions were reasonable, but who otherwise wildly differs from me in a way that is invisible to me-now receiving the advice. But it’s a zeroth draft anyway.
(That whole thing there was another example of me thinking out loud in response to what you said, rather than anything preconceived. It’s very hard for me to do otherwise. I just get writer’s block and anxiety if I try to.)
Gotcha, thanks :) [ETA—this was in response to just the first paragraph]
I also edit my previous comments a lot after I realize there was more I ought to have said. Very bad habit—look back at the comment you just replied to please, I edited it before realizing you’d already read it! I really need to stop doing that...
Oh it’s fine, plenty of people edit their comments after posting, including me; I should be mindful of that by not replying immediately :-P As for the rest of your comment:
I think your comment has a slight resemblance to Vanessa Kosoy’s “Hippocratic Timeline-Driven Learning” (Section 4.1 here), if you haven’t already heard of that.
My suspicion is that, if one were to sort out all the details (including things like the AI-human communication protocol) such that it really works, is powerful, and has no failure modes, you would wind up with something that’s at least “rather messy”, where, again, “rather messy” means “in the same messiness ballpark as Stuart Armstrong research agenda v0.9”. (And “powerful” rules out literal Hippocratic Timeline-Driven Learning, IMO.)