Glad to hear new thinkers grappling with the problem. I agree with what some of the other commenters have said about the thoughts here being unfinished, but I also think that that is a reasonable place to start.
One approach forward could be asking yourself how this could be made more robust in the case of highly general and very smart systems. I don’t yet see a path toward that for this plan, but you might.
Another approach forward would be to aim to develop this tool for an easier use case. Could a narrowly superhuman plan-suggester, one that made detailed risk estimates for a wide variety of possible options, be useful to humanity in the regime where we were still able to safely oversee it? I think so. For instance, perhaps we could ask it to help us design a system of rewards (e.g. competitions) and punishments (e.g. legislation enacting fines) that would reshape the AI development landscape to be less of a Molochian race-to-the-bottom and more of a virtuous win-win landscape. For more background on this idea, see: [Future of Life Institute Podcast] Liv Boeree on Moloch, Beauty Filters, Game Theory, Institutions, and AI
https://podcastaddict.com/episode/154738782