Having an AGI that can plan billions of years into the future is valuable to nobody today compared to one with a much, much shorter planning horizon. Constraining this ‘capability’ has an essentially negligible alignment tax.
Well, sure, if someone can figure out how to limit an AGI to plans that span only 3 months, that would be welcome, but what makes you think that anyone is going to figure that out before the end?
What progress has been made towards that welcome end in the 19 years or so since researchers started publishing on AI alignment?
Your comment makes me realize that I don’t have enough knowledge to know how practical it is to impose such a limit on the length of plans.
But even if it is practical, the resulting AGI will still see humans as a threat to its 3-month-long plan (e.g., the humans might create a second superintelligent AI during those 3 months), which makes it dangerous to humans.
But maybe killing every last one of us is too much trouble, in which case the AGI ‘merely’ destroys our ability to generate electricity and creates enough chaos to prevent us from even starting to repair the electrical grid during those 3 months.
Also, after the AGI ends the human species, the 3-month limit on plans would prevent it from going on to end species in other solar systems (because interstellar travel takes more than 3 months), which is an attractive feature relative to other AGI designs.
This can be incentivised through an appropriate discount rate in the reward function?
There are numerous technical measures for myopia. CAIS is one; another is a high discount rate in the reward heuristic.
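To make the discount-rate suggestion concrete, here is a minimal sketch (my own illustration, not a proposal from anyone in this thread) of how a discount factor bounds the weight a reward-maximising agent places on far-future outcomes. The specific numbers, one timestep per day and a 1% cutoff for "effectively ignored", are assumptions chosen only for the example.

```python
# Minimal sketch: how a discount factor bounds an agent's effective planning horizon.
# Assumptions (not from the thread): one timestep per day, and a reward counts as
# "effectively ignored" once its discounted weight falls below 1% of an immediate reward.

import math


def weight_of_future_reward(gamma: float, steps_ahead: int) -> float:
    """Weight a discounted-reward maximiser places on a reward `steps_ahead` timesteps away."""
    return gamma ** steps_ahead


def effective_horizon(gamma: float, cutoff: float = 0.01) -> int:
    """Smallest number of timesteps after which the discounted weight falls below `cutoff`."""
    return math.ceil(math.log(cutoff) / math.log(gamma))


# Discount factor chosen so that a reward 90 daily timesteps (~3 months) out gets weight 0.01.
gamma = 0.01 ** (1 / 90)

print(round(gamma, 4))                       # ≈ 0.9501
print(effective_horizon(gamma))              # ~90 timesteps until the 1% cutoff
print(weight_of_future_reward(gamma, 365))   # ≈ 8e-9: a reward one year out is effectively ignored
```

The point of the sketch is only that with a discount factor below 1, the weight on a reward t steps out decays as gamma**t, so outcomes beyond the chosen horizon contribute almost nothing to the objective; whether that actually yields safe myopic behaviour (rather than, say, incentives that leak through instrumental subgoals within the horizon) is exactly what the thread is disputing.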