This thesis says two things:
for every possible utility function, there could exist some creature that would try to pursue it (weak form),
for every possible utility function, at least one of these creatures doesn't have to be strange; it doesn't need a weird or inefficient design in order to pursue that goal (strong form).
And given that both are true, an AGI that values mountains is as likely as one that values intelligent life.
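To make the weak form concrete, here is a minimal, purely illustrative Python sketch (the planner, the toy transition function, and the goal functions are all invented for this example, not taken from the discussion above). The point it shows is that the search machinery never needs to know what the utility function rewards: the goal is a swappable parameter, and swapping one goal for another changes behaviour without changing the planner.

```python
from typing import Callable, Iterable

def best_action(state: str,
                actions: Iterable[str],
                transition: Callable[[str, str], str],
                utility: Callable[[str], float]) -> str:
    """Pick the action whose successor state the given utility ranks highest.

    The planner never inspects what `utility` rewards; that separation of
    goal from search machinery is the intuition behind the weak form.
    """
    return max(actions, key=lambda a: utility(transition(state, a)))

# Two arbitrary goals sharing the exact same planner (toy stand-ins):
count_paperclips = lambda s: s.count("clip")
count_sandcastles = lambda s: s.count("castle")

transition = lambda s, a: s + " " + a          # toy "world model"
actions = ["make clip", "build castle"]

print(best_action("start", actions, transition, count_paperclips))   # -> make clip
print(best_action("start", actions, transition, count_sandcastles))  # -> build castle
```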
But is the strong form likely? An AGI that pursues its own values (or tries to discover good values to follow) seems much simpler than one that pursues something arbitrary (e.g. "build sand castles") or even something ethical (e.g. "be nice towards all sentient life"). That is, simpler in that you don't need any controls to make sure the AGI doesn't try to rewrite its software.
The reference was mostly a reply to “a paperclipper can’t really be intelligent”. It can be intelligent in the sense relevant for AI risk.
I guess the current contenders for AGI are unlikely to become paperclippers; perhaps even RL doesn't reduce to squiggle maximization. I still think simple goals pick out an important class of AIs, because such goals might be easier to preserve through recursive self-improvement, letting AIs that pursue them FOOM faster. AIs with complicated values might instead need to hold off on self-improvement much longer to ensure alignment, which makes them vulnerable to being overtaken by FOOMing paperclippers. This motivates strong coordination to prevent the initial construction of paperclippers anywhere in the world.