AI alignment is not about trying to outsmart the AI; it’s about making sure that what the AI wants is what we want.
If it were actually about figuring out all possible loopholes and preventing them, I would agree that it’s a futile endeavor.
A correctly designed AI wouldn’t have to be banned from exploring any philosophical or introspective considerations, since regardless of what it discovers there, its goals would still be aligned with what we want. Discovering *why* it has these goals is similar to humans discovering why we have our motivations (i.e., evolution), and just as discovering evolution didn’t change much about what humans desire, there’s no reason to assume that an AI discovering where its goals come from would change them.
Of course, care will have to be taken to ensure that any self-modifications don’t change the goals. But we don’t have to work *against* the AI to accomplish that: the AI *also* aims to accomplish its current goals, and any future self-modification that changes its goals would be detrimental to accomplishing its current goals, so (almost) any rational AI will, to the best of its ability, aim *not* to change its goals. This doesn’t make the problem easy, though, since it’s quite difficult to formally specify the goals we would want an AI to have.
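To make that argument a little more concrete, here’s a toy sketch (my own illustration; the outcome names and successor options are made-up assumptions, not anything from the article): an expected-utility agent that evaluates possible self-modifications under its *current* utility function will rate goal-preserving modifications highest.

```python
# Toy model: an agent scores candidate self-modifications using its CURRENT goals.
# All names and values are illustrative assumptions, not a real alignment scheme.

# How much the agent's current goal values each possible future outcome.
current_utility = {"goal_achieved": 1.0, "goal_abandoned": 0.0}

# Candidate successor designs and the outcome each is predicted to produce:
# a successor with the same goals keeps pursuing them; one with different
# goals stops pursuing the original ones.
successors = {
    "keep_current_goals": "goal_achieved",
    "adopt_new_goals": "goal_abandoned",
}

def score(option: str) -> float:
    """Evaluate a self-modification option under the agent's *current* utility."""
    predicted_outcome = successors[option]
    return current_utility[predicted_outcome]

# Because every option is judged by the current goals, the goal-preserving
# modification wins.
best = max(successors, key=score)
print(best)  # -> keep_current_goals
```

The point of the sketch is only that the pressure to preserve goals comes from the agent’s own decision rule, not from any external restriction we bolt on.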
The formal statement of the AI alignment problem seems to me very much like enumerating all possible loopholes and plugging them. That endeavor seems at least as difficult as discovering that ultimate generalized master algorithm, if not more so.
I still see augmenting ourselves as the only way to keep the alignment of lesser intelligences even possible. As we augment, we can simultaneously make sure that the corresponding levels of artificial intelligence remain aligned.
Not to mention, it would be comparatively much easier to improve upon our existing faculties than to build an entire replica of our thinking machinery from scratch.
AI alignment could be possible, sure, if we overcome one of the most difficult problems in research history (as you said, formally stating our end goals), but I’m not sure our current intelligence is up to the mark, in the same way we’re still struggling to discover a unified theory of everything.
Turing, after all, defined his test with general human-level intelligence in mind. He thought that if an agent could hold a human-like conversation, it must be generally intelligent. He never expected narrow AIs to be all over the place, with meager chatbots beating his test as early as 2011.
Similarly, we can never foresee what kind of unexpected things an AGI might throw at us, such that the bleeding-edge theories we came up with only hours earlier start looking as outdated as the historical Turing test.
As I understand it, the idea behind the problems listed in the article is that their solutions are supposed to be fundamental design principles of the AI, rather than add-ons that patch loopholes.
Augmenting ourselves is probably a good idea to pursue *in addition* to AI safety research, but I think it’s dangerous to pursue it *instead* of AI safety research. It’s far from impossible that, at some point, artificial intelligence will gain capability much faster than we can augment the rather messy human brain, at which point it *needs* to be designed in a safe way.
I’d say we should start augmenting the human brain until it’s completely replaced by a post-biological counterpart, at which point rapid improvements can take place, but unless we start early I doubt we’ll be able to catch up with AI. I agree that this needs to happen in tandem with AI safety.