> This makes assumptions that make no sense to me. Auto-GPT is already not passively safe, and there is no reason to be sure LLMs would remain myopic as they are scaled. LLMs are inscrutable matrices of floating points that we are barely learning how to understand and interpret. We have no reliable way to predict when LLMs might hallucinate or misbehave in some other way. There is also no “human level”—LLMs are way faster than humans and are way more scalable than humans—there is no way to get LLMs that are as good as humans without having something that’s way better than humans along a huge number of dimensions.
I love it when comments include arguments I have already raised in my “Some obvious objections to this argument” section.
> Auto-GPT is already not passively safe, and there is no reason to be sure LLMs would remain myopic as they are scaled.
I agree with you that Auto-GPT is not passively safe or myopic. However, as I pointed out, AI agents “only optionally mitigate myopia and passive safety.” If myopia and passive safety are critical safety guarantees, it’s easy to include them in AI agents.
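To make that concrete, here is a minimal sketch (my own illustration, not Auto-GPT’s actual code) of an agent loop with both properties designed in: it proposes exactly one step at a time (myopia) and executes nothing without explicit human approval (passive safety). The `propose` callable standing in for the LLM and the tool whitelist are assumptions made for the sake of the example.

```python
# Hypothetical sketch (not Auto-GPT's actual code): an agent loop that is
# myopic (it proposes one step at a time, with no persistent long-horizon
# plan) and passively safe (no action runs without explicit human approval).

from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    name: str       # e.g. "search_web" or "write_file"
    argument: str   # the single argument the tool would receive


def myopic_agent_step(
    task: str,
    propose: Callable[[str], ProposedAction],  # assumed LLM wrapper: task -> one proposed action
    tools: dict[str, Callable[[str], str]],    # whitelisted tools with bounded side effects
) -> str | None:
    """Run exactly one agent step; the human is the gate on every side effect."""
    action = propose(task)  # single-step proposal, no rollout of future plans

    if action.name not in tools:  # refuse anything outside the whitelist
        return None

    answer = input(f"Approve {action.name}({action.argument!r})? [y/N] ")
    if answer.strip().lower() != "y":  # passive safety: the default is "do nothing"
        return None

    return tools[action.name](action.argument)
```

The point is not that this particular design is sufficient, only that myopia and passive safety are properties of the agent scaffolding, which can be added or removed independently of the underlying LLM.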
> LLMs are inscrutable matrices of floating points that we are barely learning how to understand and interpret.
This simply isn’t true. I would encourage you to keep up to date with the latest research on AI interpretability. LLMs are highly interpretable: not only can we understand their world models, we can also detect whether they believe a statement to be true and whether they are lying.
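For readers who want to see what this kind of work looks like mechanically, here is a minimal sketch of the general technique: fit a linear probe on a model’s hidden activations to separate statements it represents as true from ones it represents as false. The model choice, layer, and toy dataset below are illustrative assumptions, not a reproduction of any particular paper’s setup.

```python
# Minimal linear-probe sketch: classify statements as true/false from a
# model's hidden activations. Model, layer, and data are toy assumptions.

import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # stand-in; probing work typically targets larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Toy labeled data: (statement, is_true). A real probe needs far more examples.
data = [
    ("The Eiffel Tower is in Paris.", 1),
    ("The Eiffel Tower is in Rome.", 0),
    ("Water freezes at 0 degrees Celsius.", 1),
    ("Water freezes at 50 degrees Celsius.", 0),
]

def last_token_activation(text: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the final token at the chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

X = torch.stack([last_token_activation(s) for s, _ in data]).numpy()
y = [label for _, label in data]

probe = LogisticRegression(max_iter=1000).fit(X, y)  # the "detector" is a linear classifier
print(probe.predict(X))                              # sanity check on the training set
```

A real probe would use a much larger labeled set and held-out evaluation data, but the underlying machinery really is this simple: a linear classifier on top of activations.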
More importantly, LLMs are much easier to interpret than biological systems (the products of evolution). The argument here is that we should scale up (relatively) easy-to-interpret LLMs now, before evolution-based AIs arrive.
> There is also no “human level”—LLMs are way faster than humans and are way more scalable than humans—there is no way to get LLMs that are as good as humans without having something that’s way better than humans along a huge number of dimensions.
I’m not sure what point you’re trying to make here. The important question isn’t whether deep-learning models will ever be exactly human-level; it’s whether we can use them to safely augment human intelligence in order to solve the Alignment Problem.
I agree that LLMs are superhuman on some dimensions (fact recall) and inferior to humans on others (playing Connect 4), so an LLM (or AI agent) that was at least human-level on every dimension would necessarily be superhuman on some of them. That fact alone doesn’t tell us whether or not LLMs are safe to use.
I think we have very strong reasons to believe that a GPT-N-style architecture would be highly safe and, more importantly, that it would be far safer and more interpretable than an equally powerful AI modeled after the human brain or chosen randomly by evolution.