Yup, that’s a problem.
The problem also exists with regard to an alignment assistant, although the problem is exacerbated here because “retargetable” is part of the specification. On the other hand, unlike the AI Assistant paradigm, a neocortex prosthesis need not be optimized to be user-friendly, and will probably have a respectable learning curve, which makes instant/universal adoption by others less likely. There are also other steps that could be taken to mitigate risks (e.g. siloing information).
Second-order impacts are important to consider, but I also think it’s productive to think separately about the problem of what systems would be the most useful to alignment researchers.
More importantly though, I think a point you made matters here: GPT is not an agent, and a lot of AI risk arguments don’t work without agents.
One other point to keep in mind is that for the most part, capabilities people will probably create better AIs no matter what we do, so there isn’t much control here.
I think that we don’t have much choice in this matter. Automated research is the only way we can even reasonably solve the alignment problem on short timelines.
I think the concern expressed here is that the neocortex prosthesis could be used by capabilities researchers to do capabilities research more effectively, rather than the system being directly a dangerous agent.
This is not the post where I intended to discuss this question, but I just want to express disagreement here: you want a useful LLM, not an LLM that produces all possible completions of your idea, but an LLM that produces useful completions of your idea. So you want an LLM whose outputs are at least partially weighted by their usefulness (like ChatGPT), which implies consequentialism.