How feasible is it to build a powerful AGI that has little agency but is good at answering questions accurately?
Could we solve alignment with one, e.g. by getting it to do research or by using it to verify the outputs of other AGIs?
These question-answering AIs are often called Oracles; you can find some information on them here. Their cousin, tool AI, is also relevant here. You’ll discover that they are probably safer, but by no means entirely safe.
We are working on a Stampy answer about the safety of Oracles; keep your eyes peeled, it should show up soon.
An AGI that can accurately answer questions such as “What would this agentic AGI do in this situation?” will, if powerful enough, learn what agency is by default, since understanding agency is useful for making such predictions. So you can’t just train an AGI with little agency; you would need to do one of the following:
1. Train the AGI with the capabilities of agency, but train it not to use them for anything other than answering questions.
2. Train the AGI such that it does not develop agency despite being pushed by gradient descent to do so, and accept the resulting loss in performance.
Both of these seem like difficult problems. Solving either (especially the first) would be very useful, but the first in particular already looks like a large part of the alignment problem itself. A toy sketch of the trade-off in the second option follows.
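To make that trade-off a bit more concrete, here is a minimal, purely hypothetical sketch in PyTorch: a toy question-answering model is trained on a combined objective of task loss plus a penalty that stands in for “agency”. The model, the data, the weight lam, and especially the agency_proxy function are all invented for illustration; nobody currently knows how to measure agency from a model’s internals, so this only shows the shape of the trade-off, not a real solution.

```python
# Purely illustrative sketch of option 2: trade off task performance
# against an "agency" penalty during training. Everything here (the model,
# the data, and especially agency_proxy) is hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy question-answering setup: map a "question" vector to an "answer" vector.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
questions = torch.randn(256, 16)
answers = torch.randn(256, 8)

task_loss_fn = nn.MSELoss()

def agency_proxy(m: nn.Module) -> torch.Tensor:
    """Hypothetical stand-in for 'how agentic is this model?'.

    No real proxy for agency exists; here we just penalise overall weight
    magnitude so the trade-off is visible in a toy setting.
    """
    return sum(p.pow(2).sum() for p in m.parameters())

lam = 0.01  # how hard we push against "agency"; higher means a less capable model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    optimizer.zero_grad()
    task_loss = task_loss_fn(model(questions), answers)
    # Combined objective: gradient descent now balances answering well
    # against the agency penalty, so task performance degrades as lam grows.
    loss = task_loss + lam * agency_proxy(model)
    loss.backward()
    optimizer.step()

print(f"final task loss with penalty: {task_loss.item():.3f}")
```

Raising lam in this sketch worsens the final task loss, which is the “loss in performance” the second option asks us to accept; the hard, unsolved part is replacing agency_proxy with something that actually tracks agency.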