Why do we assume that any AGI can meaningfully be described as a utility maximizer?
You’re right that not every conceivable general intelligence is built as a utility maximizer. Humans are an example of this.
One problem is that even if you make a “weak” form of general intelligence that isn’t trying particularly hard to optimize anything, or a tool AI, eventually someone at FAIR will make an agentic version that does in fact directly try to optimize Facebook’s stock market valuation.
Do not use FAIR as a symbol of villainy. They’re a group of real, smart, well-meaning people who we need to be capable of reaching, and who still have some lines of respect connecting them to the alignment community. Don’t break them.
Can we control the blind spots of the agent? For example, I could imagine that we could make a very strong agent that is able to explain acausal trade but unable to (deliberately) participate in any acausal trades, because of the way it understands counterfactuals.
Could it be possible to create an AI with similar minor weaknesses?
Probably not, because it’s hard to get a general intelligence to make consistently wrong decisions in any capacity, partly because, like you or me, it might realize that it has a design flaw and work around it.
A better plan is just to explicitly bake corrigibility guarantees (i.e. the stop button) into the design. Figuring out how to do that is the hard part, though.
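To see why the naive version of a stop button is hard, here is a minimal toy sketch (my own illustration, not a proposal from this thread): suppose we give a utility-maximizing agent a bonus for shutting down when the button is pressed. All names and numbers below are made up for illustration; the point is just that the size of the bonus determines whether the agent wants to prevent the press, cause it, or sit exactly on a knife edge of indifference.

```python
# Toy illustration (assumed setup, not anyone's actual design): an agent gets
# task_value if it finishes its task, or shutdown_bonus if the stop button is
# pressed and it shuts down. It can influence how likely the press is.

def expected_utility(press_prob: float, task_value: float, shutdown_bonus: float) -> float:
    """Expected utility when the stop button is pressed with probability press_prob."""
    return press_prob * shutdown_bonus + (1 - press_prob) * task_value

task_value = 10.0

for shutdown_bonus in (1.0, 10.0, 100.0):
    # Compare the two extremes the agent could steer toward.
    u_if_prevented = expected_utility(0.0, task_value, shutdown_bonus)  # button never pressed
    u_if_caused = expected_utility(1.0, task_value, shutdown_bonus)     # button always pressed
    if u_if_caused > u_if_prevented:
        stance = "tries to get the button pressed"
    elif u_if_caused < u_if_prevented:
        stance = "tries to prevent the button press"
    else:
        stance = "is indifferent to the button"
    print(f"shutdown_bonus={shutdown_bonus:6.1f}: agent {stance}")
```

If the bonus is too small the agent resists shutdown, if it is too large the agent manipulates us into pressing the button, and even the exactly balanced case gives it no positive reason to keep the button working. That knife-edge problem is one standard framing of why corrigibility guarantees are harder to bake in than they sound.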