You can totally have something which is trying to kill humanity in this framework, though. Imagine something in the style of ChaosGPT, locally agentic and competent enough to use state-of-the-art AI biotech tools to synthesize dangerous viruses or compounds to release into the atmosphere. (Note that in this example the critical part is the narrow-AI biotech tools, not the chaos-agent.)
You don’t need solutions to embedded agency, goal-content integrity, and the like to build this. It is easier to build and sits earlier in the tech-tree than crisp maximizers. It will not be stable enough to coherently take over the lightcone, just coherent enough to fold some proteins and print them.
But why would anyone do such a stupid thing?