I expect not to see this, conditional on adding a stipulation like “the AI wasn’t scaffolded and then given a goal like ‘maximize profit’”, because without that stipulation I could imagine the AI system coming up with nasty subgoals. In particular, I don’t expect egregiously bad actions from autoregressive sampling of an LLM tasked with doing scientific research.