I also responded to Capybasilisk below, but I want to chime in here and use your own post against you, contra point 2 :P
It’s not so easy to get “latent knowledge” out of a simulator—it’s the simulands who have the knowledge, and they have to be somehow specified before you can step forward the simulation of them. When you get a text model to output a cure for Alzheimer’s in one step, without playing out the text of some chain of thought, it’s still simulating something to produce that output, and that something might be an optimization process that is going to find lots of unexpected and dangerous solutions to questions you might ask it.
Figuring out the alignment properties of simulated entities running in the “text laws of physics” seems like a challenge. Not an insurmountable challenge, maybe, and I’m curious about your current and future thoughts, but the sort of thing I want to see progress in before I put too much trust in attempts to use simulators to do superhuman abstraction-building.
If I were trying to have a human researcher cure Alzheimer’s, I’d give them a laboratory, lab assistants, a notebook, and likely also a computer. Similarly, if I wanted a simulacrum of a human researcher (or a great many simulacra of human researchers) to have a good chance of solving Alzheimer’s, I’d give them access to functionally equivalent resources, facilities and tools, crucially including the ability to design, carry out, and analyze the results of experiments in the real world.