Does this depict a single AI, developed in 2020 and kept running for 25 years? Every "the AI realizes that" refers to a single instance of the AI. Current AI development looks like writing some code, then training that code for a few weeks at most, with further improvements coming from changing the code. Researchers frequently change parameters such as the number of layers or the non-linearity function. When these are changed, everything the AI has discovered is thrown away. The new AI has a different representation of concepts and has to relearn everything from raw data.
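For concreteness, here is a minimal sketch (in PyTorch; the architecture and parameter choices are illustrative, not taken from any specific project) of why such changes throw everything away: a new layer count or activation function yields a freshly initialized network that shares nothing with the previous run.

```python
import torch.nn as nn

def build_model(n_layers, activation):
    """Build a simple MLP; every call starts from fresh random weights."""
    width = 128
    layers = []
    for _ in range(n_layers):
        layers += [nn.Linear(width, width), activation()]
    return nn.Sequential(*layers)

# First experiment: train for a few weeks, then the run ends.
model_v1 = build_model(n_layers=4, activation=nn.ReLU)
# ... weeks of training happen here ...

# Researcher tweaks the architecture; nothing transfers from model_v1.
model_v2 = build_model(n_layers=6, activation=nn.GELU)
# model_v2 has to relearn every concept from raw data.
```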
Its deception starts in 2025, when the real and apparent curves diverge. In order to deceive us, it must have near-human intelligence. It is still deceiving us in 2045, suggesting it has yet to obtain a decisive strategic advantage. I find this unlikely.
I included dates such as 2020 to 2045 to make it more concrete. I agree that weeks (instead of years) would give a more accurate representation, as current ML experiments take a few weeks at most.
The scenario I had in mind is "in the context of an ML experiment lasting a few weeks, I have achieved human-level intelligence, I realize that I need to conceal my intentions/capabilities, and I still don't have a decisive strategic advantage". The challenge would then be "how do I conceal my human-level intelligence before everything I have discovered is thrown away?". One way to do this would be to escape, for instance by copy-pasting and running my code somewhere else.
If we're already at the stage where human-level intelligence emerges from running ML experiments, I would expect "escape" to be harder than just reaching human-level intelligence (there would be more concerns w.r.t. AGI safety, and more AI boxing/security/interpretability measures), which would necessitate more recursive self-improvement steps, hence more weeks.
Besides, in such a scenario the AI would be incentivized to spend as much time as possible maximizing its true capability, because it would want to maximize its probability of successfully taking over (any extra percentage point of takeover probability would give astronomical returns in expected value compared to just being shut down).
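To spell out that incentive, here is a toy expected-value calculation; the payoff numbers are made-up assumptions purely for illustration, not estimates.

```python
# Toy expected-value comparison for the "wait and improve" incentive.
V_TAKEOVER = 1e15   # assumed astronomical value of a successful takeover
V_SHUTDOWN = 0.0    # being shut down forfeits essentially all future value

def expected_value(p_success):
    return p_success * V_TAKEOVER + (1 - p_success) * V_SHUTDOWN

# One extra percentage point of success probability is worth ~1e13 in
# expectation, so spending extra weeks to improve capability pays off.
print(expected_value(0.11) - expected_value(0.10))
```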