I mostly agree with this. However, there are a few quibbles.
If humans are reading human-written source code for something that will be super-intelligent when run, the humans won’t be hacked. At least not directly.
I suspect a range of different “propensities to see logical correlation” is possible.
Agent X sees itself as logically correlated with anything remotely trying to do LDT-ish reasoning. Agent Y considers itself logically correlated only with near-bitwise-perfect simulations of its own code. And I suspect both of these are reasonably natural agent designs. It is a free parameter, something with many choices, all consistent under self-reflection. Like priors, or utility functions.
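To make the “free parameter” point concrete, here is a minimal toy sketch (Python, all names and predicates hypothetical, not a claim about how any real agent works): two agents that differ only in which other programs they treat as logically correlated with themselves, each internally consistent.

```python
# Toy model only: the "propensity to see logical correlation" as a free
# parameter, analogous to a prior or a utility function. All names are
# hypothetical illustrations, not a real agent design.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyLDTAgent:
    own_code: str
    # The free parameter: a predicate deciding which other programs this
    # agent treats as logically correlated with itself.
    sees_correlated: Callable[[str, str], bool]

    def correlated_with(self, other_code: str) -> bool:
        return self.sees_correlated(self.own_code, other_code)


# Agent X: counts anything that looks even vaguely LDT-ish as correlated.
agent_x = ToyLDTAgent(
    own_code="my decision procedure",
    sees_correlated=lambda mine, other: "LDT" in other,
)

# Agent Y: counts only exact copies of its own code as correlated.
agent_y = ToyLDTAgent(
    own_code="my decision procedure",
    sees_correlated=lambda mine, other: mine == other,
)

print(agent_x.correlated_with("some other LDT-flavoured reasoner"))  # True
print(agent_y.correlated_with("some other LDT-flavoured reasoner"))  # False
print(agent_y.correlated_with("my decision procedure"))              # True
```

Nothing in the sketch forces one predicate over the other; both are self-consistent, which is the sense in which the choice resembles picking a prior or a utility function.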
I am not confident in this.
So I think it’s quite plausible we could create some things the AI perceives as logical correlation between our decision to release the AI and the AI’s future decisions. (Because the AI sees logical correlation everywhere, maybe the evolution of plants is similar enough to part of its solar-cell-designing algorithm that a small correlation exists there too.) This would give us some effect on the AI’s actions. Not an effect we can use to make the AI do nice things, basically shuffling an already random deck of cards, but an effect nonetheless.