For a start, low-level deterministic reasoning:
“Obviously I could never influence an agent, but I found some inputs to deterministic biological neural nets that would make things I want happen.”
“Obviously I could never influence my future self, but if I change a few logic gates in this processor, it would make things I want happen.”
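A minimal toy sketch (my construction, not from the post) of why these low-level examples bite: if the LCDT cut is applied per node based on an agent label, then a fine-grained model of the same human as unlabelled neurons or logic gates never gets cut, and the planner still predicts influence.

```python
# Toy model: node -> (is_agent, payoff the planner expects from acting on it).
# The LCDT cut is applied per node, keyed on the is_agent label.

def lcdt_value(model):
    """Expected payoff of acting, ignoring any node labelled as an agent."""
    value = 0.0
    for node, (is_agent, payoff) in model.items():
        if is_agent:
            continue  # LCDT: my action is assumed not to influence other agents
        value += payoff
    return value

# The same human, described at two levels of abstraction.
coarse = {"human": (True, 10.0)}                          # labelled as an agent
fine = {f"neuron_{i}": (False, 0.1) for i in range(100)}  # parts carry no agent label

print(lcdt_value(coarse))  # 0.0 -> the edge into the agent node is cut
print(lcdt_value(fine))    # ~10 -> no agent label, no cut: influence sneaks back in
```

This is just the labelling question restated in code: the cut has to apply to whatever low-level decomposition of the agent shows up in the world model, not only to nodes explicitly tagged as agents.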
Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT):
“This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y).”
Cf. smart humans in Newcomb’s problem: “This is really weird but if I one-box I get the million, if I two-box I don’t, so I guess I’ll just one-box.”
Yeah, I think this assumes an imperfect implementation. This relation can definitely be learned by the causal model (and is probably learned before the first real decision), but when the decision happens, it is cut. So it’s like a true LCDT agent learns about its influence over agents, but forgets its own ability to exert that influence when deciding.
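A toy sketch of that distinction, using the made-up numbers from the quote (the baseline probability of Y once the edge is cut is my own assumption): a true LCDT agent severs the decision-to-human edge when it actually decides, so the learned 90% relation contributes nothing to the EV, while an imperfect implementation keeps pricing it in.

```python
# Numbers from the quoted example; the post-cut baseline for Y is assumed
# here purely for illustration.
utility_Y = 1.0
p_Y_given_X_learned = 0.9   # frequency learned from past/simulated agents doing X
p_Y_after_cut = 0.0         # what the agent expects for Y once the edge to the human is cut

def ev_of_doing_X(cut_link_to_human: bool) -> float:
    p = p_Y_after_cut if cut_link_to_human else p_Y_given_X_learned
    return p * utility_Y

print(ev_of_doing_X(cut_link_to_human=True))   # 0.0 -> true LCDT: no incentive to influence the human
print(ev_of_doing_X(cut_link_to_human=False))  # 0.9 -> imperfect LCDT: 0.9 * utility(Y), as in the quote
```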
These examples seem related to the abstraction question: we want the model to know that it is splitting an agent into parts, and still believe it can’t influence the agent as a whole. If we could achieve this, then the LCDT agent wouldn’t believe it could influence the neural net/the logic gates.