I am aware that humans have a non-zero level of life-threatening behaviour. If we wanted it to be lower, we could make it lower, at the expense of various costs. We don’t, which seems to mean we are happy with the current cost-benefit ratio. Arguing, as you have, that the risk of AI self-harm can’t be reduced to zero doesn’t mean we can’t hit an actuarial optimum.
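For concreteness, here is one way to cash out ‘actuarial optimum’ (the symbols e, c, p, L are my own illustrative notation, not anything from the discussion): let e be the effort spent on safety measures, c(e) its cost, p(e) the resulting probability of self-harm, and L the loss if harm occurs. The optimum tolerates some residual risk rather than driving p to zero:

$$ e^{*} \;=\; \arg\min_{e \ge 0}\,\bigl[\,c(e) + p(e)\,L\,\bigr], \qquad c'(e^{*}) = -\,p'(e^{*})\,L \ \text{at an interior optimum.} $$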
It is not clear to me why you think safety training would limit intelligence.
You’re cutting up behaviors into two categories, ‘safe conduct’ and ‘unsafe conduct’. I’m making a finer cut, one that identifies systematic reasons some kinds of safe or unsafe behavior occur.
If you aren’t seeing why it’s useful to distinguish ‘I dropped an anvil on my head because I’m a Cartesian’ from ‘I dropped an anvil on my head because I’m a newborn who’s never seen dangerous things happen to anyone’, consider the more general dichotomy: ‘Errors due to biases in prior probability distribution’ v. ‘Errors due to small or biased data sets’.
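Here is a toy Bayesian sketch of that dichotomy; the anvil probabilities and hypothesis sets below are purely illustrative, not anything drawn from AIXI. The ‘ignorant’ agent’s error comes only from a small sample and shrinks as data accumulates; the ‘biased’ agent’s prior excludes the truth, so its error never goes away:

```python
import math
import random

random.seed(0)
TRUE_P = 0.7   # illustrative: the true chance that dropping the anvil hurts

def sample(n):
    """Draw n observations of 'did it hurt?' from the true distribution."""
    return [random.random() < TRUE_P for _ in range(n)]

def flat_prior_estimate(data):
    """'Ignorant' agent: flat Beta(1,1) prior over p; its error comes only from small data."""
    hurts = sum(data)
    return (hurts + 1) / (len(data) + 2)    # posterior mean of Beta(1 + hurts, 1 + misses)

def biased_prior_estimate(data):
    """'Biased' agent: its hypothesis space is {0.05, 0.10}, so the true value 0.7
    gets zero prior probability and no amount of data can put it back in play."""
    hurts, misses = sum(data), len(data) - sum(data)
    log_w = {p: math.log(0.5) + hurts * math.log(p) + misses * math.log(1 - p)
             for p in (0.05, 0.10)}                       # log prior + log likelihood
    m = max(log_w.values())
    w = {p: math.exp(lw - m) for p, lw in log_w.items()}  # normalise in log space
    total = sum(w.values())
    return sum(p * wp / total for p, wp in w.items())     # posterior mean over {0.05, 0.10}

for n in (5, 50, 5000):
    data = sample(n)
    print(f"n={n:5d}   ignorant estimate = {flat_prior_estimate(data):.3f}   "
          f"biased estimate = {biased_prior_estimate(data):.3f}")
```

As n grows the first estimate approaches 0.7, while the second can never leave the interval [0.05, 0.10]: Bayes only reweights hypotheses the prior already contains.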
AIMU (Hutter’s AIμ, the agent that already knows the true environment μ) is the agent designed to make no mistakes of the latter kind; AIXI is not such an agent, and is only intended to avoid mistakes of the former kind. AIXI is supposed to be a universal standard for induction because it only gets things wrong to the extent its data fails it, not to the extent it started off with a priori wrong assumptions. My claim is that for a physically implemented AIXI-style agent, AIXI fails in its prior, not just in its lack of data-omniscience.
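For reference, the standard definitions I’m leaning on (Hutter’s notation; nothing here is new): AIμ plans against the true environment μ itself, while AIXI replaces μ with Solomonoff’s universal mixture ξ over the class \(\mathcal{M}\) of computable environments, weighted by description length:

$$ \xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\,\nu(x_{1:n}), $$

where K(ν) is the length of the shortest program computing ν. The universality result is, roughly, that ξ’s predictions converge to those of any environment in \(\mathcal{M}\) given enough data; the claim above is that for a physically embedded agent the trouble already starts with ξ and \(\mathcal{M}\), not just with how much data has arrived.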
‘You aren’t omniscient about the data’ is a trivial critique, because we could never build something physically omniscient. (‘You aren’t drawing conclusions from the data in a computationally efficient manner’ is a more serious critique, but one I’m bracketing for present purposes, because AIXI isn’t intended to be computationally efficient.) Instead, my main critique is of AIXI’s Solomonoff prior. (A subsidiary critique of mine is directed at reinforcement learning, but I won’t write more about that in this epistemological setting.)
In sum: We should be interested in why the AI is making its mistakes, not just in its aggregate error rate. When we become interested in that, we notice that AIXI makes some mistakes because it’s biased, not just because it’s ignorant. That matters because (a) we could never fully solve the problem of ignorance, but we might be able to fully solve the problem of bias; and (b) if we build a sufficiently smart self-modifier it should be able to make headway on ignorance itself, whereas it won’t necessarily make headway on fixing its own biases. Problems with delusive hypothesis spaces and skewed priors are worrisome even when they only occasionally lead to mistakes, because they’re the sorts of problems that can be permanent: problems that agents suffering from them may not readily converge on solutions to.
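To make the ‘permanent’ part explicit, in generic Bayesian notation (a textbook point, not anything AIXI-specific): if the true hypothesis H gets zero prior weight, no body of data D can resurrect it,

$$ P(H \mid D) \;=\; \frac{P(D \mid H)\,P(H)}{P(D)} \;=\; 0 \quad \text{whenever } P(H) = 0, $$

whereas any hypothesis with positive prior weight can, under the usual consistency conditions, come to dominate the posterior once enough data arrives. Ignorance washes out; exclusion doesn’t.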