Here’s a selection of notes I wrote while reading this (in some cases substantially expanded with explanation).
> The reason any kind of ‘goal-directedness’ is incentivised in AI systems is that then the system can be given an objective by someone hoping to use their cognitive labor, and the system will make that objective happen. Whereas a similar non-agentic AI system might still do almost the same cognitive labor, but require an agent (such as a person) to look at the objective and decide what should be done to achieve it, then ask the system for that. Goal-directedness means automating this high-level strategizing.
This doesn’t seem quite right to me, at least not as I understand the claim. A system that can search through a larger space of actions will be more capable than one that is restricted to a smaller space, but it will require more goal-like training and instructions. Narrower instructions will restrict its search and, in expectation, result in worse performance. For example, if a child wanted cake, they might try to dictate actions to me that would lead to me baking a cake for them. But if they gave me the goal of giving them a cake, I’d find a good recipe or figure out where I can buy a cake for them and the result would be much better. Automating high-level strategizing doesn’t just relieve you of the burden of doing it yourself, it allows an agent to find superior strategies to those you could come up with.
> Skipping the nose is the kind of mistake you make if you are a child drawing a face from memory. Skipping ‘boredom’ is the kind of mistake you make if you are a person trying to write down human values from memory. My guess is that this seemed closer to the plan in 2009 when that post was written, and that people cached the takeaway and haven’t updated it for deep learning which can learn what faces look like better than you can.
(I haven’t waded through the entire thread on the faces thing, so maybe this was mentioned already.) It seems to me that it’s a lot easier to point to examples of faces that an AI can learn from than examples of human values that an AI can learn from.
> It also seems plausible that [the AIs under discussion] would be owned and run by humans. This would seem to not involve any transfer of power to that AI system, except insofar as its intellectual outputs benefit it
I think this is a good point, but isn’t this what the principal-agent problem is all about? And isn’t that a real problem in the real world?
> That is, tasks might lack headroom not because they are simple, but because they are complex. E.g. AI probably can’t predict the weather much further out than humans.
They might be able to if they can control the weather!
> IQ 130 humans apparently earn very roughly $6000-$18,500 per year more than average IQ humans.
I left a note to myself to compare this to disposable income. The US median household disposable income is about $45k/year (according to the OECD; the figure accounts for transfers, taxes, payments for health insurance, etc.). At the time, my thought was “okay, but that’s maybe pretty substantial, compared to the typical amount of money a person can realistically use to shape the world to their liking”. I’m not sure this is very informative, though.
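For a rough sense of scale, here’s a minimal sketch of that comparison, using the figures quoted above and the ~$45k OECD number (illustrative only):

```python
# Rough scale check (illustrative): the extra earnings quoted for IQ 130
# humans, expressed as a share of the ~$45k/year US median household
# disposable income figure noted above.
median_disposable_income = 45_000  # USD/year, OECD figure cited above

for extra in (6_000, 18_500):
    share = extra / median_disposable_income
    print(f"${extra:,}/year is roughly {share:.0%} of median disposable income")
```

So the quoted premium is somewhere around an eighth to two fifths of what a median household has free to deploy each year.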
> Often at least, the difference in performance between mediocre human performance and top level human performance is large, relative to the space below, iirc.
I take machine chess performance as evidence for a not-so-small range of human ability, especially when compared to the rate of increase of machine ability. But I think it’s good to be cautious about using chess Elo as a measure of the human range of ability, in any absolute sense, because chess is popular in part because it is so good at separating humans by skill. It could be the case that humans occupy a fairly small slice of chess ability (measured by, I dunno, likelihood of choosing the optimal move, or some other measure of performance that isn’t based on success rate against other players), but a small increase in skill confers a large increase in likelihood of winning, at skill levels achievable by humans.
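To make the Elo point concrete, here is a minimal sketch of the standard Elo expected-score formula, which converts a rating gap into an expected score (roughly, win probability with draws folded in). Note that it is defined entirely in terms of results against other players, so it says nothing about the size of the gap in any absolute sense:

```python
# Standard Elo expected-score formula: expected score for the higher-rated
# player given a rating gap (a draw counts as half a win).
def elo_expected_score(rating_gap: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

for gap in (100, 200, 400, 800):
    print(f"{gap}-point gap -> expected score {elo_expected_score(gap):.2f}")
```

A 400-point gap already corresponds to an expected score of about 0.91, and that is compatible with either a large or a small underlying difference in move quality.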
> ~Goal-directed entities may tend to arise from machine learning training processes not intending to create them (at least via the methods that are likely to be used).~
I made my notes on the AI Impacts version, which was somewhat different, but it’s not clear to me that this should be crossed out. It seems to me that institutions do exhibit goal-like behavior that is not intended by the people who created them.