Transcript for searchability:
hi this video is kind of a response to
various comments that I’ve got over the
years ever since that video on Computerphile
where I was describing the sort of
problems that we might have when we have
a powerful artificial general
intelligence with goals which aren’t the
same as our goals even if those goals
seem pretty benign we use this thought
experiment of an extremely powerful AGI
working to optimize the simple goal of
collecting stamps and some of the
problems that that might cause I got
some comments from people saying that
they think the stamp collecting device
is stupid and not that it’s a stupid
thought experiment but the device itself
is actually stupid they said unless it
has complex goals or the ability to
choose its own goals then it didn’t
count as being highly intelligent in
other videos I got comments saying it
takes intelligence to do moral reasoning
so an intelligent AGI system should be
able to do that and a super intelligence
should be able to do it better than
humans in fact if a super intelligence
decides that the right thing to do is to
kill us all then I guess that’s the
right thing to do these comments are all
kind of suffering from the same mistake
which is what this video is about but
before I get to that I need to lay some
groundwork first if you like Occam’s
razor then you’ll love Hume’s guillotine
also called the is-ought problem this is a
pretty simple concept that I’d like to
be better known the idea is statements
can be divided up into two types is
statements and ought statements is
statements or positive statements are
statements about how the world is how
the world was in the past how the world
will be in the future or how the world
would be in hypothetical situations these
are facts about the nature of reality the
causal relationships between things that
kind of thing then you have the ought
statements the should statements the
normative statements these are about the
way the world should be the way we want
the world to be statements about our
goals our values ethics morals what we
want all of that stuff now you can
derive logical statements from one
another like it’s snowing outside
that’s an is statement it’s cold when
it snows another is statement and then
you can deduce therefore it’s cold
outside
that’s another is statement it’s our
conclusion this is all pretty obvious
but you might say something like it’s
snowing outside therefore you ought to
put on a coat and that’s a very normal
sort of sentence that people might say
but as a logical statement it actually
relies on some hidden assumption
without assuming some kind of ought
statement you can’t derive another ought
statement this is the core of the is-ought
problem you can never derive an ought
statement using only is statements you
ought to put on a coat why because it’s
snowing outside so why does the fact that
it’s snowing mean I should put on the
coat well the fact that it’s snowing
means that it’s cold and why should it
being cold mean I should put on a coat
if it’s cold and you go outside without
a coat you’ll be cold should I not be
cold well if you get too cold you’ll
freeze to death okay you’re saying I
shouldn’t freeze to death
that was kind of silly but you see what
I’m saying you can keep laying out is
statements for as long as you want you
will never be able to derive that you
ought to put on a coat at some point in
order to derive that ought statement you
need to assume at least one other ought
statement if you have some kind of ought
statement like I ought to continue to be
alive you can then say given that I
ought to keep living and then if I go
outside without a coat I’ll die then I
ought to put on a coat but unless you
have at least one ought statement you
cannot derive any other ought statements
is statements and ought statements are
separated by Hume’s guillotine
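To make the hidden premise explicit, here is one way to lay out the coat argument as a schema (an illustrative reconstruction added for this write-up, not something from the video); the conclusion only follows once the assumed ought premise O1 is included:

```latex
% Coat argument with the hidden premise made explicit (uses amsmath for \text).
% P1 and P2 are is-statements; O1 is the assumed ought-statement.
% P1 and P2 on their own never yield C -- only adding O1 does.
\[
\begin{array}{ll}
\text{P1 (is):}    & \text{it is snowing, so it is cold outside} \\
\text{P2 (is):}    & \text{if I go outside in the cold without a coat, I will freeze} \\
\text{O1 (ought):} & \text{I ought not to freeze} \\
\hline
\text{C (ought):}  & \text{I ought to put on a coat} \\
\end{array}
\]
```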
okay so people are saying that a device that single-mindedly
collects stamps at the cost of
everything else is stupid and doesn’t
count as a powerful intelligence so
let’s define our terms what is
intelligence and conversely what is
stupidity I feel like I made fairly
clear in those videos what I meant by
intelligence we’re talking about AGI
systems as intelligent agents they’re
entities that take actions in the world
in order to achieve their goals or
maximize their utility functions
intelligence is the thing that allows
them to choose good actions to choose
actions that will get them what they
want an agent’s level of intelligence
really means its level of effectiveness
at pursuing its goals in practice this
is likely to involve having or building
an accurate model of reality keeping
that model up-to-date by reasoning about
observations and using the model to make
predictions about the future and the
likely consequences of different
possible actions to figure out which
actions will result in which outcomes
intelligence involves answering
questions like what is the world like
how does it work what will happen next
what would happen in this scenario or
that scenario what would happen if I
took this action or that action more
intelligent systems are in some sense
better at answering these kinds of
questions which allows them to be better
at choosing actions but one thing you
might notice about these questions is
they’re all is questions the system
has goals which can be thought of as
ought statements but the level of
intelligence depends only on the ability
to reason about is questions in order to
answer the single ought question what
action should I take next
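As a rough illustration of this picture, here is a minimal toy sketch added for this write-up (the actions, numbers, and function names are all invented): the machinery answers is-questions by predicting outcomes, and the goal only enters as the scoring function used to answer the one ought-question.

```python
# Toy sketch of a model-based agent: the machinery answers is-questions
# (predicting what would happen), and the goal only appears as the scoring
# function used to answer the single ought-question: what action should I take?

def world_model(stamps, action):
    """Predict how many stamps result from taking `action` (an is-question)."""
    predicted = {"buy_stamps": stamps + 10,
                 "do_nothing": stamps,
                 "sell_stamps": stamps - 5}
    return predicted[action]

def utility(stamps):
    """The agent's terminal goal: more stamps is better."""
    return stamps

def choose_action(stamps, actions=("buy_stamps", "do_nothing", "sell_stamps")):
    """Pick the action whose predicted outcome scores highest under the goal."""
    return max(actions, key=lambda a: utility(world_model(stamps, a)))

print(choose_action(100))  # -> buy_stamps
```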
so given that that’s what we mean by intelligence what
does it mean to be stupid well firstly
you can be stupid in terms of those
questions for example by building a
model that doesn’t correspond with
reality or by failing to update your
model properly with new evidence if I
look out of my window
and I see there’s snow everywhere you
know I see a snowman and I think to
myself oh what a beautiful warm sunny
day then that’s stupid right my belief
is wrong and I had all the clues to
realize it’s cold outside so beliefs can
be stupid by not corresponding to
reality
what about actions like if I go outside
in the snow without my coat that’s
stupid right well it might be if I think
it’s sunny and warm and I go outside to
sunbathe then yeah that’s stupid but if
I just came out of a sauna or something
and I’m too hot and I want to cool
myself down then going outside without a
coat might be quite sensible you can’t
know if an action is stupid just by
looking at its consequences you have to
also know the goals of the agent taking
the action you can’t just use is
statements you need an ought so actions
are only stupid relative to a particular
goal it doesn’t feel that way though
people often talk about actions being
stupid without specifying what goals
they’re stupid relative to but in those
cases the goals are implied we’re humans
and when we say that an action is stupid
in normal human communication we’re
making some assumptions about normal
human goals and because we’re always
talking about people and people tend to
want similar things it’s sort of a
shorthand that we can skip what goals
we’re talking about so what about the
goals then can goals be stupid
well this depends on the difference
between instrumental goals and terminal
goals
this is something I’ve covered elsewhere
but your terminal goals are the things
that you want just because you want them
you don’t have a particular reason to
want them they’re just what you want the
instrumental goals are the goals you
want because they’ll get you closer to
your terminal goals like if I have a
terminal goal to visit a town that’s far
away maybe an instrumental goal would be
to find a train station I don’t want to
find a train station just because trains
are cool I want to find a train as a
means to an end it’s going to take me to
this town
so that makes it an instrumental goal
now an instrumental goal can be stupid
if I want to go to this distant town so
I decide I want to find a pogo stick
that’s pretty stupid
finding a pogo stick is a stupid
instrumental goal if my terminal goal is
to get to a faraway place but if my
terminal goal is something else like
having fun it might not be stupid so in
that way it’s like actions instrumental
goals can only be stupid relative to
terminal goals
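As a tiny worked example of "stupid relative to a goal" (the goals and scores below are invented purely for illustration, not anything from the video), the same instrumental goal can score very differently depending on which terminal goal it is judged against:

```python
# Invented illustration: an instrumental goal is only good or bad relative to
# a terminal goal. "Find a pogo stick" looks stupid for one terminal goal and
# perfectly sensible for another.

helpfulness = {
    ("reach the distant town", "find a train station"): 0.9,
    ("reach the distant town", "find a pogo stick"):    0.05,
    ("have fun",               "find a train station"): 0.2,
    ("have fun",               "find a pogo stick"):    0.8,
}

def how_useful(terminal_goal, instrumental_goal):
    """Rough score for how much the instrumental goal advances the terminal goal."""
    return helpfulness[(terminal_goal, instrumental_goal)]

print(how_useful("reach the distant town", "find a pogo stick"))  # 0.05 -- stupid here
print(how_useful("have fun", "find a pogo stick"))                # 0.8  -- sensible here
```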
so you see how this works beliefs and predictions can be stupid
relative to evidence or relative to
reality actions can be stupid relative
to goals of any kind
instrumental goals can be stupid
relative to terminal goals but here’s
the big point terminal goals can’t be
stupid there’s nothing to judge them
against if a terminal goal seems stupid
like let’s say collecting stamps seems
like a stupid terminal goal that’s
because it would be stupid as an
instrumental goal relative to human terminal
goals but the stamp collector does not
have human terminal goals
similarly the things that humans care
about would seem stupid to the stamp
collector because they result in so few
stamps so let’s get back to those
comments one type of comment says this
behavior of just single mindedly going
after one thing and ignoring everything
else and ignoring the totally obvious
fact that stamps aren’t that important
is really stupid behavior you’re calling
this thing a superintelligence but it
doesn’t seem super intelligent to me it
just seems kind of like an idiot
hopefully the answer to this is now
clear the stamp collector’s actions are
stupid relative to human goals but it
doesn’t have human goals its
intelligence comes not from its goals
but from its ability to understand and
reason about the world allowing it to
choose actions that achieve its goals
and this is true whatever those goals
actually are some people commented along
the lines of well okay yeah sure you’ve
defined intelligence to only include
this type of is statement kind of
reasoning but I don’t like that
definition I think to be truly
intelligent you need to have complex
goals something with simple goals
doesn’t count as intelligent to that I
say well you can use words however you
want I guess I’m using intelligence here
as a technical term in the way that it’s
often used in the field you’re free to
have your own definition of the word but
the fact that something fails to meet
your definition of intelligence does not
mean that it will fail to behave in a
way that most people would call
intelligent
if the stamp collector outwits you gets
around everything you’ve put in its way
and outmaneuvers you mentally it comes
up with new strategies that you would
never have thought of to stop you from
turning it off and to stop you from
preventing it from making stamps and as
a consequence it turns the entire world
into stamps in various ways you could
never think of it’s totally okay for you
to say that it doesn’t count as
intelligent if you want but you’re still
dead I prefer my definition because it
better captures the ability to get
things done in the world which is the
reason that we actually care about AGI
in the first place
similarly people who say that in order
to be intelligent you need to be able to
choose your own goals
I would agree you need to be able to
choose your own instrumental goals but
not your own terminal goals changing
your terminal goals is like willingly
taking a pill that will make you want to
murder your children it’s something you
pretty much never want to do apart from
some bizarre edge cases if you
rationally want to take an action that
changes one of your goals then that
wasn’t a terminal goal now moving on to
these comments saying an AGI will be
able to reason about morality and if
it’s really smarter than us it will
actually do moral reasoning better than
us
so there’s nothing to worry about it’s
true that a superior intelligence might
be better at moral reasoning than us but
ultimately moral behavior depends not on
moral reasoning but on having the right
terminal goals there’s a difference
between figuring out and understanding
human morality and actually wanting to
act according to it the stamp collecting
device has a perfect understanding of
human goals ethics and values and it
uses that only to manipulate people for
stamps its superhuman moral reasoning
doesn’t make its actions good if we
create a super intelligence and it
decides to kill us that doesn’t tell us
anything about morality it just means we
screwed up
so what mistake do all of these comments
have in common the orthogonality thesis
in AI safety is that more or less any
goal is compatible with more or less any
level of intelligence ie those
properties are orthogonal you can place
them on these two axes and it’s possible
to have agents anywhere in this space
anywhere on either scale you can have
very weak low intelligence agents that
have complex human compatible goals you
can have powerful highly intelligent
systems with complex sophisticated goals
you can have weak simple agents with
silly goals and yes you can have
powerful highly intelligent systems with
simple weird inhuman goals
any of these are possible because level
of intelligence is about effectiveness
at answering is questions and goals are
all about ought questions and the two
sides are separated by Hume’s guillotine
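One way to picture the two independent axes is the toy sketch below, invented for this write-up (the world, actions, and goals are all made up): the planning machinery stands in for the intelligence axis, the utility function stands in for the goal axis, and you can vary either one without touching the other.

```python
# Toy sketch of the orthogonality thesis: planning capability (search depth)
# and the goal (utility function) are independent knobs you can mix and match.
# Everything here is invented for illustration.

def toy_model(state):
    """A tiny made-up world: each action bumps one counter in the state."""
    for action, key in [("print_stamps", "stamps"), ("cook_meals", "meals")]:
        next_state = dict(state)
        next_state[key] = next_state.get(key, 0) + 1
        yield action, next_state

def plan(state, utility, depth):
    """Look `depth` steps ahead and return (best first action, best value)."""
    if depth == 0:
        return None, utility(state)
    best_action, best_value = None, float("-inf")
    for action, next_state in toy_model(state):
        _, value = plan(next_state, utility, depth - 1)
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value

stamp_goal = lambda s: s.get("stamps", 0)  # a simple, inhuman goal
human_goal = lambda s: s.get("meals", 0)   # stand-in for a human-compatible goal

# Same planner, same capability, different goals -> different behaviour:
print(plan({}, stamp_goal, depth=3)[0])  # print_stamps
print(plan({}, human_goal, depth=3)[0])  # cook_meals
```

In this sketch, making `depth` bigger only makes the agent better at getting whatever `utility` asks for; it never changes what `utility` asks for.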
hopefully looking at what we’ve talked
about so far it should be pretty obvious
that this is the case like what would it
even mean for it to be false for it
to be impossible to create powerful
intelligences with certain goals the
stamp collector is intelligent because
it’s effective at considering the
consequences of sending different
combinations of packets on the internet
and calculating how many stamps that
results in exactly how good do you have
to be at that before you don’t care
about stamps anymore and you randomly
start to care about some other thing
that was never part of your terminal
goals like feeding the hungry or
whatever it’s just not gonna happen so
that’s the orthogonality thesis it’s
possible to create a powerful
intelligence that will pursue any goal
you can specify knowing an agent’s
terminal goals doesn’t really tell you
anything about its level of intelligence
and knowing an agent’s level of
intelligence doesn’t tell you anything
about its goals
[Music]
I want to end the video by saying thank
you to my excellent patrons so it’s all
of these people here thank you so much
for your support
lets me do stuff like building this
light boy thank you for sticking with me
through that weird patreon fees thing
and my moving to a different city which
has really got in the way of making
videos recently but I’m back on it now
new video every two weeks is the plan
anyway in this video I’m especially
thanking Katie Beirne who’s supported
the channel for a long time she actually
has her own YouTube channel about 3d
modeling and stuff so a link to that and
while I’m at it when I thanked Chad Jones
ages ago I didn’t mention his YouTube
channel so link to both of those in the
description thanks again and I’ll see
you next time I don’t speak cat what
does that mean