For a long time, the way ANNs work kinda made sense to me, and seemed to map nicely onto my (shallow) understanding of how the human brain works. But I could never imagine how values/drives/desires could be implemented in terms of an ANN.
The idea that you can just quantify something you want as a metric, feed it in as an input, and see whether the output gets closer to what you want is new to me. It was a little epiphany that seems to make sense, so it prompted me to write this post.
Evolutionarily, I guess the human/animal utility function would be something like “How many copies of myself have I made? Let’s maximize that.” But from the subjective perspective, it’s probably more like “Am I receiving pleasure from the reward system my brain happened to develop?”
For sure there are a bunch of different impulses/drives, but they are all just little rewards for transforming the current state of the world into the one our brain prefers, right? Maybe they appeared randomly, but if you were to design one intentionally, is that how you would go about it?
Learning
Get inputs from eyes/ears.
Recognize patterns, make predictions.
Compare predictions to how things turned out, update the beliefs, improve the model of the world.
Repeat.
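To make the learning loop above a bit more concrete, here is a minimal sketch of it. It assumes the whole "model of the world" is a single number (`belief`) and that learning means nudging it toward each observation by a fraction of the prediction error. The names `learn`, `belief`, and `learning_rate` are made up for illustration, and real brains or ANNs are obviously far more complicated than this.

```python
def learn(observations, learning_rate=0.1):
    """Toy learner: one belief, nudged toward each observation (hypothetical sketch)."""
    belief = 0.0   # the entire "model of the world" in this toy
    errors = []
    for observed in observations:        # get inputs
        predicted = belief               # make a prediction
        error = observed - predicted     # compare prediction to how things turned out
        belief += learning_rate * error  # update the belief
        errors.append(abs(error))        # keep track of how wrong we were
    return belief, errors                # the loop itself is the "repeat"


# The world keeps showing the same value; the belief should drift toward it.
belief, errors = learn([1.0] * 50)
```

In this toy the prediction error shrinks on every pass, which is all "improve the model of the world" means here.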
General intelligence taking actions towards its values
Perceive the difference between the current state of the world and the state I want.
Use the model of the world that I’ve learned to predict the outcomes of possible actions.
If I predict that applying an action to the world will lead to rewards, take that action.
See how it turned out, update the model, repeat.
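The acting loop above could be sketched like this. Everything here is an assumption for the sake of a toy: the world is a single number, the reward is just the negative distance to the goal state, and the agent's model is one parameter (`effect`, its guess at how far one action moves the state). `true_world`, `run_agent`, and the numbers are hypothetical, not a claim about how any real agent works.

```python
ACTIONS = (-1, 0, 1)  # step down, stay, step up


def true_world(state, action):
    # Hypothetical environment: each action really moves the state by 2 units.
    return state + 2.0 * action


def run_agent(start, goal, steps=20, learning_rate=0.5):
    state = start
    effect = 1.0  # the model's (initially wrong) belief about an action's effect
    for _ in range(steps):
        # Use the model to predict the reward of each possible action.
        def predicted_reward(action):
            return -abs(goal - (state + effect * action))

        action = max(ACTIONS, key=predicted_reward)
        if action == 0:
            break  # no action is predicted to improve things, so stop
        predicted = state + effect * action
        state = true_world(state, action)  # take the action
        # See how it turned out and update the model from the prediction error.
        effect += learning_rate * (state - predicted) / action
    return state, effect


final_state, learned_effect = run_agent(start=0.0, goal=10.0)
```

The agent starts out underestimating what its actions do, but each surprise pulls `effect` closer to the true value while it keeps moving toward the goal.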
I agree that specific goals can also have unintended consequences. It just occurred to me that this kind of problem would be much easier to solve than trying to align abstract values, and the outcome is the same: we get what we want.
Oh, and I totally agree that there’s probably a ton of complexity when it comes to the implementation. But it would be pretty cool to figure out at least the general idea of what intelligence and consciousness are, what things we need to implement, and how they fit together.
In real life, the problem with metrics is that if you don’t get them exactly right (which is difficult), you can easily end up with something useless, and often actively harmful.
And yet, metrics are often useful in real life. You generally want to measure things. You need to know how much money you have, and it is better to know the structure of your income and expenses in detail. If you want to e.g. exercise regularly or stop eating chocolate, keeping a log of which days you exercised or avoided chocolate is often a good first step.
Thus we find ourselves in a paradox: we need good metrics, but we must remember that they are mere approximations of reality, lest we start optimizing for the metrics at the expense of the real things. (Good advice for a human, not very useful for constructing an AI.)
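A toy illustration of optimizing the metric at the expense of the real thing. The setup is entirely made up: there is a fixed budget of effort to split between real work and gaming the metric, and the metric is assumed to reward gaming three times as much as real work. The function names and the factor of 3 are hypothetical.

```python
def true_value(real_work):
    # What we actually care about: only real work counts.
    return real_work


def metric(real_work, gaming):
    # The proxy: counts real work, but (by assumption) counts gaming 3x as much.
    return real_work + 3.0 * gaming


def best_split(score, budget=10):
    # Try every whole-number split of the budget and keep the highest-scoring one.
    splits = [(real, budget - real) for real in range(budget + 1)]
    return max(splits, key=lambda split: score(*split))


by_metric = best_split(metric)                                 # optimize the proxy
by_value = best_split(lambda real, gaming: true_value(real))   # optimize the real thing
```

Under these assumptions the metric optimizer puts the whole budget into gaming and ends up with zero real value, while the metric itself looks great.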
Evolutionarily, I guess the human/animal utility function would be something like “How many copies of myself have I made? Let’s maximize that.” But from the subjective perspective, it’s probably more like “Am I receiving pleasure from the reward system my brain happened to develop?”
Yes, the “utility” of evolution is not the same as that of the evolved human.
For sure there are a bunch of different impulses/drives, but they are all just little rewards for transforming the current state of the world into the one our brain prefers, right?
Sometimes following your impulse can make you unhappy and still increase your fitness on average, for example jealousy. (Jealous people are made less happy by the idea that their partners might be cheating on them. But feeling this discomfort and guarding one’s partner increases reproductive fitness on average.) I mean, yes, finding out that despite your suspicions your partner is not cheating on you makes you happier (or less unhappy) than finding out that they actually are. But not worrying about the possibility at all would make you even happier. Humans are not even instinctive happiness maximizers.
Thank you for your reply!