This was definitely my favorite of the series.
I’m not really sold on the fundamentalness, especially of positive bias. Even if we grant that we’re just considering systems that think complicated thoughts / plans in serial steps by evaluating intermediate simple thoughts, the fundamentalness argument only makes sense if the thought-assessment is hard to adjust based on context or can’t have recursive structure.
Like, humans can’t perfectly simulate hypothetical thoughts both because our thought-assessment only has limited ability to be changed on the fly (an AI’s might have “adjustable knobs” or take rich input as a parameter), and also because we’re not that good at using our brains for simulation: we get lost in thought sometimes, but if you could get perfectly lost, your ancestors probably got eaten by tigers (whereas an AI can do recursive simulation).
Thanks!
I’ll try to rephrase your comment first to make sure I got it.
When you say “I’m not really sold on the fundamentalness”, I believe you’re referring to the “fundamentalness” of confirmation bias, as discussed in §3.5 and footnote 3. (You can correct me if I’m wrong.)
Let’s say the AI designed a widget, and wants the widget to work, but is now brainstorming situations in which the widget will fail. It has some nonzero tendency to flinch away from this process, because imagining the widget failing tends to be demotivating. But maybe you’re saying that this is a fixable problem.
Maybe we can say that the root of the problem is this:
If you’re planning-what-to-do, then if “the widget will fail” is active, it should count against the goodness of the plan, and as evidence that you should throw out that plan and think about other kinds of plans instead.
But if you’re brainstorming, then if “the widget will fail” is active, that’s great; you should definitely keep thinking about that kind of thing.
Now, in brains, planning-what-to-do and brainstorming are basically the same kind of thing—same kind of queries to the same data structure, with the same thought assessors. But maybe in AI, we can distinguish them more cleanly and adjust the thought assessors accordingly? At least, that’s how I interpret your comment.
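To make sure we’re picturing the same thing, here’s a rough Python sketch of that interpretation, with every name made up for illustration: one thought assessor, plus an explicit mode knob in the source code that flips how “the widget will fail” gets scored.

```python
# Hypothetical sketch (made-up names): one thought assessor whose output
# is re-weighted by an explicit "mode" knob written into the source code,
# rather than by learned metacognitive heuristics.

def assess_thought(thought: str, mode: str) -> float:
    """Return a valence-like score for whether to keep pursuing this thought."""
    # Pretend this came from a learned model: "the widget will fail" is
    # intrinsically bad news about the current plan.
    base_score = -1.0 if "widget will fail" in thought else 0.1

    if mode == "planning":
        # Planning mode: a predicted failure counts against the plan and
        # pushes toward abandoning this line of thought.
        return base_score
    elif mode == "brainstorming":
        # Brainstorming mode: surfacing failure modes is the whole point,
        # so the very same assessment gets rewarded instead of punished.
        return abs(base_score)
    else:
        raise ValueError(f"unknown mode: {mode}")

print(assess_thought("the widget will fail if it overheats", "planning"))      # -1.0
print(assess_thought("the widget will fail if it overheats", "brainstorming")) #  1.0
```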
I don’t rule that out, but my current low-confidence guess is that it wouldn’t work. Also, if this kind of thing worked at all, the result would wind up almost isomorphic to an AI with a suite of learned metacognitive heuristics, a.k.a. “an AI that read Scout Mindset and has gotten in the habit of Murphyjitsu etc.”. I think the latter is a good-enough mitigation of wishful thinking, albeit not completely perfect and somewhat theoretically unsatisfying.
One possible issue with putting that distinction in the source code (rather than as learned metacognitive heuristics) is that I’m not sure brainstorming-what-can-go-wrong-mode and planning-what-to-do-mode are actually cleanly separable. I bet you can kinda be doing a complicated mix of both at once, and that this capability is important. A flexible suite of learned metacognitive heuristics can handle that, but I’m not sure human-written source code can.
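If the modes really do blend like that, the hard-coded version presumably has to turn the discrete switch into something like a continuous mixing weight. Here’s a hypothetical extension of the sketch above (again, made-up names), which I don’t think obviously captures “productively doing both at once”:

```python
# Hypothetical extension of the sketch above: a continuous weight says how
# much the agent is currently in brainstorming mode, instead of a discrete
# switch. Where this weight comes from, if not from learned context-dependent
# heuristics, is exactly the open question.

def assess_thought_mixed(thought: str, brainstorm_weight: float) -> float:
    base_score = -1.0 if "widget will fail" in thought else 0.1
    planning_score = base_score         # a predicted failure counts against the plan
    brainstorm_score = abs(base_score)  # a predicted failure is a productive find
    return (1 - brainstorm_weight) * planning_score + brainstorm_weight * brainstorm_score

# Halfway between modes, a failure-mode thought scores as exactly neutral,
# which doesn't seem like what "doing a complicated mix of both" feels like.
print(assess_thought_mixed("the widget will fail if it overheats", 0.5))  # 0.0
```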
Another possible issue is that an intelligent agent needs to make tradeoffs between time spent brainstorming-what-can-go-wrong and time spent planning-what-to-do, so ultimately everything has to cash out in the same currency. It’s probably important-for-capabilities that this kind of tradeoff is learned and flexible and context-dependent, which you get for free with learned metacognitive heuristics, but which seems hard to get with human-written source code.
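One way to picture the “same currency” point, as a hypothetical sketch (made-up names and numbers): whatever scheduler divides up the time has to compare the two activities on a single scale.

```python
# Hypothetical sketch of the "same currency" point: to split time between
# brainstorming failure modes and fleshing out the plan, both activities have
# to be scored on one shared scale. Producing those estimates in a flexible,
# context-dependent way is the part that plausibly has to be learned.

def choose_next_activity(value_of_more_brainstorming: float,
                         value_of_more_planning: float) -> str:
    # The comparison is only meaningful if both estimates are in the same
    # units; that shared scale is the "currency".
    if value_of_more_brainstorming > value_of_more_planning:
        return "brainstorm another failure mode"
    return "keep fleshing out the current plan"

print(choose_next_activity(0.7, 0.4))  # brainstorm another failure mode
```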
Again, none of this is very high confidence.