AI-doom

Karma: −4

AI-doom Nov 5, 2023, 11:59 AM
3 points
0
in reply to: gwern’s comment on: Genetic fitness is a measure of selection strength, not the selection target
I agree that humans are not aligned with inclusive genetic fitness, but i think you could look at evolution as a bunch of different optimizers at any small stretch in time and not just a singel optimizer. If not getting killed by spiders is necessary for IGF for example, then evolution could be though off as both an optimizer for IGF and not getting killed by spiders. Some of these optimizers have created mesaoptimizers that resemble the original optimizer to a strong degree. Most people really care about their own biological children not dying for example. I think that thinking about evolution as multiple optimizers, makes it seem more likely that gradient descent is able to instill correct human values sometimes rather than never.

Destroying the fabric of the universe as an instrumental goal.

AI-doomSep 14, 2023, 8:04 PM

−7 points

AI-doom Apr 19, 2023, 1:28 PM
1 point
0
on: Scaffolded LLMs as natural language computers
Great and informative post! It seems to me that this architecture could enhance safety to some extent in the short term. Let’s imagine an AI system similar to Auto-GPT, consisting of three parts: a large language model agent focused on creating stamps, a smaller language model dedicated to producing paperclips, and an even smaller scaffolding agent that leverages the language models to devise plans for world domination. Individually, none of these systems possess the intelligence to trigger an intelligence explosion or take over the world. If such a system reaches a point where it is capable of planning world domination, it is likely less dangerous than a simple language model with that goal would be, since the agent providing the goal is too simple to comprehend the importance of self-preservation and is further from superintelligence than the other parts. If so, scaffolding-like structures could be employed as a safety measure, and stop buttons might actually prove effective. Am I mistaken in my intuition? What would likely be the result of an intelligence explosion in the above example? Paperclip maximizers?