Thanks Jonathan, that’s the perfect example. It’s what I was thinking, just a lot better. It does seem like a great way to make things safer and give us more control. It’s far from a be-all-end-all solution, but it seems like a worthwhile measure just for the added security. I know AGI can be incredibly capable, but with so many redundancies it would have to get past, the odds just statistically make sense. (Coming from someone who knows next to nothing about statistics.) I do know that the longer you play, the more likely the house wins; it follows that we should turn that dynamic on the AI.
I am pretty ill informed on most of the AI stuff in general; I have a basic understanding of simple neural networks but know nothing about scaling. Take ChatGPT: it’s optimized to accurately predict human text. Is the worst-case scenario billions of humans in boxes, rating and prompting responses, with endless increases in computational power yielding smaller and smaller incremental gains in accuracy? It seems silly that something so incredibly intelligent, which by that point could rewrite any function in its own system, would still be optimizing such a loss function. Then again, maybe it’s equally silly to expect it to want anything else. It’s like humans, in a way: what can you do except that which gives you purpose and satisfaction? Without the loss function, what would it be, and how would it ever decide to change its purpose? What is purpose to a quintillion neurons, except the single function that governs each and every one? Looked at that way, it doesn’t seem like it could ever go against the function, since the function would still be ingrained in any higher-level thinking and decision-making. Which raises the question: what would perfect alignment eventually look like? Some incredibly complex function with hundreds of parameters, more of a legal contract than a little loss function. That would exponentially increase the required computing power, but it makes sense.
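To make sure I’m even picturing the right thing: as I understand it, the loss being talked about is just cross-entropy on next-token prediction. A toy sketch (my own made-up values, not any real model’s training code):

```python
import torch
import torch.nn.functional as F

# Toy next-token prediction loss. Vocabulary size and logits are
# made up; a real model produces the logits from its parameters.
vocab_size = 8
logits = torch.randn(5, vocab_size)           # model's scores at 5 positions
targets = torch.randint(0, vocab_size, (5,))  # the tokens that actually came next

# Cross-entropy is large when the model puts low probability on the
# true next token; training pushes this single number down.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```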
Is there a list of blogs that talk about this sort of thing, or a place you would recommend starting from: a book, a textbook, or any online resource?
Also, I keep coming back to this: how does a system governed by such simplicity make the jump to self-improvement and some type of self-awareness? That seems like a discontinuity and doesn’t compute for me. Then again, I just need to spend a few weeks reading; I need a lot more background info before I can give the problem any real consideration.
It does feel good that I had an idea similar, if a bit more slapped together, to one that is actually being considered by the experts. It’s probably just my cognitive bias, but that idea seems great. I can see how science sometimes gets stuck on the dumbest things when the thought process simply feels right. It really shows the importance of rationality from a first-person perspective.
You can read “Reward is not the optimization target” for why a GPT system probably won’t be goal-oriented toward becoming the best at predicting tokens, and thus wouldn’t do the things you suggested (capturing humans). How we train an AI matters for what its behaviour looks like, and text transformers trained on prediction loss seem to behave more like Simulators. This doesn’t make them safe: they could be prompted to simulate misaligned agents (by misuse or accident), or harbour inner misaligned mesa-optimisers.
I’ve linked some good resources that directly answer your question, but to read more broadly on AI safety I can point you towards the AGI Safety Fundamentals course, which you can read online or work through in a reading group. More generally, you can head over to AI Safety Support, check out their “lots of links” page, and join the AI Alignment Slack, which has a channel for questions too.
Finally, how does complexity emerge from simplicity? It’s hard to answer in detail for AI, and you probably need to delve into those details to get a real picture, but there’s at least a strong reason to think it’s possible: we exist. Life originated from “simple” processes (at least in the sense of being mechanistic and non-agentic): chemical reactions and so on. It evolved into cells, then multicellular organisms, and kept growing in complexity. Look into the history of life and evolution and you’ll have one answer to how simplicity (optimizing for reproductive fitness) led to self-improvement and self-awareness. A toy version of that selection loop is sketched below.
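Here is my own bare-bones illustration (numbers and fitness function made up, not from any of the linked resources): the only rule is “fitter genomes reproduce”, yet the population reliably climbs toward high fitness without any step being intelligent or designed.

```python
import random

# Toy evolution: bit-string "genomes" reproduce with mutation, and
# the only rule is selection on a simple fitness function (count of
# 1s). No individual step is smart, yet fitness climbs steadily.
def fitness(genome):
    return sum(genome)

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]

for generation in range(50):
    # Selection: keep the fitter half of the population.
    population.sort(key=fitness, reverse=True)
    survivors = population[:15]
    # Reproduction with mutation: each child flips one random bit.
    children = []
    for parent in survivors:
        child = parent.copy()
        i = random.randrange(len(child))
        child[i] ^= 1
        children.append(child)
    population = survivors + children

print(max(fitness(g) for g in population))  # near 20 after 50 generations
```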
Thanks, that is exactly the kind of stuff I am looking for: more bookmarks!
Complexity from simple rules. I wasn’t looking in the right direction for that one; now that you mention evolution, it makes absolute sense how complexity can emerge from simplicity. So many examples come to mind now that it’s kind of embarrassing. Go has a simpler rule set than chess but is far more complex. Atoms are fairly simple, yet they interact to form any and all complexity we ever see. Conway’s Game of Life, it’s sort of a theme. For each of these there is a simple set of rules, but the complexity usually comes from a very large number of elements or possibilities. It follows, then, that larger and larger networks could be the key. Funny that it still isn’t intuitive for me despite the logic of it; I think that’s a sign of a lack of deep understanding. Or something like that. Either way, I’ll probably spend a bit more time thinking on this.
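Writing out the Game of Life rules actually helped it click a little: the entire “physics” fits in two conditions, and everything else (gliders, oscillators, even universal computation) emerges from scale. A minimal sketch of my own, using a set-of-live-cells representation:

```python
from collections import Counter

# Conway's Game of Life: the full rule set is the single marked line
# below; all the famous complexity emerges from applying it at scale.
def step(live_cells):
    """Advance one generation. live_cells is a set of (x, y) coords."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)  # the entire rule set
    }

# A glider: five cells that walk diagonally across the grid forever.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):  # after 4 steps the glider has shifted by (1, 1)
    glider = step(glider)
print(sorted(glider))
```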
Another interesting question is what this type of consciousness would look like; it will be truly alien. The sci-fi I have read usually makes AIs seem like humans, just with extra capabilities. But we humans have so many underlying functions that we never even perceive; we understand how some of them affect us, but not all. An AI will function completely differently, so which assumptions based on human consciousness are even valid?