It can’t represent a subjective sense of yellow, because if it could, consciousness would have to be a linear function of the activations. That seems somewhat ridiculous, because I experience a story about a “dog” differently depending on the context.
Furthermore, LLMs scale “features” by how strongly they appear (e.g. the positive-sentiment vector is scaled up when the text is very positive). So the LLM’s conscious processing of positive sentiment would be linearly proportional to how positive the text is, which also seems ridiculous.
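To make that picture concrete, here is a minimal numpy sketch of the linear-representation view I’m describing; the “sentiment direction”, its dimensionality, and the strengths are made up for illustration, not taken from any real model.

```python
import numpy as np

# Minimal sketch of the linear-representation picture (illustrative only):
# a "feature" is a fixed direction in activation space, and its strength is
# just the scalar coefficient on that direction.

d_model = 16
rng = np.random.default_rng(0)

# Hypothetical "positive sentiment" direction (unit vector).
sentiment_dir = rng.normal(size=d_model)
sentiment_dir /= np.linalg.norm(sentiment_dir)

# Some baseline activation vector for a token.
base_activation = rng.normal(size=d_model)

def add_sentiment(activation, strength):
    """Scale the feature linearly: 'very positive' text just means a
    bigger coefficient on the same direction."""
    return activation + strength * sentiment_dir

mildly_positive = add_sentiment(base_activation, 1.0)
very_positive = add_sentiment(base_activation, 5.0)

# Reading the feature back out is a dot product, and it is exactly
# proportional to the strength we put in.
readout = lambda a: (a - base_activation) @ sentiment_dir
print(readout(mildly_positive), readout(very_positive))  # ~1.0 and ~5.0
```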
I don’t expect consciousness to have any useful computational properties. Say you have a deterministic function y = f(x). You can encode just y = f(x), or y = f(x) where f includes conscious representations in its intermediate layers. The latter does not improve training accuracy in the slightest. Neural networks also have a strong simplicity bias towards low-frequency functions (this has been proven mathematically), and f(x) without consciousness is much simpler (lower frequency) to encode than f(x) with consciousness.
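Here is a toy illustration of the first half of that argument, with made-up weights: the second network below inserts an invertible map and its inverse into the middle, so it carries extra intermediate representations, yet it computes exactly the same f(x) and therefore gets exactly the same training loss.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden = 8, 32

W1 = rng.normal(size=(d_hidden, d_in))
W2 = rng.normal(size=(1, d_hidden))
relu = lambda z: np.maximum(z, 0.0)

def f_plain(x):
    # Plain network: y = W2 relu(W1 x)
    return W2 @ relu(W1 @ x)

# Insert an invertible map Q and its inverse between W1 and the ReLU.
# The intermediate representations change; the input-output function doesn't.
Q = rng.normal(size=(d_hidden, d_hidden))  # invertible with probability 1
Q_inv = np.linalg.inv(Q)

def f_extra(x):
    # "Richer" internals: y = W2 relu(Q^{-1} Q W1 x) == f_plain(x)
    return W2 @ relu(Q_inv @ (Q @ (W1 @ x)))

X = rng.normal(size=(d_in, 100))          # 100 random inputs
y = np.sin(X.sum(axis=0, keepdims=True))  # arbitrary targets

mse = lambda f: np.mean((f(X) - y) ** 2)
print(mse(f_plain), mse(f_extra))  # identical up to floating-point noise
```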
Sorry for the late response; I don’t really use this forum regularly. But to get back to it: the main reason neural networks generalize is that they find the simplest function that achieves a given accuracy on the training data.
This holds true for all neural networks, regardless of how they are trained, what kind of data they are trained on, or what the objective function is. It’s the whole reason neural networks work. Functions with more high-frequency components are exponentially less likely to be learned. This holds for the randomly initialized prior (see arxiv.org/pdf/1907.10599) and throughout training, as the averaging effect of SGD lets lower-frequency components be learned faster than higher-frequency ones (see arxiv.org/abs/1806.08734, “On the Spectral Bias of Neural Networks”).
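If it helps to see the effect concretely, here is a quick PyTorch sketch in the spirit of the toy experiments in the spectral-bias paper (my own architecture and hyperparameters, nothing taken from the paper itself): fit a sum of a low- and a high-frequency sinusoid and watch which component the residual loses first.

```python
import torch
import torch.nn as nn

# Rough spectral-bias demo in the spirit of arxiv.org/abs/1806.08734:
# fit y(x) = sin(2*pi*x) + sin(20*pi*x) with a small MLP and track how much
# of each frequency component remains in the residual over training.

torch.manual_seed(0)
x = torch.linspace(0, 1, 512).unsqueeze(1)
low, high = torch.sin(2 * torch.pi * x), torch.sin(20 * torch.pi * x)
y = low + high

net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def component_error(pred, comp):
    # How much of this frequency component is still missing from the fit,
    # measured by projecting the residual onto the component.
    resid = (y - pred).squeeze()
    return (resid @ comp.squeeze()).abs().item() / comp.numel()

for step in range(5001):
    pred = net(x)
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            p = net(x)
            print(step, round(component_error(p, low), 4),
                  round(component_error(p, high), 4))
# Typical behavior: the low-frequency residual collapses within a few hundred
# steps, while the high-frequency residual decays much more slowly.
```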
You can have any objective function you want; it doesn’t change this basic fact. If this basic fact didn’t hold, the neural network wouldn’t generalize and would be useless. There are many papers that formalize this and provide generalization bounds based on the complexity of the function learned by the neural network.
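As one concrete example of the shape such bounds take (this is the classical Occam-razor bound for a learner that fits the training data exactly; individual papers differ in the details), assuming a prior P over functions:

```latex
% With probability at least 1 - \delta over m i.i.d. training examples,
% every hypothesis f consistent with all of them satisfies
\mathrm{err}(f) \;\le\; \frac{\ln\big(1/P(f)\big) + \ln(1/\delta)}{m}
% A simplicity prior such as P(f) \propto 2^{-K(f)} makes the bound grow
% with the description length K(f) of the learned function.
```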
A “conscious” neural network doesn’t achieve higher accuracy than a network encoding the same function sans consciousness, but it does increase the complexity of the function. Therefore it’s exponentially less likely.
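Spelling out the “exponentially less likely” step under the simplicity-prior picture (my paraphrase, not a result from any specific paper): if both networks fit the training data D equally well, the likelihoods cancel, so

```latex
\frac{P(f_{\text{conscious}} \mid D)}{P(f_{\text{plain}} \mid D)}
  \;=\; \frac{P(f_{\text{conscious}})}{P(f_{\text{plain}})}
  \;\approx\; 2^{-\left(K(f_{\text{conscious}}) - K(f_{\text{plain}})\right)}
```

Under a prior that decays exponentially in description length, the posterior odds of the more complex, “conscious” version shrink exponentially in the extra complexity it carries.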
I think biological systems are really different from silicon ones. The biggest difference is that biological systems can generate their own randomness; silicon ones cannot, because they’re deterministic. If a neural network behaves probabilistically, it’s only because we feed it random samples as an input. I think consciousness is a precursor for free will, which can be valuable for inherently non-deterministic biological systems.
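Here is what I mean by the randomness being an input, as a toy sketch (fake logits, not a real model): the forward pass produces a distribution deterministically, and all of the apparent stochasticity comes from the random numbers we hand to the sampler. Fix that input and the “stochastic” model is fully reproducible.

```python
import numpy as np

# Toy "next-token" distribution from some deterministic forward pass.
logits = np.array([2.0, 0.5, -1.0, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()

def sample_tokens(seed, n=10):
    # All of the apparent randomness comes from this externally supplied seed.
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=n, p=probs)

print(sample_tokens(seed=42))
print(sample_tokens(seed=42))  # identical: same random input, same output
print(sample_tokens(seed=7))   # different only because we changed the input
```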
In my original post, I had linked a recent paper that finds suggestive evidence that the brain is non-classical (e.g. undergoes quantum computation), but I deleted the link after someone told me to.
More generally, I feel that for folks concerned about AI safety, the first step is to develop a solid theoretical understanding of why neural networks generalize, what types of functions they are biased towards, how that bias depends on the number of layers, and so on.
I feel that most individuals on Less Wrong lack this knowledge because they exclusively consume content from people within the rationality/AI-safety sphere. I think this leads to a lot of outlandish conjectures (e.g. conscious AIs, paperclip maximizers) that don’t make sense.