I agree with you that there are probably better methods for handling the misuse risk, and to be clear, I presented mine as options, not as guarantees.
And yeah, I agree with this specifically:
but I don’t think it has to—there’s a whole suite of other alignment techniques for language model agents that should suffice together.
Thanks for mentioning that.
Now that I think about it, I agree that it's only a stopgap for misuse. And yes, if LLMs have even limited generalization ability, they will be able to rediscover dangerous knowledge, so we will need to build LLMs that refuse to help users do things like synthesize bioweapons.