Makes sense. Just to clarify, the papers I shared for 1 were mostly meant as methodological examples of how one might go about quantifying brain-LLM alignment. I agree about b), that they're not that relevant to alignment, though some similar papers do make progress on that front by addressing (somewhat) more relevant domains/tasks, e.g. emotion understanding, and I have/had an AI Safety Camp '23 project trying to make similar progress on moral reasoning. W.r.t. a), you can also do decoding (predicting LLM embeddings from brain measurements), which is the inverse of encoding; this survey, for example, covers both encoding and decoding.
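To make the encoding/decoding distinction concrete, here's a minimal sketch in the ridge-regression style common in this literature. Everything here is an assumption for illustration: the data are synthetic stand-ins (in practice the features would be LLM embeddings of the stimuli and the targets the corresponding brain measurements, e.g. fMRI voxel responses, aligned stimulus-by-stimulus), and the helper name `fit_and_score` is mine, not from any specific paper.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, emb_dim, n_voxels = 500, 768, 200

# Synthetic stand-ins: X = LLM embeddings per stimulus, Y = simulated voxel responses
llm_embeddings = rng.normal(size=(n_stimuli, emb_dim))
true_map = rng.normal(size=(emb_dim, n_voxels)) * 0.05
brain_responses = llm_embeddings @ true_map + rng.normal(size=(n_stimuli, n_voxels))

X_tr, X_te, Y_tr, Y_te = train_test_split(
    llm_embeddings, brain_responses, test_size=0.2, random_state=0
)

def fit_and_score(inputs_tr, targets_tr, inputs_te, targets_te):
    """Fit a cross-validated ridge map and return per-target Pearson r on held-out data."""
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(inputs_tr, targets_tr)
    pred = model.predict(inputs_te)
    # Column-wise correlation between predictions and ground truth
    pred_c = pred - pred.mean(0)
    targ_c = targets_te - targets_te.mean(0)
    return (pred_c * targ_c).sum(0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(targ_c, axis=0) + 1e-12
    )

# Encoding: LLM embeddings -> brain responses (one alignment score per voxel)
encoding_r = fit_and_score(X_tr, Y_tr, X_te, Y_te)
# Decoding: brain responses -> LLM embeddings (one score per embedding dimension)
decoding_r = fit_and_score(Y_tr, X_tr, Y_te, X_te)

print(f"mean encoding r: {encoding_r.mean():.3f}")
print(f"mean decoding r: {decoding_r.mean():.3f}")
```

The point is just that the two directions are the same linear-mapping machinery with inputs and targets swapped, which is why surveys tend to cover them together.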