Suppose you have at least one “foundational principle” A = [...words..], which is mapped to a token vector (say, in binary: [0110110...]) and sent to the internal NN. The encoding and decoding processes are non-transparent with respect to any attempt to ‘train’ the system on principle A. And if the system’s internal weight matrices are already mostly constant, you can’t add internal principles at all (it’s not clear you can add them even earlier, while the initially random weights are being de-randomized during training).
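To make the frozen-weights point concrete, here is a minimal sketch (assuming PyTorch; the string `principle_A`, the bit-encoding, and the tiny network are all illustrative stand-ins, not the real, opaque encoding of a trained model): once the weight matrices are frozen, a gradient step toward the principle has nothing to update, so nothing about A ever lands in the weights.

```python
import torch
import torch.nn as nn

# Hypothetical "principle" A: a string mapped to a binary token vector.
# (In a real LM the encoding is learned and opaque; this fixed
# bit-encoding is purely illustrative.)
principle_A = "do no harm"
bits = [int(b) for ch in principle_A.encode() for b in f"{ch:08b}"]
x = torch.tensor(bits, dtype=torch.float32).unsqueeze(0)  # shape (1, 8 * len)

# A tiny stand-in for the system's internal NN.
net = nn.Sequential(nn.Linear(x.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))

# Freeze the weights, as when the internal matrices are already constant.
for p in net.parameters():
    p.requires_grad_(False)

before = [p.detach().clone() for p in net.parameters()]

# Attempt to "train in" the principle: push the output toward 1.0.
loss = (net(x) - 1.0).pow(2).mean()
try:
    loss.backward()  # fails: nothing in the graph requires grad
except RuntimeError as err:
    print("backward failed:", err)

# The weights are unchanged; the "principle" was never absorbed.
changed = any(not torch.equal(a, b) for a, b in zip(net.parameters(), before))
print("weights changed:", changed)  # -> False
```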