why would predicting reality lead to having preferences that are human-friendly?
LLMs are not trained to predict reality — they’re trained to predict human-generated text, i.e. we’re distilling human intelligence into them. This gets you something that uses human ontologies, understands human preferences and values in great detail, acts agentically, and works more sloppily in August.
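To make the "trained to predict human-generated text" point concrete, here is a minimal PyTorch-style sketch (not any lab's actual training code; the tiny model and random token batch are stand-ins) of the next-token objective. The only training signal is how well the model predicts the next token a human wrote; there is no term for matching reality directly.

```python
# Minimal sketch of the next-token objective on human-written text.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

# Toy stand-in for a batch of human-generated token sequences.
tokens = torch.randint(0, vocab_size, (4, 16))

hidden = embed(tokens[:, :-1])        # stand-in for a transformer's hidden states
logits = lm_head(hidden)              # predicted distribution over the next token
loss = nn.functional.cross_entropy(   # loss only asks: did you predict the human's next token?
    logits.reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
```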
The problem here for ASI is that while humans understand human values well, not all (perhaps not even many) humans are extremely moral, kindly, or wise, or safe to be handed godlike intelligence, enormous power, and the ability to run rings around law enforcement. The same is, by default, going to be true of an artificial intelligence distilled from humans. As for "having preferences": an LLM doesn't simulate a single human (or their preferences); for each request it simulates a new, randomly selected member of a prompt-dependent distribution of possible humans (and their preferences).
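A toy illustration of that last claim (purely conceptual; the personas and weights below are hypothetical, just to make the idea concrete): the "author" the model simulates is re-sampled on every request from a distribution that the prompt shapes, rather than being one fixed person with fixed preferences.

```python
# Toy model of a prompt-dependent distribution over simulated "humans".
import random

def sample_simulated_author(prompt: str) -> str:
    # Hypothetical personas and weights, for illustration only.
    personas = ["careful scientist", "internet troll", "helpful assistant", "scammer"]
    if "peer-reviewed" in prompt:
        weights = [0.7, 0.05, 0.2, 0.05]  # scholarly prompts shift the distribution
    else:
        weights = [0.2, 0.3, 0.3, 0.2]
    return random.choices(personas, weights=weights, k=1)[0]

# Each call can return a different "human", even for the same prompt.
print([sample_simulated_author("a peer-reviewed study of ...") for _ in range(3)])
```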
This is why I think synthetic data, as well as not open-sourcing/open-weighting ASI, is likely to be necessary, at least for a few years: we cannot settle for merely human-level alignment of ASI. The good news is that synthetic data is a very natural path to increasing capabilities for AI in general, not just LLMs, and I'm more hopeful than you that we can get instruction-following AGI/ASI to automate alignment research.
Completely agreed (and indeed currently looking for employment where I could work on just that).