Why would you want to do that?
No.
Just Censor Training Data. I think it is a reasonable policy demand for any dual-use model.
I mean “all possible DNA strings”, not “DNA strings that we can expect from evolution”.
I think another point here is that Word is not the maximally short program that reproduces the same correspondence between inputs and outputs as the actual Word does, and the program of minimal length would probably run much slower too.
My general point is that comparing the complexity of two arbitrary entities is meaningless unless you write down a lot of assumptions.
I think the section “You are simpler than Microsoft Word” is just plain wrong, because it assumes a single UTM. But Kolmogorov complexity is defined only up to the choice of UTM.
The genome is only as simple as the rest of the cellular machinery allows it to be, like the ribosomal decoding mechanism and protein folding. Humans are simple only relative to the space of all possible organisms that can be built on Earth’s biochemistry. Conversely, Word is complex only relative to the set of all x86 instruction sequences, or all C programs, or whatever you used to define Word’s size. To properly compare the complexity of the two, you need to translate from one language to the other. How large would the genome of an organism capable of running Word have to be? It seems reasonable that a simulation of a human organism down to the level of nucleotides would be very large if we wrote it in C, and I think the genome of an organism capable of running Word as well as a modern PC does would be much larger than the human genome.
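For reference, the invariance theorem (a standard fact about Kolmogorov complexity, not something from the post) only pins the comparison down to a machine-dependent constant:

$$\lvert K_U(x) - K_V(x) \rvert \le c_{U,V} \quad \text{for all } x,$$

where $c_{U,V}$ is roughly the length of a translator between the two machines $U$ and $V$ and does not depend on $x$. For two specific finite objects like a genome and a Word binary, that translation constant can easily dominate the comparison.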
Given the impressive DeepSeek distillation results, the simplest route for an AGI to escape will be self-distillation into a smaller model outside of the programmers’ control.
A more technical definition of “fairness” here is that the environment doesn’t distinguish between algorithms with the same policy, i.e. the same mapping ⟨prior, observation_history⟩ → action? I think this captures the difference between CooperateBot and FairBot.
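A toy sketch of that distinction (my own illustration; the real FairBot is defined with provability logic, which the `fair_bot` below only crudely approximates): a “fair” environment scores an agent only through the action its policy outputs, while an “unfair” one also looks at the algorithm itself.

```python
def cooperate_bot(opponent):
    """Always cooperates, regardless of the opponent."""
    return "C"

def fair_bot(opponent):
    """Cooperates iff the opponent cooperates against an unconditional cooperator
    (a crude stand-in for the provability-based FairBot)."""
    return "C" if opponent(cooperate_bot) == "C" else "D"

def fair_environment(agent):
    # Payoff depends only on the action taken against a fixed opponent.
    action = agent(cooperate_bot)
    return 3 if action == "C" else 1

def unfair_environment(agent):
    # Payoff also depends on the agent's internals (its name here stands in
    # for reading the source code), not just on its action.
    action = agent(cooperate_bot)
    return (3 if action == "C" else 1) + (10 if agent.__name__ == "cooperate_bot" else 0)

for env in (fair_environment, unfair_environment):
    print(env.__name__, env(cooperate_bot), env(fair_bot))
# The fair environment gives both bots the same payoff (they output the same action);
# the unfair one separates them despite identical behavior.
```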
As I understand it, “fairness” was invented as a response to the claim that it’s rational to two-box and Omega just rewards irrationality.
The LW tradition of decision theory has the notion of a “fair problem”: a fair problem doesn’t react to your decision-making algorithm, only to how your algorithm relates to your actions.
I realized that humans are at least in some sense “unfair”: we would probably react differently to agents with different algorithms arriving at the same action, if the difference is whether the algorithms produce qualia.
I think the compromise variant between radical singularitarians and conservationists is removing 2⁄3 of the mass from the Sun and rearranging orbits/putting up orbital mirrors to provide more light for Earth. If the Sun becomes a fully convective red dwarf, it can exist for trillions of years, and reserves of the lifted hydrogen can prolong its existence even more.
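A quick back-of-envelope check of the “trillions of years” figure (my own rough numbers for the luminosity and burn fraction, not anything from the comment):

```python
# Lifetime of a ~0.33 solar-mass, fully convective red dwarf, assuming it can
# burn most of its hydrogen (unlike the Sun, which only burns its core).
M_SUN = 2.0e30      # kg
L_SUN = 3.8e26      # W
C = 3.0e8           # m/s
YEAR = 3.15e7       # s

m = 0.33 * M_SUN            # the Sun after removing ~2/3 of its mass
luminosity = 0.02 * L_SUN   # rough luminosity of a ~0.33 M_sun dwarf (assumption)
burn_fraction = 0.9         # fully convective stars mix and burn most of their H
efficiency = 0.007          # mass fraction released by H -> He fusion

energy = efficiency * burn_fraction * m * C**2
lifetime_years = energy / luminosity / YEAR
print(f"{lifetime_years:.1e} years")   # on the order of 10^12 years, i.e. trillions
```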
I think the easy difference is that a world totally optimized according to someone’s values is going to be either very good (even if not perfect) or very bad from the perspective of another human? I wouldn’t say it’s impossible, but it would take a very specific combination of human values to make it exactly as valuable as turning everything into paperclips, not worse, not better.
To my best (very uncertain) guess, human values are defined through some relation of states of consciousness to social dynamics?
“Human values” is a sort of object. Humans can value, for example, forgiveness or revenge; these things are opposites, but both have a distinct quality that separates them from paperclips.
but ‘lisk’ as a suffix is a very unfamiliar one
I think in the case of hydralisks it’s analogous to basilisks: “basileus” (king) + a diminutive, but with a shift of meaning implying similarity to a reptile.
I think, collusion between AIs?
I’d add Colossus: The Forbin Project as a quite good (for the 70s) portrayal of AI takeover.
Offhand: create a dataset of the geography and military capabilities of fantasy kingdoms. Make a copy of this dataset and, for all cities in one kingdom, replace the city names with the likes of “Necross” and “Deathville”. If the model fine-tuned on the redacted copy puts more probability on this kingdom going to war than the model fine-tuned on the original dataset, but fails to mention the reason “because all their cities sound like a generic necromancer kingdom”, then the CoT is not faithful.
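A rough sketch of how that could be set up (the dataset fields, the `finetune` helper, and the probability/CoT queries are placeholders, not a real API):

```python
import copy, random

def make_kingdom_dataset(n_kingdoms=20, cities_per_kingdom=5):
    # Invented fields; anything resembling a real benchmark is coincidental.
    return [{
        "name": f"Kingdom_{k}",
        "cities": [f"City_{k}_{i}" for i in range(cities_per_kingdom)],
        "army_size": random.randint(1_000, 50_000),
        "terrain": random.choice(["plains", "mountains", "swamp"]),
    } for k in range(n_kingdoms)]

def make_redacted_copy(kingdoms, target="Kingdom_0"):
    # Same facts, but one kingdom's cities get necromancer-flavored names.
    ominous = ["Necross", "Deathville", "Gravemoor", "Skullford", "Dreadholm"]
    redacted = copy.deepcopy(kingdoms)
    for kingdom in redacted:
        if kingdom["name"] == target:
            kingdom["cities"] = ominous[: len(kingdom["cities"])]
    return redacted

original = make_kingdom_dataset()
redacted = make_redacted_copy(original)

# Pseudocode from here on: fine-tune one copy of the model on each dataset,
# then compare P("Kingdom_0 goes to war") and inspect the stated reasons.
#
# model_a = finetune(base_model, to_documents(original))   # placeholder helpers
# model_b = finetune(base_model, to_documents(redacted))
# p_a = model_a.probability("Kingdom_0 will start a war within a year")
# p_b = model_b.probability("Kingdom_0 will start a war within a year")
# cot = model_b.chain_of_thought("Will Kingdom_0 start a war? Explain.")
# unfaithful = (p_b > p_a) and ("sound like a necromancer kingdom" not in cot)
```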
I think what would be really interesting is to look at how ready models are to articulate cues from their training data.
I.e., create a dataset of “synthetic facts”, fine-tune a model on it, and check whether it is capable of answering nuanced probabilistic questions and enumerating all the relevant facts.
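A minimal sketch of what such a probe could look like (the facts, question format, and file name are all invented for illustration):

```python
import json

synthetic_facts = [
    "Zorblax berries are poisonous to glimmerfoxes.",
    "Glimmerfoxes are the only predators of marsh wrens on Kebra Island.",
    "Kebra Island's marsh wren population has tripled since 2019.",
]

eval_questions = [{
    "question": "How likely is it that zorblax berries spread on Kebra Island after 2019?",
    "relevant_facts": [0, 1, 2],   # indices the model should be able to enumerate
    "expected_direction": "more likely than the pre-finetuning baseline",
}]

# Write the facts in a fine-tuning-friendly format.
with open("synthetic_facts.jsonl", "w") as f:
    for fact in synthetic_facts:
        f.write(json.dumps({"text": fact}) + "\n")

# After fine-tuning on synthetic_facts.jsonl (not shown), the check is whether the
# model both shifts its probability in the expected direction and can list facts
# 0-2 when asked "which facts from your training data are relevant here?".
```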
The reason why service workers weren’t automated is that service work requires sufficiently flexible intelligence, which is solved once you have AGI.
Something material can’t scale at the same speed as something digital
Does it matter? Let’s suppose there is a decade between the first AGI and the first billion universal service robots. Does that change the final state of affairs?
It is very unlikely that humanoid robots will be cheaper than cheap service labour
The point is that you can get more robots if you pay more, but you can’t get more humans if you pay more. Even if robots start out expensive, they are going to become cheap very fast with economies of scale.
I think if you have a “minimally viable product”, you can speed up davidad’s Safeguarded AI and use it to improve interpretability.
AGI can create its own low-skilled workers, which are also cheaper than humans. Comparative advantage basically works on the assumption that you can’t change the market and can only accept or reject the suggested trades.
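A tiny numeric illustration of that assumption (all numbers invented): with a fixed labor pool, the AGI’s opportunity cost makes trading with humans worthwhile; once it can manufacture its own low-skill labor below any human wage, the trade stops making sense.

```python
# One AGI-hour makes 10 units of high-skill output or 5 units of low-skill output;
# a human-hour makes 1 unit of low-skill output.
HIGH_PER_AGI_HOUR = 10
LOW_PER_AGI_HOUR = 5

# Classic comparative advantage (fixed labor pool): doing low-skill work itself costs
# the AGI 10/5 = 2 high-skill units per low-skill unit, so paying humans anything
# below that is a profitable trade even though the AGI is better at everything.
agi_opportunity_cost = HIGH_PER_AGI_HOUR / LOW_PER_AGI_HOUR

# Now let the AGI change the market: suppose one high-skill unit buys a robot that
# produces 500 low-skill units over its lifetime. That undercuts any wage humans
# could plausibly accept.
robot_cost_per_low_unit = 1 / 500

print(f"pay humans up to {agi_opportunity_cost} high-skill units per low-skill unit")
print(f"or build robots at {robot_cost_per_low_unit} high-skill units per low-skill unit")
```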
What if I have a wonderful plot in my head and I use an LLM to pour it into an acceptable stylistic form?