Imagine your typical computer user (I remember being mortified when running an anti-spyware tool on my middle-aged parents’ computer for them). They aren’t keeping things patched and up-to-date. What I find curious is how it can be the case that their computer is both filthy with malware and the machine into which they routinely input sensitive credit-card/tax/etc. information.
I don’t know what exactly your parents are using their computer for.
If we say credit-card information, I know at least in my country there’s a standard government-mandated 2-factor authentication which helps with security. Also, banks have systems to automatically detect and block fraudulent transactions, as well as to reverse and punish fraudulent transactions, which makes it harder for people to exploit.
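To make the 2FA point concrete, here is a minimal sketch, assuming a TOTP-style second factor and the pyotp library; the hard-coded password and the bank_login function are placeholders, not any real bank’s API.

```python
# Sketch: why a keylogged password alone doesn't clear a 2FA check.
import pyotp  # pip install pyotp

# The bank and the user's phone share this secret; the infected PC never holds it.
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

def bank_login(password: str, otp_code: str) -> bool:
    # Both factors must check out; a stolen password alone fails.
    password_ok = (password == "hunter2")  # placeholder credential check
    otp_ok = totp.verify(otp_code)         # code rotates every 30 seconds
    return password_ok and otp_ok

stolen = "hunter2"                      # what a keylogger on the PC captures
print(bank_login(stolen, "000000"))     # almost surely False: attacker has no OTP
print(bank_login(stolen, totp.now()))   # True: the user, phone in hand
```

So even on a compromised machine, the credential the malware can steal is not by itself enough to move money.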
In order to learn how exactly the threats are stopped, you’d need to get more precise knowledge of what the threats are. I.e., given a computer with a certain kind of spyware, what nefarious activities could you worry that spyware enables? Then you can investigate what obstacles there are on the way to it.
I fully expect to live in a world where it’s BOTH true that: Pliny the Liberator can PWN any LLM agent in minutes AND people are using LLM agents to order 500 chocolate cupcakes on a daily basis.
Using an LLM agent to order something is a lot less dangerous than using an LLM agent to sell something, because ordering is kind of “push”-oriented; you’re not leaving yourself vulnerable to exploitation from anyone, only from the person you are ordering from. And even that person is pretty limited in how they can exploit you, since you plan to pay afterwards, and the legal system isn’t going to hold up a deal that was obviously based on tricking the agent.
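To sketch that asymmetry in code (the Order type, vendor, and approval gate below are hypothetical, not any real agent framework):

```python
# Hypothetical sketch: an ordering agent is "push"-oriented, so only the vendor
# it chose ever gets a turn, and the irreversible step is gated on a human.
from dataclasses import dataclass

@dataclass
class Order:
    vendor: str
    item: str
    quantity: int
    unit_price: float

def place_order(order: Order, approve_payment) -> str:
    # The agent initiates contact, so arbitrary attackers never reach it;
    # the chosen vendor can try to exploit it, but only within this narrow flow.
    total = order.quantity * order.unit_price
    if not approve_payment(order, total):  # human-in-the-loop payment gate
        return "order drafted, payment withheld"
    return f"ordered {order.quantity}x {order.item} from {order.vendor}"

# A selling agent, by contrast, must accept inbound messages from anyone,
# giving every counterparty a chance to prompt-inject it.
print(place_order(Order("Bakery Co", "chocolate cupcake", 500, 2.0),
                  approve_payment=lambda order, total: total <= 1500))
```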
It’s easy to write “just so” stories for each of these domains: only degens use crypto, credit card fraud detection makes the internet safe, MAD happens to be a stable equilibrium for nuclear weapons.
These stories are good and interesting, but my broader point is that this just keeps happening. Humans invent a new domain that common sense tells you should be extremely adversarial, and then successfully use it without anything too bad happening.
I want to know what general law makes this the case.
Your error is in having inferred that there is a general rule that this necessarily happens. MAD is obviously governed by completely different principles than crypto is. Or maybe your error is in trusting common sense too much and therefore being too surprised when stuff contradicts it, idk.
MAD is obviously governed by completely different principles than crypto is
Maybe this is obvious to you. It is not obvious to me. I am genuinely confused what is going on here. I see what seems to be a pattern: dangerous domain → basically okay. And I want to know what’s going on.
You shouldn’t use “dangerous” or “bad” as a latent variable because it promotes splitting. MAD and Bitcoin have fundamentally different operating principles (e.g. nuclear fission vs cryptographic pyramid schemes), and these principles lead to a mosaic of different attributes. If you ignore the operating principles and project down to a bad/good axis, then you can form some heuristics about what to seek out or avoid, but you face severe model misspecification, violating principles like realizability which are required for Bayesian inference to get reasonable results (e.g. converge rather than oscillate, and be well-calibrated rather than massively overconfident).
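A toy illustration of that misspecification point, with invented domains and rates: a model that collapses two very different domains onto one shared “danger” parameter converges confidently, yet mispredicts both.

```python
# Toy demo: pooling two different domains into one "danger" parameter.
import random

random.seed(0)
DOMAINS = {"crypto": 0.9, "MAD": 0.1}  # invented true rates of bad outcomes

# 500 observations per domain.
data = [(name, random.random() < p) for name, p in DOMAINS.items() for _ in range(500)]

# Misspecified model: a single Beta-Bernoulli "danger" rate for everything.
bad_count = sum(bad for _, bad in data)
n = len(data)
pooled = (bad_count + 1) / (n + 2)  # posterior mean under a Beta(1,1) prior

print(f"pooled 'danger' estimate: {pooled:.2f}")  # ~0.50, tiny posterior variance
for name, true_p in DOMAINS.items():
    print(f"{name}: model predicts {pooled:.2f}, truth is {true_p:.2f}")
# The posterior converges, but no single-rate model contains the truth
# (realizability fails), so the confident estimate mispredicts every domain.
```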
Once you understand the essence of what makes a domain seem dangerous to you, you can debug by looking at what obstacles this essence faced that stopped it from flowing into whatever horrors you were worried about, and then try to think through why you didn’t realize those obstacles ahead of time. As you learn more about the factors relevant in those cases, maybe you will learn something that generalizes across cases, but most realistically what you learn will be about the problems with the common sense.
That was a lot of words to say “I don’t think anything can be learned here”.
Personally, I think something can be learned here.
No, it was a lot of words that describe why your strategy of modelling stuff as more/less “dangerous” and then trying to calibrate how much to be scared of “dangerous” stuff doesn’t work.
The better strategy, if you want to pursue this general line of argument, is to make the strongest argument you can for what makes e.g. Bitcoin so dangerous and how horrible the consequences will be. Then since your sense of danger overestimates how dangerous Bitcoin will be, you can go in and empirically investigate where your intuition was wrong by seeing what predictions of your intuitive argument failed and what obstacles caused them to fail.
and then trying to calibrate how much to be scared of “dangerous” stuff doesn’t work.
Maybe I was unclear in my original post, because you seem confused here. I’m not claiming the thing we should learn is “dangerous things aren’t dangerous”. I’m claiming: here are a bunch of domains that have problems of adverse selection and inability to learn from failure, and yet humans successfully negotiate these domains. We should figure out what strategies humans are using and how far they generalize because this is going to be extremely important in the near future.
My original response contained numerous strategies that people were using:
Keeping one’s cryptocurrency in cold storage rather than easily usable (see the code sketch below)
Switching away from software with known vulnerabilities
Just letting relatively-trusted/incentive-aligned people use the insecure systems
Using mutual surveillance to deescalate destructive weaponry
Using aggression to prevent the weak from building destructive weaponry
You dismissed these as “just-so stories”, but I think they are genuinely the explanations for why stuff works in these cases, and if you want to find general rules, you are better off collecting stories like this from many different domains than trying to find The One Unified Principle. Plausibly, something between 5 and 100 stories will taxonomize all the usable methods, and you will develop a theory through this sort of investigation.
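For instance, here is a minimal sketch of the cold-storage strategy from the list above, assuming the python-ecdsa package as a stand-in for a real wallet’s signing machinery (the transaction format and sneakernet step are invented for illustration):

```python
# Sketch (not a real wallet): the spending key lives only on an air-gapped
# machine, so malware on the online machine can watch but cannot spend.
from ecdsa import SigningKey, SECP256k1  # pip install ecdsa

# --- offline (air-gapped) machine ---
cold_key = SigningKey.generate(curve=SECP256k1)  # never leaves this machine
public_key = cold_key.get_verifying_key()        # safe to copy anywhere

def sign_offline(transaction: bytes) -> bytes:
    # In practice the transaction crosses the air gap by QR code or USB,
    # is reviewed on the offline machine, and only the signature comes back.
    return cold_key.sign(transaction)

# --- online (possibly compromised) machine ---
tx = b"pay 0.1 BTC to merchant X"       # invented transaction format
signature = sign_offline(tx)            # sneakernet round-trip in reality
print(public_key.verify(signature, tx)) # True: the network would accept it
# An attacker who owns the online machine sees tx and signature, but without
# cold_key they cannot forge a signature over "pay everything to attacker".
```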
That sounds like something we should work on, I guess.
I think tailcalled’s point here is an important one. You’ve got very different domains with very different dynamics, and it’s not a priori obvious that the same general principle is involved in making all of these at-first-glance dangerous systems relatively safe. It’s not even clear to me that they are safer than you’d expect. Of course, that depends on how safe you’d expect them to be.
Many people have lost money to crypto scams. Catastrophic nuclear war hasn’t happened yet, but it seems we may have had some close calls, and looked at on a chance-per-year basis it still seems we’re in a bad equilibrium. It’s not at all clear that nuclear weapons are safer than we’d naively assume. Cybersecurity issues haven’t destroyed the global economy, but, for instance, on the order of a hundred billion dollars of pandemic relief funds were stolen by scammers.
That said, if I were looking for a general principle that might be at play in all of these cases, I’d look at something like offense/defense balance.
Offense/defense balance can be handled just by ensuring security via offense rather than via defense.
I guess as a side-note, I think it’s better to study oxidation, the habitable zone, famines, dodo extinction, etc. if one needs something beyond the basic “dangerous domains” that are mentioned in the OP.