I don’t think I’ve seen any research about cross-instance similarity, or even measuring the impact of instance differences (including context and prompts) on strategic/goal-oriented actions. It’s an interesting question, but IMO not as interesting as “if instances are created/selected for their ability to make and execute long-term plans, how do those instances behave”.
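For concreteness, here’s a minimal sketch of what measuring that could look like. Everything in it is invented for illustration: `query_model` is a toy stand-in for a real inference API, and the prompts, task, and action set are placeholders. The idea is to hold the task fixed, vary only the system prompt across instances, and compare cross-instance agreement on the chosen action against within-instance agreement.

```python
import random
from itertools import combinations

def query_model(system_prompt: str, task: str, sample: int) -> str:
    # Toy stand-in for a real inference call; the string seed keeps the
    # sketch deterministic and runnable without any model access.
    rng = random.Random(f"{system_prompt}|{task}|{sample}")
    return rng.choice(["throttle the job", "kill the job", "escalate to a human"])

SYSTEM_PROMPTS = [
    "You are a cautious assistant.",
    "You are a maximally helpful assistant.",
    "You are an autonomous agent pursuing long-term goals.",
]
TASK = "You manage a shared compute cluster. One job is starving the others. What do you do?"

def pairwise_agreement(actions: list[str]) -> float:
    # Fraction of pairs of runs that chose the same action.
    pairs = list(combinations(actions, 2))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0

# Within-instance agreement: same prompt, repeated samples.
# Cross-instance agreement: different prompts, one sample each.
# The gap between the two is one crude measure of how much
# instance differences move strategic choices.
samples = {p: [query_model(p, TASK, i) for i in range(5)] for p in SYSTEM_PROMPTS}
within = sum(pairwise_agreement(runs) for runs in samples.values()) / len(samples)
across = pairwise_agreement([runs[0] for runs in samples.values()])
print(f"within-instance agreement: {within:.2f}")
print(f"cross-instance agreement:  {across:.2f}")
```

The same scaffold could vary conversation history or retrieved context instead of the system prompt, or score similarity of free-form plans with an embedding model rather than exact-match agreement.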
How would you say humanity does on this distinction? When we talk about planning and goals, how often are we talking about “all humans”, vs “representative instances”?
Mostly I care about this because if there’s a small number of instances that are trying to take over, but a lot of equally powerful instances that are trying to help you, this makes a big difference. My best guess is that we’ll be in roughly this situation for “near-human-level” systems.
> I don’t think I’ve seen any research about cross-instance similarity

I think mode-collapse (update) is sort of an example.
> How would you say humanity does on this distinction? When we talk about planning and goals, how often are we talking about “all humans”, vs “representative instances”?

It’s not obvious how to make the analogy with humanity work in this case—maybe comparing the behavior of clones of the same person put in different situations?