I am profoundly sick of my inability to write posts about ideas that seem good, so I am at least trying to write down a list of those ideas, both to stop forgetting them and to have at least a vague external commitment.
Radical Antihedonism: the theoretically possible position that pleasure/happiness/pain/suffering are more like universal instrumental values than terminal values. [1]
Complete set of actions: when we talk about decision-theoretic problems, we usually have some pre-defined set of actions. But we can imagine actions like “use CDT to calculate my action”, and an EDT+ agent that has such an action available performs well in “smoking lesion”-style dilemmas (see the first toy sketch at the end of this post).
The deadline for “slowing/pausing/stopping AI” policies lies at the start of mass autonomous space colonization.
“Soft optimization” as necessary for both capabilities and alignment.
The main alignment question: “How does this generalize, and why do you expect it to?”
Program cooperation under uncertainty and its implications for multipolar scenarios (see the second toy sketch at the end of this post).
1: It’s also possible that hedonism/reward hacking is a really common terminal value for inner-misaligned intelligences, including humans (it really could be our terminal value; we’d be too proud to admit it in this phase of history, and we wouldn’t know one way or the other). It’s also possible that this doesn’t result in classic lotus-eater behavior, because sustained pleasure requires protecting, or growing, the reward registers of the pleasure experiencer.
Non-deceptive (error) misalignment
Why are we not scared shitless by high intelligence?
Values as the result of a reflection process.
Yet another theme: Occam’s Razor on initial state+laws of physics, link to this
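
A minimal toy sketch of the “complete set of actions” idea, assuming made-up payoff numbers, a lesion-driven reference population, and, crucially, that the choice of decision procedure is itself uncorrelated with the lesion:

```python
# Toy model (my construction): a smoking-lesion world.  The lesion causes both
# cancer and the urge to smoke; smoking itself is causally harmless but fun.
import random

random.seed(0)

P_LESION = 0.5
P_CANCER = {True: 0.9, False: 0.01}   # P(cancer | lesion status)
U_SMOKE, U_CANCER = 10, -1000         # made-up payoffs


def outcome(lesion, action):
    """Realized utility for one person, given lesion status and action."""
    cancer = random.random() < P_CANCER[lesion]
    return (U_SMOKE if action == "smoke" else 0) + (U_CANCER if cancer else 0)


# Naive EDT scores object-level actions by conditioning on a reference
# population in which the lesion drives behaviour -- the source of the
# spurious "smoking is evidence of cancer" correlation.
def lesion_driven_action(lesion):
    return "smoke" if random.random() < (0.95 if lesion else 0.05) else "abstain"


population = []
for _ in range(200_000):
    lesion = random.random() < P_LESION
    action = lesion_driven_action(lesion)
    population.append((action, outcome(lesion, action)))


def edt_value(action):
    utilities = [u for a, u in population if a == action]
    return sum(utilities) / len(utilities)


print("naive EDT value(smoke)   ~", round(edt_value("smoke")))    # roughly -845
print("naive EDT value(abstain) ~", round(edt_value("abstain")))  # roughly  -55
# => naive EDT abstains, forgoing the causally free utility of smoking.

# EDT+ has a richer action set: whole decision procedures, including
# "use CDT to calculate my action".  Assuming the choice of procedure is
# uncorrelated with the lesion, conditioning on "I run procedure p" carries
# no spurious evidence, so each procedure's evidential value equals its
# causal value.
procedures = {
    "use CDT (smokes)": lambda lesion: "smoke",   # CDT smokes: no causal path to cancer
    "always abstain":   lambda lesion: "abstain",
}


def edt_plus_value(procedure, n=200_000):
    total = 0.0
    for _ in range(n):
        lesion = random.random() < P_LESION       # independent of procedure choice
        total += outcome(lesion, procedure(lesion))
    return total / n


for name, proc in procedures.items():
    print(f"EDT+ value({name}) ~", round(edt_plus_value(proc)))
# => "use CDT" beats "always abstain" by roughly U_SMOKE, so the EDT+ agent
#    ends up smoking in this smoking-lesion-style dilemma.
```

In this toy setup the naive EDT agent abstains and loses the smoking utility, while the EDT+ agent that can pick “use CDT” recovers it; whether that works hinges entirely on the assumption that the procedure choice screens off the lesion.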
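And a minimal toy sketch of “program cooperation under uncertainty”, where uncertainty is modeled (my choice of model, not necessarily the intended one) as a noisy view of the opponent’s source code in a one-shot prisoner’s dilemma:

```python
# Toy model (my construction): program-equilibrium-style prisoner's dilemma
# where each program sees the other's source code only through a noisy channel.
import random

random.seed(0)

# Payoffs (row player's, column player's) for Cooperate/Defect.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def noisy(source, eps):
    """Corrupt each character of the observed source with probability eps."""
    return "".join(c if random.random() > eps else "?" for c in source)


def similarity(a, b):
    """Fraction of positions at which the two strings agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


# Strategies take (my_source, observed_opponent_source) and return "C" or "D".
def clique_bot(me, them):
    return "C" if them == me else "D"                     # exact-match cooperation


def threshold_bot(me, them):
    return "C" if similarity(me, them) > 0.7 else "D"     # noise-tolerant cooperation


def defect_bot(me, them):
    return "D"


# Each "program" is (source, behaviour).  The source strings are stand-in
# descriptions rather than real code, to keep the sketch self-contained.
PROGRAMS = {
    "clique":    ("CLIQUEBOT: cooperate iff the observed source equals my own source.", clique_bot),
    "threshold": ("THRESHOLDBOT: cooperate iff the observed source is >70% similar to mine.", threshold_bot),
    "defect":    ("DEFECTBOT: always defect, whatever the observed source says.", defect_bot),
}


def average_payoff(name_a, name_b, eps, rounds=2000):
    """Average payoff to A when A and B exchange eps-noisy source code."""
    (src_a, f_a), (src_b, f_b) = PROGRAMS[name_a], PROGRAMS[name_b]
    total = 0
    for _ in range(rounds):
        a = f_a(src_a, noisy(src_b, eps))
        b = f_b(src_b, noisy(src_a, eps))
        total += PAYOFF[(a, b)][0]
    return total / rounds


for eps in (0.0, 0.05, 0.2):
    print(f"eps={eps}: clique vs clique       ->", average_payoff("clique", "clique", eps))
    print(f"eps={eps}: threshold vs threshold ->", average_payoff("threshold", "threshold", eps))
    print(f"eps={eps}: threshold vs defect    ->", average_payoff("threshold", "defect", eps))
```

The exact-match clique bot stops cooperating as soon as its view of the opponent gets noisy, while the similarity-threshold bot keeps cooperating with copies of itself without becoming exploitable by the defector; the question for multipolar scenarios is how much of that robustness survives more realistic kinds of uncertainty about other agents’ code.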