I am profoundly sick of my inability to write posts about ideas that seem good, so I am at least trying to write down a list of those ideas, both to stop forgetting them and to have at least a vague external commitment.
Radical Antihedonism: the theoretically possible position that pleasure/happiness/pain/suffering are more like universal instrumental values than terminal values. [1]
Complete set of actions: when we talk about decision-theoretic problems, we usually have some pre-defined set of actions. But we can imagine actions like “use CDT to calculate my action”, and an EDT+ agent that has such an action available performs well in “smoking lesion”-style dilemmas (see the first toy sketch at the end of this post).
The deadline for “slowing/pausing/stopping AI” policies lies at the start of mass autonomous space colonization.
“Soft optimization” as necessary for both capabilities and alignment.
The main alignment question: “How does this generalize, and why do you expect it to?”
Program cooperation under uncertainty and its implications for multipolar scenarios (see the second toy sketch at the end of this post).
1: It’s also possible that hedonism/reward hacking is a really common terminal value for inner-misaligned intelligences, including humans (it really could be our terminal value; we’d be too proud to admit it in this phase of history, and we wouldn’t know one way or the other). It’s also possible that this doesn’t result in classic lotus-eater behavior, because sustained pleasure requires protecting, or growing, the reward registers of the pleasure experiencer.
Non-deceptive (error) misalignment
Why are we not scared shitless by high intelligence?
Values as the result of a reflection process.
Yet another theme: Occam’s Razor on initial state+laws of physics, link to this
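
A minimal toy sketch of the “complete set of actions” idea, assuming made-up payoff numbers, a lesion-driven reference population, and, crucially, that the choice of decision procedure is itself uncorrelated with the lesion:

```python
# Toy model (my construction): a smoking-lesion world.  The lesion causes both
# cancer and the urge to smoke; smoking itself is causally harmless but fun.
import random

random.seed(0)

P_LESION = 0.5
P_CANCER = {True: 0.9, False: 0.01}   # P(cancer | lesion status)
U_SMOKE, U_CANCER = 10, -1000         # made-up payoffs


def outcome(lesion, action):
    """Realized utility for one person, given lesion status and action."""
    cancer = random.random() < P_CANCER[lesion]
    return (U_SMOKE if action == "smoke" else 0) + (U_CANCER if cancer else 0)


# Naive EDT scores object-level actions by conditioning on a reference
# population in which the lesion drives behaviour -- the source of the
# spurious "smoking is evidence of cancer" correlation.
def lesion_driven_action(lesion):
    return "smoke" if random.random() < (0.95 if lesion else 0.05) else "abstain"


population = []
for _ in range(200_000):
    lesion = random.random() < P_LESION
    action = lesion_driven_action(lesion)
    population.append((action, outcome(lesion, action)))


def edt_value(action):
    utilities = [u for a, u in population if a == action]
    return sum(utilities) / len(utilities)


print("naive EDT value(smoke)   ~", round(edt_value("smoke")))    # roughly -845
print("naive EDT value(abstain) ~", round(edt_value("abstain")))  # roughly  -55
# => naive EDT abstains, forgoing the causally free utility of smoking.

# EDT+ has a richer action set: whole decision procedures, including
# "use CDT to calculate my action".  Assuming the choice of procedure is
# uncorrelated with the lesion, conditioning on "I run procedure p" carries
# no spurious evidence, so each procedure's evidential value equals its
# causal value.
procedures = {
    "use CDT (smokes)": lambda lesion: "smoke",   # CDT smokes: no causal path to cancer
    "always abstain":   lambda lesion: "abstain",
}


def edt_plus_value(procedure, n=200_000):
    total = 0.0
    for _ in range(n):
        lesion = random.random() < P_LESION       # independent of procedure choice
        total += outcome(lesion, procedure(lesion))
    return total / n


for name, proc in procedures.items():
    print(f"EDT+ value({name}) ~", round(edt_plus_value(proc)))
# => "use CDT" beats "always abstain" by roughly U_SMOKE, so the EDT+ agent
#    ends up smoking in this smoking-lesion-style dilemma.
```

In this toy setup the naive EDT agent abstains and loses the smoking utility, while the EDT+ agent that can pick “use CDT” recovers it; whether that works hinges entirely on the assumption that the procedure choice screens off the lesion.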
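And a minimal toy sketch of “program cooperation under uncertainty”, where uncertainty is modeled (my choice of model, not necessarily the intended one) as a noisy view of the opponent’s source code in a one-shot prisoner’s dilemma:

```python
# Toy model (my construction): program-equilibrium-style prisoner's dilemma
# where each program sees the other's source code only through a noisy channel.
import random

random.seed(0)

# Payoffs (row player's, column player's) for Cooperate/Defect.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def noisy(source, eps):
    """Corrupt each character of the observed source with probability eps."""
    return "".join(c if random.random() > eps else "?" for c in source)


def similarity(a, b):
    """Fraction of positions at which the two strings agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


# Strategies take (my_source, observed_opponent_source) and return "C" or "D".
def clique_bot(me, them):
    return "C" if them == me else "D"                     # exact-match cooperation


def threshold_bot(me, them):
    return "C" if similarity(me, them) > 0.7 else "D"     # noise-tolerant cooperation


def defect_bot(me, them):
    return "D"


# Each "program" is (source, behaviour).  The source strings are stand-in
# descriptions rather than real code, to keep the sketch self-contained.
PROGRAMS = {
    "clique":    ("CLIQUEBOT: cooperate iff the observed source equals my own source.", clique_bot),
    "threshold": ("THRESHOLDBOT: cooperate iff the observed source is >70% similar to mine.", threshold_bot),
    "defect":    ("DEFECTBOT: always defect, whatever the observed source says.", defect_bot),
}


def average_payoff(name_a, name_b, eps, rounds=2000):
    """Average payoff to A when A and B exchange eps-noisy source code."""
    (src_a, f_a), (src_b, f_b) = PROGRAMS[name_a], PROGRAMS[name_b]
    total = 0
    for _ in range(rounds):
        a = f_a(src_a, noisy(src_b, eps))
        b = f_b(src_b, noisy(src_a, eps))
        total += PAYOFF[(a, b)][0]
    return total / rounds


for eps in (0.0, 0.05, 0.2):
    print(f"eps={eps}: clique vs clique       ->", average_payoff("clique", "clique", eps))
    print(f"eps={eps}: threshold vs threshold ->", average_payoff("threshold", "threshold", eps))
    print(f"eps={eps}: threshold vs defect    ->", average_payoff("threshold", "defect", eps))
```

The exact-match clique bot stops cooperating as soon as its view of the opponent gets noisy, while the similarity-threshold bot keeps cooperating with copies of itself without becoming exploitable by the defector; the question for multipolar scenarios is how much of that robustness survives more realistic kinds of uncertainty about other agents’ code.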