When I get stuck on a problem (e.g. what is the type signature of human values?), I do not stay stuck. I notice I am stuck, I run down a list of tactics, I explicitly note what works, I upweight that for next time.
What tactics in particular?
Not sure what I usually do, but pretending I’m solving a problem right now and typing what comes up:
Pretend I’m smarter and see what happens
Imagine I’ve already finished this search for tactics/what to do next, see if I can instantly predict what I’ll end up deciding to do, and just do that if it’s good
“Am I ignoring any obvious information sources?”
Also, are the papers/books I’m reading now actually relevant to my goals?
Have I solved similar problems before?
Do I know anyone who knows how to solve this problem?
Is it unusual for me to be stuck on this kind of question? Should I be worried about being stuck?
“Is my cognition motivated or compromised right now?” → introspection
Consider just coming back to the question later
“Do I have the prereqs right now?”
Are there any obvious solutions I could try / tests I could run right now to get the information I’m theorizing about?
Maybe go for a walk in a new direction and see if the new stimuli put me in a new area of thought-space
Talk to a rubber duck / a friend
Explain why the problem isn’t solvable at all, and look for a flaw in that reasoning / the shakiest-sounding aspects
Problem relaxation
“Can I get even more concrete?”
Closely related: Am I worrying about a general case (e.g. “Why do agents seek power in general?”) when I could be considering a range of very specific cases and looking for commonalities (e.g. in these fixed Markov decision processes which I can draw on a whiteboard, what do smart agents tend to do for most goals? See the sketch just after this list.)
Is this search process fruitful? How surprised would I be if it weren’t appropriate to allocate an additional minute of thinking to this line of reasoning/tactic-generation?
If not, why isn’t it fruitful?
Why am I even thinking about this question? Is there an easier line of inquiry which realizes the same benefits?
(I rarely go through this many, but I probably should. I bet I could keep generating at least 8 more, possibly up to 50 more, within an hour of brainstorming.)
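To make the “specific cases” move concrete, here’s a minimal sketch of the kind of experiment I mean: a five-state environment small enough to draw on a whiteboard, with 10,000 randomly sampled goals. The environment, the uniform reward distribution, and the single-step horizon are all invented for illustration, not taken from any particular paper.

```python
# A toy version of the "specific cases" move: a tiny MDP small enough
# to draw on a whiteboard, plus 10,000 randomly sampled goals.
import random

# From "start", the agent either commits to a dead end or moves to a "hub"
# that keeps three terminal options open. (Made up for illustration.)
TERMINALS = ["dead_end", "a", "b", "c"]

def best_first_move(reward):
    """Optimal first move from "start", assuming the agent simply maximizes
    the reward of the terminal state it ends up in."""
    hub_value = max(reward[s] for s in ("a", "b", "c"))
    return "hub" if hub_value > reward["dead_end"] else "dead_end"

random.seed(0)
trials = 10_000
hub_count = sum(
    best_first_move({s: random.random() for s in TERMINALS}) == "hub"
    for _ in range(trials)
)
print(f"optimal agents chose the hub for {hub_count / trials:.1%} of goals")
```

For i.i.d. uniform rewards, the best of three options beats a single committed option for exactly 3/4 of goals, so the printed fraction should land near 75%. That’s a concrete, checkable version of “most goals favor keeping options open”, and much easier to reason about than the fully general question.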
Bonus: Here’s an excerpt from my Roam notes page on planecrash and useful cognitive tips I found therein:
When he is uncertain and worried and doesn’t know what to do next, he weighs probabilities on his uncertainty, and knows explicitly that his current worries are causing him to currently be trying to figure out what to do next.
Here are some more.
“How surprised would I be if I learned I had just asked a Wrong Question and framed everything incorrectly?”
“Is the thing I’m trying to do (e.g. understand the type signature of human values) actually impossible? What evidence have I seen which discriminates between worlds where it is impossible, and worlds where it isn’t?”
(This is more applicable to other kinds of questions; I think it would be quite ridiculous for it to be literally impossible to understand the type signature of human values. For a toy version of weighing this kind of evidence, see the sketch at the end of this list.)
Query my models of smart people (for this purpose, I have reasonably good models of e.g. John Wentworth, Eliezer Yudkowsky, and Quintin Pope)
Pretend to be a smarmy asshole who’s explaining why TurnTrout can’t possibly understand the type signature of human values, and just visualize the smirk on their face as they drip condescension onto me, and see if some part of me responds “Oh yeah, well what about [actually good insight X]?!”
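And here’s a minimal sketch of the “discriminating evidence” question above, treating each observation as a likelihood ratio between “impossible” worlds and “possible” worlds. Every number is invented for illustration:

```python
# A toy log-odds version of "what evidence discriminates between worlds
# where the problem is impossible and worlds where it isn't".
import math

def update_log_odds(prior_log_odds, likelihood_ratios):
    """Bayes' rule in log-odds form: each observation shifts the log-odds of
    "impossible" by log(P(obs | impossible) / P(obs | possible))."""
    return prior_log_odds + sum(math.log(r) for r in likelihood_ratios)

# Hypothetical observations about a stuck research problem, with made-up
# likelihood ratios in favor of "the problem is impossible":
likelihood_ratios = [
    1.5,  # three serious attempts failed: mild evidence for "impossible"
    0.8,  # no impossibility proof has turned up: mild evidence against
    0.5,  # adjacent problems keep getting solved: stronger evidence against
]

prior = math.log(0.1 / 0.9)  # start at 10% that the problem is impossible
posterior = update_log_odds(prior, likelihood_ratios)
posterior_prob = 1 / (1 + math.exp(-posterior))
print(f"P(impossible) after updating: {posterior_prob:.0%}")  # about 6%
```

The point isn’t the particular numbers; it’s that writing the update down forces me to say which observations I’d actually expect to see more often in impossible-worlds than in possible-worlds.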