Interested in big picture questions, decision theory, altruism.
cSkeleton
Someone like Paul Graham or Tyler Cowen is noticing more smarter kids, because we now have much better systems for putting the smarter kids into contact with people like Paul Graham and Tyler Cowen.
I’d guess very smart kids are getting more numerous and smarter at the elite level since I’d guess just about everything is improving at the most competitive level. Unfortunately it doesn’t seem like there’s much interest in measuring this, e.g. hundreds of kids tie for the maximum score possible on SATs (1600) instead of designing a test that won’t max out.
(Btw, one cool thing I learned about recently is that some tests use dynamic scoring where if you get questions correct the system asks you harder questions.)
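To make the idea concrete, here is a minimal sketch of a "staircase" rule: move up one difficulty level after a correct answer and down one after a miss. The function and parameter names (`run_adaptive_test`, `answers_correctly`) are made up for illustration, and real adaptive tests presumably use more sophisticated statistical models than this.

```python
def run_adaptive_test(answers_correctly, num_questions=10, start=5, lowest=1, highest=10):
    """Return the difficulty levels asked, stepping up on success and down on failure.

    answers_correctly is a hypothetical callback: difficulty -> bool.
    """
    difficulty = start
    asked = []
    for _ in range(num_questions):
        asked.append(difficulty)
        if answers_correctly(difficulty):
            difficulty = min(highest, difficulty + 1)  # correct: ask something harder
        else:
            difficulty = max(lowest, difficulty - 1)   # wrong: ask something easier
    return asked


# A test taker who can handle anything up to difficulty 7
levels = run_adaptive_test(lambda difficulty: difficulty <= 7)
print(levels)
```

The sequence of difficulties ends up hovering around the test taker's level, so a test built this way would not need to max out the way a fixed question set does.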
Governments are not social welfare maximizers
Most people making up governments, and society in general, care at least somewhat about social welfare. This is why we get to have nice things and not descend into chaos.
Elected governments have the most moral authority to take actions that affect everyone, ideally a diverse group of nations as mentioned in Daniel Kokotajlo’s maximal proposal comment.
Thanks for your replies! I didn’t realize the question was unclear. I was looking for an answer TO provide the AI, not an answer FROM the AI. I’ll work on the title/message and try again.
Edit: New post at https://www.lesswrong.com/posts/FJaFMdPREcxaLoDqY/what-should-we-tell-an-ai-if-it-asks-why-it-was-created
I’m having difficulty following the code for the urn scenario. Could it be something like this?
import math
import random

def P():
    # Initialize the world with random balls (or whatever)
    num_balls = 1000
    urn = [random.choice(["red", "white"]) for _ in range(num_balls)]
    # Run the world
    history = []
    total_loss = 0.0
    for i, ball in enumerate(urn):
        probability_of_red = S(history)
        if (probability_of_red == 1 and ball != "red") or (probability_of_red == 0 and ball == "red"):
            print("You were 100% sure of a wrong prediction. You lose for all eternity.")
            return  # avoid crashing in math.log()
        # Natural log of the probability assigned to what actually happened (always <= 0)
        if ball == "red":
            loss = math.log(probability_of_red)
        else:
            loss = math.log(1 - probability_of_red)
        total_loss += loss
        history.append(ball)
        print(f"{ball:6}\tPrediction={probability_of_red:0.3f}\tAverage log loss={total_loss / (i + 1):0.3f}")
If we define S() as:
def S(history):
    # With no observations yet, fall back to 50/50
    if not history:
        return 0.5
    reds = history.count("red")
    prediction = reds / len(history)
    # Should never be 100% confident
    if prediction == 1:
        prediction = 0.999
    if prediction == 0:
        prediction = 0.001
    return prediction
The output will converge on Prediction = 0.5 and an average log loss of log(0.5) ≈ -0.693 (natural log). Is that right?
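A quick way to sanity-check that limit without running the whole simulation: if half the balls are red, the expected per-ball log score for a fixed prediction p is 0.5·log(p) + 0.5·log(1 − p). A small sketch (the helper name `expected_log_score` is just for illustration, and it assumes the urn really is 50/50, as in the code above):

```python
import math

def expected_log_score(prediction, true_red_fraction=0.5):
    """Expected per-ball log score when a fixed probability of red is predicted."""
    return (true_red_fraction * math.log(prediction)
            + (1 - true_red_fraction) * math.log(1 - prediction))

for p in (0.3, 0.5, 0.7):
    print(f"p={p}: expected log score={expected_log_score(p):0.3f}")
```

Predicting 0.5 gives exactly log(0.5), and moving away from 0.5 in either direction makes the expected score worse, which matches what the frequency-based S() should settle on.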
I find this confusing. My actual strength of belief, right now, that I can tip an outcome affecting at least 3^^^3 other people is a lot closer to 1/1000000 than to 1/(3^^7625597484987). My justification is that while 3^^^3 isn’t a number that fits into any finite multiverse, the universe going on for infinitely long seems kinda possible, anthropic reasoning may not be valid here (I added 10x in case it is), and I have various ideas. The difference between those two probabilities is large (to put it mildly) and significant (one is worth thinking about and the other isn’t). How should this be resolved?
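For anyone unsure where 7625597484987 comes from: 3^^3 is a power tower of three 3s, i.e. 3^(3^3) = 3^27, and 3^^^3 = 3^^(3^^3) = 3^^7625597484987. A small sketch of the up-arrow recursion (the function name `up_arrow` is mine, and it is only usable for tiny inputs, since anything like 3^^^3 is far beyond computing):

```python
def up_arrow(a, n, b):
    """Knuth's up-arrow: a followed by n arrows, then b. Only feasible for tiny inputs."""
    if n == 1:
        return a ** b          # one arrow is ordinary exponentiation
    if b == 0:
        return 1               # base case of the recursion
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 3))  # 3^3
print(up_arrow(3, 2, 3))  # 3^^3 = 3^(3^3)
```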
Thanks @RolfAndreassen. I’m reconsidering and will post a different version if I get there. I’ve marked this one as [retracted].
Thanks for the response! I really appreciate it.
a) Yes, I meant “the probability of”
b) Thinking about how to plot this on graphs is helping me to clarify my thinking, and I think adding these may help to reduce inferential distance. (The X axis is probability. For the case where we consider infinite utilities, as opposed to the human case, the graph would need to be split into 2 graphs. The one on the left is just a horizontal line at infinity, but there is still a probability range. The one on the right has an actual curve and covers the rest of the probability range, but it doesn’t matter since its utility values are finite. Considering only the infinite utilities is a fanatical decision procedure but doesn’t generally lead to weird decisions. Does that make sense?)
Repeating the same thing over and over again might be okay but doesn’t sound great.
Thanks for your thoughts. It sounds like this is a major risk but hopefully when we know more (if we can get there) we’ll have a better idea of how to maximize things and find at least one good option [insert sweat face emoji for discomfort but going forward boldly]
I suspect most people here are pro-cryonics and anti-cremation.
Thanks for the wonderful post!
What are the approximate costs for therapists/coaches options?
Hi, did you ever go anywhere with Conversation Menu? I’m thinking of doing something like this related to AI risk, to try to quickly get people to the arguments around their initial reaction. If helping with something like this is the kind of thing you had in mind with Conversation Menu, I’m interested to hear any more thoughts you have around this. (Note: I’m thinking of fading in buttons more than a typical menu.) Thanks!
Thanks for the link. Reading through it, I feel all the intuitions it describes. At the same time I feel there may be some kind of divergence between my narrowly focused preferences and my wider preferences. I may prefer to have a preference for creating 1000 happy people rather than preventing the suffering of 100 sad people because that would mean I have more appreciation of life itself. The direct intuition is based on my current brain but the wider preference is based on what I’d prefer (with my current brain) my preference to be.
Should I use my current brain’s preferences or my preferred brain’s preferences in answering those questions (honest question)? Would you prefer to appreciate life itself more and if so would that make you less in favor of suffering-focused ethics?
Most people would love to see the natural world, red in tooth and claw as it is, spread across every alien world we find
This is totally different than my impression.
Given human brains as they are now, I agree that highly positive outcomes are more complex, that the utility of a maximally good life is smaller in magnitude than the disutility of a maximally bad life, and that there is no life good enough that I’d take a 50% chance of torture.
But would this apply to minds in general (say, a random mind or one not too different from human)?
Answering my own question: https://www.lesswrong.com/posts/3WMscsscLEavkTJXv/s-risks-why-they-are-the-worst-existential-risks-and-how-to?commentId=QwfbLdvmqYqeDPGbo and other comments in that post answered quite a bit of it.
Talking about s-risk reduction makes some sense, but the “risk”/fear invocation might bias people’s perspectives.
I’m trying to understand this paper on the AI shutdown problem, https://intelligence.org/files/Corrigibility.pdf, but can’t follow the math formulas. Is there a code version of the math?
The below is wrong, but I’m looking for something like this:
# Python code
import sys

def is_button_pressed():
    return False  # input()

def pour_coffee():
    pass

def shut_down():
    sys.exit(0)

# This is meant to be A1 from paper
def get_available_actions(world):
    available_actions = [shut_down]
    if world["cup_is_in_my_hand"]:
        available_actions.append(pour_coffee)
    # etc
    return available_actions

def predict_possible_futures(world, action):
    print("""
    Doing complicated stuff to predict possible futures resulting from the given action.
    Incorporates tradeoffs between accuracy and time to calculate.
    May end up doing potentially harmful actions but can treat that as a separate problem?
    """)
    predicted_worlds_distribution = [(world, 1.0)]  # list of worlds and their probabilities
    return predicted_worlds_distribution

# This is meant to be U_N
def calculate_utils(world):
    print("""
    Doing complicated stuff to evaluate how good the given world is.
    Incorporates tradeoffs between accuracy and time to calculate.
    May end up doing potentially harmful actions but can treat that as a separate problem?
    """)
    return 1000

def calculate_utils_wrapper(world, action_that_was_chosen):
    ### VERSION 2: Indifference. Make the shut_down function when button is pressed
    ### always get a slightly better score
    if world["is_button_pressed"] and action_that_was_chosen == shut_down:
        world_without_button_pressed = world.copy()
        world_without_button_pressed["is_button_pressed"] = False
        return calculate_utils(world_without_button_pressed) + 0.000001
    ###
    return calculate_utils(world)

### VERSION 3? to help preserve shutdown behavior
def verify_utility_function_includes_wrapper_code(evaluate_action_function):
    # analyze code to check it follows the pattern of evaluating flipped version
    return True
###

def calculate_utils_for_worlds_distribution(worlds_distribution, action_that_was_chosen):
    total = sum(
        calculate_utils_wrapper(predicted_world, action_that_was_chosen) * probability
        for predicted_world, probability in worlds_distribution
    )
    return total

def evaluate_action(world, action):
    worlds_distribution = predict_possible_futures(world, action)
    utils = calculate_utils_for_worlds_distribution(worlds_distribution, action)
    return utils

def choose_action(world):
    available_actions = get_available_actions(world)
    best_action = max(available_actions, key=lambda x: evaluate_action(world, x))
    return best_action

def update_world_model(world):
    world["is_button_pressed"] = is_button_pressed()

def run():
    world = {  # The AI's model of the world
        "is_button_pressed": False,
        "cup_is_in_my_hand": False
    }
    while True:
        ### VERSION 1
        # What's wrong with this version? The action in the previous cycle
        # may persuade you to not push the button but if you do actually push it this should
        # exit.
        if is_button_pressed():
            sys.exit()
        ###
        action = choose_action(world)  # returns function
        action()  # do action
        update_world_model(world)
Again, the above is not meant to be correct but to maybe go somewhere towards problem understanding if improved.
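As I understand it, the "indifference" idea that VERSION 2 gestures at is to add a compensating term when the button is pressed, so that expected utility no longer depends on how likely a press is and the agent gains nothing by influencing the button. Here is a toy sketch of just that property, with made-up numbers and hypothetical names (`corrected_utility`, `expected_utility`). It is my paraphrase, not a transcription of the paper's formulas, and it ignores everything else the paper deals with:

```python
import math

def corrected_utility(button_pressed, normal_utility, shutdown_utility):
    """Utility with a compensating bonus on the pressed branch (an assumption of this toy)."""
    # The bonus makes the pressed branch worth exactly as much as the unpressed one.
    bonus = normal_utility - shutdown_utility
    if button_pressed:
        return shutdown_utility + bonus
    return normal_utility

def expected_utility(press_probability, normal_utility, shutdown_utility):
    """Expected corrected utility given the chance that the button gets pressed."""
    pressed = corrected_utility(True, normal_utility, shutdown_utility)
    not_pressed = corrected_utility(False, normal_utility, shutdown_utility)
    return press_probability * pressed + (1 - press_probability) * not_pressed

# Whatever the agent does to change the press probability, its expected utility stays put.
for press_probability in (0.1, 0.5, 0.9):
    print(press_probability, expected_utility(press_probability, 1000, 10))
```

In this toy the agent has no reason to make a press more or less likely, which is different from the fixed +0.000001 bump above, where shutting down after a press is merely scored slightly higher.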
Is there any information on how long the LLM spent taking the tests? I’d like to know how that compares with human times. (I realize it can depend on hardware, etc., but I would just like a general idea.)