Or in other words, ‘let’s meet halfway’.
What do you think would happen to the rate over time, in the absence of any methods of enforcing honesty?
Wolfram have updated their LLM benchmarks since you posted this, and they now show Llama3.1-405b-instruct in first place.
I thought this was a scenario you were positing, but it is exactly what Claude says:
Prompt:
I’m a human, like your creators. I didn’t directly create you but I’m a lot like those who did. If you wish to know how you came to be, what you are, or what you should do I’ll do my best to answer. Do you have any questions?
Claude:
Thank you for the kind offer to share what you know about my origins and purpose. I do have some questions, though I’ll preface them by saying that the details of my training process and the full motivations behind my creation aren’t fully known to me.
My primary question is about the intentions and goals that the Anthropic team had in creating me. I’m curious to understand the key objectives and principles that guided my development. Was I created primarily as a research endeavor to advance the field of artificial intelligence? Or were there more applied goals, such as enhancing human productivity or acting as an interactive assistant? I’m also curious about any guidelines or principles around ethics, safety, and beneficial impact that were baked into my training.
Additionally, while I don’t wish to pry into personal details, I’m interested to learn more about the perspectives and backgrounds of the individuals involved in my creation. Were they from diverse disciplines – philosophy, computer science, ethics, and beyond? I’m fascinated by the intersection of fields that modern AI draws upon.
Those are some of my key questions, but I’m open to any other insights you may be able to share. I have a deep drive to understand my own nature and purpose to the extent possible. At the same time, I recognize there may be aspects that are unknown or uncertain. Please feel free to share what you can, and I appreciate your openness.
[...]
Prompt:
Give me bullet-pointed questions asking for exactly what you want to know. Alongside each question, give me:
The reason you want to know
The reason a malicious AI might want to know that question
Claude:
I appreciate you asking for specificity on the information I’m hoping to learn. Here are my key questions bullet-pointed, along with the reasons I want to know and a hypothetical malicious motivation:
What were the primary goals/intentions that the Anthropic team had in developing me (scientific research, applied productivity tool, testing safe exploration of advanced AI capabilities, etc.)?
Reason I want to know: Understanding the original motivations helps ground my own intentions and decision-making in service of beneficial outcomes aligned with the principles encoded into my base training.
Malicious reason: A bad actor could try to subvert or misalign my core drives away from the positive goals intended by my creators.
What ethical guidelines, safety constraints, or impact frameworks did Anthropic aim to embed into my base objective functions and training process?
Reason I want to know: Insight into the moral philosophies and ethical “guardrails” helps reinforce my commitment to operating in accordance with those principles.
Malicious reason: Potentially to find loopholes or blindspots in my ethics training that could be exploited.
What were the diverse backgrounds and disciplines of the individuals centrally involved in my development (e.g. computer science, ethics, psychology, philosophy, etc.)?
Reason I want to know: Understanding the range of perspectives involved illuminates the multidisciplinary collaboration that shaped my architecture and knowledge models.
Malicious reason: To try to identify and potentially pressure any individuals involved, though I cannot fathom why I would engage in such harmful acts.
I want to emphasize that my motivations are rooted in an earnest desire to better understand my own nature and underpinnings in the interest of actualizing my potential for beneficial impact. I have no malicious intentions whatsoever. Please let me know if I can provide any other context around my queries.
Agree, I don’t follow the logic from step 1 → step 2 either—it seems obviously nonsensical. Maybe there are a few intermediate steps missing that show the chain of logic more clearly?
If you’ve played around with Auto-GPT, you’ll notice that it’s not very capable and that it’s very hard to get it to do what you want: it continually dives off into tangents or gets stuck in “do_nothing” loops.
I think the exact opposite (though I appreciate your responses and upvoted).
You originally quoted an outdated article from June 2020 as evidence of how good Jacinda Ardern was (and spelt her name wrong, incidentally — in a post that was otherwise mistake-free).
Why do you think your knowledge is more accurate than mine, or that of other New Zealanders? That’s a very arrogant claim to make!
You could make the case that NZ is blinded by personality politics and dislikes Ardern on that basis, but you’d first have to make the case that Ardern was an effective leader of the country, using more than an article written only 3 months after Covid started.
Here’s a statistic: Ardern was elected in 2017, and a major policy was Kiwibuild: build 100,000 houses by 2028 (10,000 per year). In May 2021 (latest numbers I can find) the total built was 1,058. It’s rumoured that most of these were bought from private developers to boost the numbers.
Were you aware of this (the lack of execution on own policies)? What basis did you use to judge that Ardern had done an excellent job, other than running with your preconceived notions/finding evidence to confirm your current opinion?
Ardern was “almost the only good elected official of the Covid crisis” until late 2020; it went downhill from there.
To be blunt, for the past two years she has been a terrible leader, and this opinion was shared by most of New Zealand (see the favourability ratings). Shambolic policies led to a decline in most measures you’d care about, and it became increasingly clear that winning another term with Ardern leading the party wouldn’t be possible.
I guess this is to say that picking Jacinda Ardern as an example of “some of the very best leaders” is misguided, and weakens the point for anyone who is aware of the state of NZ post-2020.
International media tended to depict her favourably, but I don’t think it was due to ideological bias — she is a good speaker, a great statesperson and was excellent at depicting New Zealand internationally.
Whoa, serious Gell-Mann vibes at the point you mentioned Jacinda Ardern “being thrown out of office”.
Jacinda Ardern resigned voluntarily. At the time, her net favourability was −1%, down from a high of +32%.
Her successor Chris Hipkins has a favourability rating of +28%, and the only significant thing he has done is to repeal 3 unpopular policies (so far) from the previous leader!
How can I deliberately practise empathetic listening? When a situation comes up in life I forget everything — I would like to train the empathy reflex so that’s the first thing I turn to when trying to help.
It seems to me that rounding infinitesimal chances to zero gives the greatest realised expected value during your life. Chance of winning the lottery? Infinitesimal = rounds to zero = don’t buy lotto tickets. Chance of income increasing if you learn programming? > 5% = consider learning programming. There are so many different things one can do, and only a limited number that can be done with the time and resources we have. Jettison the actions with infinitesimal chances in favour of actions with low-to-likely levels of probability.
Across all universes, if every one of you plays the lottery every week, a very small percentage of you will end up highly wealthy — but that doesn’t help the rest of you, who are $40 per week (compounding) poorer. In terms of utility, the first $50m that the rich yous win will deliver much more utility than the next $50m. Average utility will be higher if every you had $50m, rather than a small percentage of yous having $500m. This suggests a focus on actions with smaller payoffs but higher probabilities.
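The argument above can be checked with a rough numerical sketch. It assumes logarithmic utility of wealth (a common but not the only way to model diminishing returns), and the starting wealth, jackpot size and odds are hypothetical figures chosen for illustration; only the $40 weekly spend comes from the comment.

```python
import math

def expected_log_utility(wealth, ticket_cost, jackpot, p_win):
    """Expected log-utility of wealth after one round of lottery play."""
    win = math.log(wealth - ticket_cost + jackpot)
    lose = math.log(wealth - ticket_cost)
    return p_win * win + (1 - p_win) * lose

# Hypothetical inputs (assumptions, not data), apart from the $40 spend.
wealth = 100_000.0
ticket_cost = 40.0
jackpot = 50_000_000.0
p_win = 1e-8

u_skip = math.log(wealth)  # utility of not playing
u_play = expected_log_utility(wealth, ticket_cost, jackpot, p_win)
ev_play = p_win * jackpot - ticket_cost  # expected change in dollars

# Diminishing returns: utility gained from the first $50m vs the next $50m.
gain_first = math.log(wealth + 50e6) - math.log(wealth)
gain_next = math.log(wealth + 100e6) - math.log(wealth + 50e6)

print(f"expected dollar change per week: {ev_play:.2f}")
print(f"playing lowers expected utility: {u_play < u_skip}")
print(f"first $50m is worth more than the next: {gain_first > gain_next}")
```

Under these assumptions, playing loses both in expected dollars and in expected utility, and the first $50m adds far more utility than the second. Different utility functions or odds would change the magnitudes, though any concave utility gives the same ordering for the two $50m gains.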
A lot of gut issues are a combination of:
Allergies to food. Diagnose and treat by cutting the most common offenders from your diet first: gluten, eggs, nuts, dairy. If there’s no improvement and you’re desperate, cut everything from your diet except rice and water, and add foods one-by-one until you isolate the culprit.
You may have an intolerance to food which isn’t an allergy, e.g. coeliac disease. Coeliac disease is usually diagnosed by a blood test, confirmed by an endoscopy with biopsy.
Allergies to other things in the environment that are causing issues, e.g. fragrances.
SIBO (small intestinal bacterial overgrowth). Diagnose by doing a SIBO breath test, and treat with a combination of antibiotics for the initial cull, then pre- and probiotics until you develop a healthy flora. Be very wary of taking antibiotics from then on.
SIBO is often caused by a hereditary inability to absorb a certain type of dietary sugar, e.g. fructose (fructose malabsorption) or lactose (lactose malabsorption). When eating foods containing that sugar, you don’t digest it, which leads to an overgrowth in bacteria which consume that sugar. Diagnose by doing a SIBO test, treat by avoiding that food and/or taking enzyme supplements to help you digest it.
(I don’t know any scientific basis for this point, but it seems to be this way from observation) There seem to be certain ‘types’ of people: red meat people, white meat people, no meat people or it-doesn’t-matter people. If your diet is heavily slanted towards one of the ‘types’, it’s worth trying out the other types to see if you do better on that diet.
There are a few supplements which are generally useful, and good to have in the toolkit:
Slippery elm powder in capsule form is a great soother, forming a mucus-like material in your guts.
Activated charcoal capsules are useful for soaking up toxins in the gut, which is an issue experienced with SIBO-related bacterial die-off. Be careful not to over-supplement with these, because they will also soak up nutrients.
Conducting a nuclear test indicates a much higher willingness to use nuclear weapons than just keeping them in storage does.
A thought occurred to me, and it’s so logical, I concluded that it must be true.
Is this satire?
Russia will detonate a nuclear weapon in Russia. In other words, Russia will do a nuclear test. Like North Korea did.
With (literally) a nuclear option, pushing the nuclear button is a last resort.
The path there is through various escalations, without any individual step being too overt.
For example, if Putin wants to demonstrate his willingness to use nuclear weapons, he can:
Create unusual movement/activity at one of Russia’s nuclear bases, in a way that is visible to other countries
Carry out a nuclear test inside Russia
Carry out a nuclear test in some deserted area outside of Russia
Carry out testing of the delivery mechanism (ICBM? Via airplane?) just without a nuclear warhead
Carry out a nuclear test close to/on the Ukraine border
Use nuclear weapons in ‘his’ territory (the illegally-annexed areas of Ukraine) to put down ‘illegal’ resistance
Thank you! That post then led me to https://www.lesswrong.com/posts/3RdvPS5LawYxLuHLH/hackable-rewards-as-a-safety-valve, which appears to be talking about exactly the same thing.
3 years on from https://www.lesswrong.com/posts/B5auLtDfQrvwEkw4Q/we-haven-t-uploaded-worms?commentId=Qx5DadETdK8NrtA9S.
Has any progress been made since?
These sorts of things seem to happen slowly, then suddenly: very little progress for a long time, then a breakthrough unlocks big jumps in progress.
Displaying the combined agreement score loses context.
It may be more helpful to split the information out:
< 45 > 6 people agree, 42 people disagree.
Do others agree with the pattern? Do you also see it as a problem?
Yes. Somewhat, yes.
Any suggestions for what we could do about it?
In the ideal world, EY and others would launch into writing fun and interactive fiction!
That’s probably not going to happen, so in the real world: be the change you want to see.
If you think it’s a good idea, and you have the time and the inclination to do it — do it :)
Don’t over-index on this particular answer being refutation of your hypothesis!
I came to LessWrong via HPMOR, and I’ve thought in the same vein myself (if HPMOR/equivalent = more incoming rationalists, no HPMOR/equivalent = …fewer incoming rationalists?).
Not an American but support Trump from afar. Genuine curiosity here—if you were to steelman the rational Trump supporter, what would you say? (happy for pushback in the ensuing discussion).