If you get an email from aisafetyresearch@gmail.com, that is most likely me. I also read that inbox weekly, so you can pass a message into my mind that way.
Other ~personal contacts: https://linktr.ee/uhuge
Martin Vlach
“engender”—funny typo!+)
This sentence seems hard to read and lacks coherence, IMO.
> Coverage of this topic is sparse relative coverage of CC’s direct effects.
If we build a prediction model for the reward function, maybe a transformer-based AI, and run it in a range of environments where we already have the credit assignment solved, we could use that model to estimate candidate goals in other environments.
That could help us discover alternative/candidate reward functions for worlds/environments where we are not sure what to train on with RL, and
it could expose some latent thinking processes of AIs, perhaps clarifying instrumental goals in more nuance.
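A minimal sketch of what I mean, in PyTorch (all names, shapes and the plain MLP are placeholders of mine; a real attempt would likely use a transformer over whole trajectories):

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Predicts a scalar reward from a (state, action) pair.

    Trained on environments where credit assignment is already solved,
    then probed on new environments to surface candidate reward functions.
    """
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_step(model, optimizer, states, actions, true_rewards):
    """One supervised step against known rewards from solved environments."""
    pred = model(states, actions)
    loss = nn.functional.mse_loss(pred, true_rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```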
Thanks for the links, as they clarified a lot for me. The names of the tactics/techniques sounded strange to me, and after unsuccessful googling for their meanings I started to believe it was a play on your readers; sorry if this suspicion of mine seemed rude.
The second part was curiosity about exploring some potential cases of “What could we bet on?”.
Cheers to your and your friends’ social lives!
I got frightened off by the ratio you’ve offered, so I’m not taking it, but thank you for offering. I might reconsider with some lesser amount that I can consider play money. Is there even a viable platform/service for a (maybe) $1:$100 individual bet like this?
Question/ask: List specific(/imaginable) events/achievements/states that could be achieved to contribute to humanity’s long-term potential.
Later they could be checked out and valued for their originality, but the principles for such a play are not my key concern here.
Q: when—and I mean exact probability levels—do/should we switch from predicting humanity’s extinction to predicting the further outcomes?
Can I bet the last 3 points are a joke?
Anyway, do we have a method to find checkpoints or milestones for betting on progress against a certain problem (e.g. AI development safety, global warming)?
My guess is that the rental car market has less direct/local competition, while airlines are centralized on airport routes and the many cheap-flight search engines (e.g. Kiwi.com) make price comparison the natural mindset there.
Is there a price comparison for car rentals?
My views on the mistakes in “mainstream” A(G)I safety mindset:
- we define non-aligned agents as conflicting with our/human goals, while we have ~none (only cravings and intuitive attractions). We should rather strive to conserve long-term positive/optimistic ideas/principles.
- expecting human bodies to be a neat fit for space colonisation/inhabitance/transformation: we have (well, are, actually) a hammer, so we nail it into the vastly empty space.
- we struggle with imagining unbounded/maximized creativity—such agents can optimize the experimentation vs. risk trade-off smoothly
- no focus on risk-awareness in AIs, to divert/bend/inflect ML development goals toward risk-including/risk-centered applications.
+ a non-existent(?) good library catalog of existing models and their availability, including those in development, incentivizing (anonymous) proofs of the latter
The reward function being a (single) scalar, unstructured quantity in RL (practice) seems weird, not coinciding/aligned with my intuition of learning from ~continuous interaction. A more Kahneman-ish multi-channel reward, with weights kept distinguishable/flexible for the future, might yield a more realistic/full-blown model.
DeepMind researcher Hado mentions here that an RL reward can be defined to contain a risk component; that seems borderline brilliant and promising for a simple generic RL development policy. I would love to learn (and teach) more practical details!
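To make the multi-channel idea concrete, here is a toy sketch (the channel names and weights are made up by me; the point is just that risk gets its own weighted channel before scalarization):

```python
import numpy as np

# Hypothetical reward channels instead of one unstructured scalar.
CHANNELS = ("task_progress", "energy_cost", "risk")

# Weights could be fixed, learned, or adjusted over time; risk gets a negative weight.
weights = np.array([1.0, -0.1, -2.0])

def scalarize(reward_vector: np.ndarray) -> float:
    """Collapse a per-channel reward vector into the scalar most RL algorithms expect."""
    return float(weights @ reward_vector)

# Example: good progress, small energy cost, moderate risk estimate.
r = np.array([0.8, 0.2, 0.1])
print(scalarize(r))  # 0.8 - 0.02 - 0.2 = 0.58
```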
As a variation of your thought experiment, I’ve pondered: how do you morally evaluate the life of a human who lives with some mental suffering during the day, but thrives in vivid and blissful dreams during their sleep?
In a hypothetical adversarial case, one may even have dreams formed by their desires, with the desires made stronger by the daytime suffering. Intuitively, it seems dissociative disorders might arise through a mechanism like this.
I’m fairly interested in that topic and wrote a short draft here explaining a few basic reasons to explicitly develop capability-measuring tools, as it would improve risk mitigation. What resonates from your question is that for ‘known categories’ we could start from what the papers recognise and dig deeper for more fine-grained (sub-)capabilities.
Oh, good: I’ve contacted the owner and they responded that it was necessary to get their IP address whitelisted by the LW operators. That should resolve soon.
Draft for AI capabilities systematic evaluation development proposal:
The core idea here is that easier visibility of AI models’ capabilities helps safety of development in multiple ways.
Clearer situational awareness for safety research – researchers can see where we are across various aspects and modalities, and they get a track record/timeline of developed abilities that can serve as a baseline for future estimates.
Division of capabilities can help create better models of the components necessary for general intelligence. Perhaps a better understanding of the hierarchy of cognitive abilities can be extracted.
Capabilities testing can be mandated by regulatory policies to put the most advanced systems under more scrutiny and/or safety-design support. To state it differently: better alignment of attention focus to the emerging risk (of highly capable AIs).
Presumably, smooth and widely available testing infrastructure or tools are a prerequisite here.
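As a toy illustration of what such shared tooling might look like (everything below is hypothetical), a capability catalog could start as a simple registry mapping capability categories to test suites:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CapabilityTest:
    name: str            # e.g. "arithmetic.multi_digit_addition"
    modality: str        # e.g. "text", "vision"
    run: Callable        # takes a model, returns a score in [0, 1]

@dataclass
class CapabilityCatalog:
    tests: Dict[str, List[CapabilityTest]] = field(default_factory=dict)

    def register(self, category: str, test: CapabilityTest) -> None:
        self.tests.setdefault(category, []).append(test)

    def evaluate(self, model) -> Dict[str, float]:
        """Average score per capability category, forming a model's track record."""
        return {
            cat: sum(t.run(model) for t in ts) / len(ts)
            for cat, ts in self.tests.items()
        }
```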
The most obvious risks are:
The measure becoming a challenge and a goal in itself, speeding up furious development of strong AI systems.
Technical difficulties of the testing setup(s) and evaluation, especially handling the randomness in the mechanics (/output generation) of AI systems; one mitigation is sketched below.
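A minimal sketch of the repeated-sampling mitigation mentioned above (the function names and interfaces are illustrative assumptions):

```python
import statistics

def evaluate_capability(model_sample, task_inputs, score_fn, n_runs: int = 20):
    """Run a stochastic model several times per task and report mean +/- stdev.

    model_sample: callable producing one (possibly random) output for an input
    score_fn: callable mapping (input, output) to a score in [0, 1]
    """
    runs = []
    for _ in range(n_runs):
        scores = [score_fn(x, model_sample(x)) for x in task_inputs]
        runs.append(sum(scores) / len(scores))
    return statistics.mean(runs), statistics.stdev(runs)
```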
Q: Did anyone train an AI on video sequences where an associated (mostly descriptive) caption is given or generated by another system, so that the new system consequently becomes capable of:
+ describing a given scene accurately
+ predicting movements in both visual and/or textual form/representation
+ evaluating questions concerning the material/visible world, e.g. Does a fridge have wheels? Which animals are we most likely to see on a flower?
?
Martin Vlach’s Shortform
“goal misgeneralization paper does does”–typo
“list of papers that it’s”→ that are
Can you please change/update the links for “git repo” and “Github repo”? One goes through a redirect which may possibly die out, and the other points to `tree/devel` while the instructions in the readme advise to build from and into `master`.
Is the endeavour of Elon Musk with Neuralink aimed at the case of AI inspectability (aka transparency)? I suppose so, but not sure, TBH.